java.lang.Object
  nz.ac.waikato.mcennis.rat.crawler.GZipFileCrawler

public class GZipFileCrawler
extends java.lang.Object
implements Crawler
| Constructor Summary | |
|---|---|
| GZipFileCrawler() | Creates a new instance of GZipFileCrawler. |
| Method Summary | |
|---|---|
| void | crawl(java.lang.String site) Identical to crawl(site, parsers) except all parsers are used. |
| void | crawl(java.lang.String site, java.lang.String[] parsers) Fetches the named document and parses it with the indexed subset of parsers. |
| Parser[] | getParser() Returns an array of Parser objects used by this crawler to parse pages. |
| boolean | isCaching() Is the crawler caching the page, or is it re-acquiring the page for each parser? |
| boolean | isSpidering() Is this crawler following links? |
| void | set(Parser[] parser) Set the parsers to be utilized by this crawler to interpret the documents that are parsed. |
| void | setCaching(boolean b) Set whether or not the crawler should cache the web page or reload it for each individual parser. |
| void | setProxy(boolean proxy) Establishes whether a proxy is needed to access documents. |
| void | setSpidering(boolean s) Should links to newly discovered documents also be read? |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
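A minimal usage sketch (not part of the generated documentation): it assumes a hypothetical Parser implementation named MyParser and an illustrative local file path, since this page specifies neither.

```java
import nz.ac.waikato.mcennis.rat.crawler.GZipFileCrawler;
// The import for Parser is omitted; its package is not given on this page.

public class CrawlExample {
    public static void main(String[] args) throws Exception {
        GZipFileCrawler crawler = new GZipFileCrawler();

        // MyParser is a hypothetical implementation of the Parser
        // interface documented elsewhere in this package tree.
        crawler.set(new Parser[] { new MyParser() });

        // Cache each document so every parser reads the same copy,
        // and do not follow links discovered inside documents.
        crawler.setCaching(true);
        crawler.setSpidering(false);

        // Parse a single gzip-compressed document with all registered parsers.
        crawler.crawl("data/pages/example.xml.gz");
    }
}
```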
| Constructor Detail |
|---|

public GZipFileCrawler()

Creates a new instance of GZipFileCrawler.
| Method Detail |
|---|

public void crawl(java.lang.String site)
           throws java.net.MalformedURLException,
                  java.io.IOException

Identical to crawl(site, parsers) except all parsers are used.

Specified by:
    crawl in interface Crawler
Parameters:
    site - site to be crawled
Throws:
    java.net.MalformedURLException - if the site URL is invalid
    java.io.IOException - if an error occurs during retrieval
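A hedged call sketch for this form, handling the two declared exceptions; the file path is an assumption:

```java
GZipFileCrawler crawler = new GZipFileCrawler();
try {
    // All registered parsers are applied to the document.
    crawler.crawl("data/pages/example.xml.gz");
} catch (java.net.MalformedURLException e) {
    System.err.println("Invalid site URL: " + e.getMessage());
} catch (java.io.IOException e) {
    System.err.println("Retrieval failed: " + e.getMessage());
}
```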
public void crawl(java.lang.String site,
                  java.lang.String[] parsers)
           throws java.net.MalformedURLException,
                  java.io.IOException

Specified by:
    crawl in interface Crawler
Parameters:
    site - name of the document to be fetched
    parsers - index of parsers to parse this site
Throws:
    java.net.MalformedURLException - if the site to crawl is not a valid document; only thrown if the underlying crawler retrieves documents via HTTP or a similar protocol
    java.io.IOException - thrown if there is a problem retrieving the document to be processed
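A sketch of the two-argument form; the page describes parsers only as an "index of parsers to parse this site", so the entry below is purely illustrative:

```java
// Apply only the parsers matching these entries; "XMLParser" is an
// assumed name, not one defined by this API.
String[] parsers = { "XMLParser" };
crawler.crawl("data/pages/example.xml.gz", parsers);
```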
public void set(Parser[] parser)

Description copied from interface: Crawler
Set the parsers to be utilized by this crawler to interpret the documents that are parsed.

Specified by:
    set in interface Crawler
Parameters:
    parser - array of parsing objects to be utilized by the crawler to process documents fetched

public Parser[] getParser()

Description copied from interface: Crawler
Returns an array of Parser objects used by this crawler to parse pages.

Specified by:
    getParser in interface Crawler

public void setProxy(boolean proxy)

Description copied from interface: Crawler
Establishes whether a proxy is needed to access documents.

Specified by:
    setProxy in interface Crawler
Parameters:
    proxy - whether or not a proxy is needed for accessing documents

public void setCaching(boolean b)

Description copied from interface: Crawler
Set whether or not the crawler should cache the web page or reload it for each individual parser.

Specified by:
    setCaching in interface Crawler
Parameters:
    b - should caching occur

public boolean isCaching()

Description copied from interface: Crawler
Is the crawler caching the page, or is it re-acquiring the page for each parser?

Specified by:
    isCaching in interface Crawler

public void setSpidering(boolean s)

Description copied from interface: Crawler
Should links to newly discovered documents also be read?

Specified by:
    setSpidering in interface Crawler

public boolean isSpidering()

Description copied from interface: Crawler
Is this crawler following links?

Specified by:
    isSpidering in interface Crawler
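A short configuration sketch showing the setters and their paired getters; the flag values are illustrative, not defaults documented here:

```java
GZipFileCrawler crawler = new GZipFileCrawler();
crawler.setProxy(false);    // assumption: no proxy needed for local files
crawler.setCaching(true);   // parse one cached copy per document
crawler.setSpidering(true); // follow links found in parsed documents

System.out.println("caching:   " + crawler.isCaching());
System.out.println("spidering: " + crawler.isSpidering());
```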