java.lang.Object
  nz.ac.waikato.mcennis.rat.crawler.GZipFileCrawler

public class GZipFileCrawler
extends java.lang.Object
implements Crawler
| Constructor Summary | |
|---|---|
| GZipFileCrawler() | Creates a new instance of GZipFileCrawler. |
| Method Summary | |
|---|---|
| void | crawl(java.lang.String site) Identical to crawl(site, parsers) except all parsers are used. |
| void | crawl(java.lang.String site, java.lang.String[] parsers) Fetches the named document and parses it with the indexed subset of parsers. |
| Parser[] | getParser() Returns an array of Parser objects used by this crawler to parse pages. |
| boolean | isCaching() Is the crawler caching the page, or is it re-acquiring the page for each parser? |
| boolean | isSpidering() Is this crawler following links? |
| void | set(Parser[] parser) Set the parsers to be utilized by this crawler to interpret the documents that are parsed. |
| void | setCaching(boolean b) Set whether or not the crawler should cache the web page or reload it for each individual parser. |
| void | setProxy(boolean proxy) Establishes whether a proxy is needed to access documents. |
| void | setSpidering(boolean s) Should links to newly discovered documents also be read? |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
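A minimal usage sketch (not part of the generated documentation): it assumes a hypothetical Parser implementation named MyParser and an illustrative local file path, since this page specifies neither.

```java
import nz.ac.waikato.mcennis.rat.crawler.GZipFileCrawler;
// The import for Parser is omitted; its package is not given on this page.

public class CrawlExample {
    public static void main(String[] args) throws Exception {
        GZipFileCrawler crawler = new GZipFileCrawler();

        // MyParser is a hypothetical implementation of the Parser
        // interface documented elsewhere in this package tree.
        crawler.set(new Parser[] { new MyParser() });

        // Cache each document so every parser reads the same copy,
        // and do not follow links discovered inside documents.
        crawler.setCaching(true);
        crawler.setSpidering(false);

        // Parse a single gzip-compressed document with all registered parsers.
        crawler.crawl("data/pages/example.xml.gz");
    }
}
```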
| Constructor Detail |
|---|

public GZipFileCrawler()

Creates a new instance of GZipFileCrawler.
| Method Detail |
|---|

public void crawl(java.lang.String site)
           throws java.net.MalformedURLException,
                  java.io.IOException

Identical to crawl(site, parsers) except all parsers are used.

Specified by:
    crawl in interface Crawler
Parameters:
    site - site to be crawled
Throws:
    java.net.MalformedURLException - if the site URL is invalid
    java.io.IOException - if an error occurs during retrieval
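A hedged call sketch for this form, handling the two declared exceptions; the file path is an assumption:

```java
GZipFileCrawler crawler = new GZipFileCrawler();
try {
    // All registered parsers are applied to the document.
    crawler.crawl("data/pages/example.xml.gz");
} catch (java.net.MalformedURLException e) {
    System.err.println("Invalid site URL: " + e.getMessage());
} catch (java.io.IOException e) {
    System.err.println("Retrieval failed: " + e.getMessage());
}
```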
public void crawl(java.lang.String site,
                  java.lang.String[] parsers)
           throws java.net.MalformedURLException,
                  java.io.IOException

Specified by:
    crawl in interface Crawler
Parameters:
    site - name of the document to be fetched
    parsers - index of parsers to parse this site
Throws:
    java.net.MalformedURLException - if the site to crawl is not a valid document; only thrown if the underlying crawler retrieves documents via HTTP or a similar protocol
    java.io.IOException - thrown if there is a problem retrieving the document to be processed
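A sketch of the two-argument form; the page describes parsers only as an "index of parsers to parse this site", so the entry below is purely illustrative:

```java
// Apply only the parsers matching these entries; "XMLParser" is an
// assumed name, not one defined by this API.
String[] parsers = { "XMLParser" };
crawler.crawl("data/pages/example.xml.gz", parsers);
```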
public void set(Parser[] parser)

Description copied from interface: Crawler
Set the parsers to be utilized by this crawler to interpret the documents that are parsed.

Specified by:
    set in interface Crawler
Parameters:
    parser - array of parsing objects to be utilized by the crawler to process documents fetched

public Parser[] getParser()

Description copied from interface: Crawler
Returns an array of Parser objects used by this crawler to parse pages.

Specified by:
    getParser in interface Crawler

public void setProxy(boolean proxy)

Description copied from interface: Crawler
Establishes whether a proxy is needed to access documents.

Specified by:
    setProxy in interface Crawler
Parameters:
    proxy - whether or not a proxy is needed for accessing documents

public void setCaching(boolean b)

Description copied from interface: Crawler
Set whether or not the crawler should cache the web page or reload it for each individual parser.

Specified by:
    setCaching in interface Crawler
Parameters:
    b - should caching occur

public boolean isCaching()

Description copied from interface: Crawler
Is the crawler caching the page, or is it re-acquiring the page for each parser?

Specified by:
    isCaching in interface Crawler

public void setSpidering(boolean s)

Description copied from interface: Crawler
Should links to newly discovered documents also be read?

Specified by:
    setSpidering in interface Crawler

public boolean isSpidering()

Description copied from interface: Crawler
Is this crawler following links?

Specified by:
    isSpidering in interface Crawler
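A short configuration sketch showing the setters and their paired getters; the flag values are illustrative, not defaults documented here:

```java
GZipFileCrawler crawler = new GZipFileCrawler();
crawler.setProxy(false);    // assumption: no proxy needed for local files
crawler.setCaching(true);   // parse one cached copy per document
crawler.setSpidering(true); // follow links found in parsed documents

System.out.println("caching:   " + crawler.isCaching());
System.out.println("spidering: " + crawler.isSpidering());
```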