java.lang.Object
  extended by nz.ac.waikato.mcennis.rat.crawler.FileListCrawler
public class FileListCrawler
Crawler that reads files from the local file system and passes them to its parsers, using native file access mechanisms.
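The class description can be illustrated with a minimal, self-contained sketch of a local-file crawler. It is not the library's implementation: DocumentParser and LocalFileCrawlerSketch are hypothetical stand-ins for Parser and FileListCrawler, and the file is read with java.nio.file.Files.readAllBytes, which may differ from the native access mechanism the real class uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the library's Parser type (an assumption, not the real API).
interface DocumentParser {
    void parse(byte[] content);
}

// Minimal sketch of a crawler that reads one local file and hands its bytes to every parser.
final class LocalFileCrawlerSketch {
    private final List<DocumentParser> parsers = new ArrayList<>();

    void addParser(DocumentParser parser) {
        parsers.add(parser);
    }

    // Reads the file named by an absolute path and passes its contents to each parser in turn.
    void crawl(String absoluteFileName) throws IOException {
        byte[] content = Files.readAllBytes(Path.of(absoluteFileName));
        for (DocumentParser parser : parsers) {
            parser.parse(content);
        }
    }

    // Usage example: write a temporary file, crawl it, and collect what the parser received.
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("crawl-sketch", ".txt");
            Files.writeString(file, "hello local file");
            List<String> seen = new ArrayList<>();
            LocalFileCrawlerSketch crawler = new LocalFileCrawlerSketch();
            crawler.addParser(bytes -> seen.add(new String(bytes, StandardCharsets.UTF_8)));
            crawler.crawl(file.toAbsolutePath().toString());
            Files.delete(file);
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling LocalFileCrawlerSketch.demo() returns a one-element list containing "hello local file", the text the single parser received.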
| Field Summary | |
|---|---|
| protected boolean | cache: Perform caching so that each parser gets a cached copy of the original document. |
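The purpose of the cache field, reading a document once and giving every parser its own copy, can be sketched as follows. The sketch assumes retrieval is modeled as a Supplier<byte[]> so the number of reads can be counted; CachingSketch, dispatch, reads, and copiesAreIndependent are hypothetical names, and the real class may copy or share the cached data differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of caching versus re-acquiring a document for each parser.
final class CachingSketch {

    // With caching, the source is read once and each parser receives its own clone,
    // so a parser that mutates its array cannot affect the next parser.
    // Without caching, the source is read again for every parser.
    static void dispatch(boolean cache, Supplier<byte[]> source, List<Consumer<byte[]>> parsers) {
        byte[] cached = cache ? source.get() : null;
        for (Consumer<byte[]> parser : parsers) {
            parser.accept(cache ? cached.clone() : source.get());
        }
    }

    // Usage example: returns how many times the source was read for three parsers.
    static int reads(boolean cache) {
        AtomicInteger counter = new AtomicInteger();
        Supplier<byte[]> source = () -> {
            counter.incrementAndGet();
            return new byte[] {1, 2, 3};
        };
        List<Consumer<byte[]>> parsers = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            parsers.add(bytes -> bytes[0] = 0); // each parser overwrites its input
        }
        dispatch(cache, source, parsers);
        return counter.get();
    }

    // Usage example: the second parser still sees the original first byte (7)
    // even though the first parser overwrote its own copy.
    static boolean copiesAreIndependent() {
        byte[] original = {7, 8, 9};
        List<Integer> firstBytes = new ArrayList<>();
        List<Consumer<byte[]>> parsers = new ArrayList<>();
        parsers.add(bytes -> {
            firstBytes.add((int) bytes[0]);
            bytes[0] = 0;
        });
        parsers.add(bytes -> firstBytes.add((int) bytes[0]));
        dispatch(true, () -> original, parsers);
        return firstBytes.equals(List.of(7, 7));
    }
}
```

In this sketch reads(true) returns 1 and reads(false) returns 3, which is the trade-off setCaching controls: fewer retrievals in exchange for holding a copy in memory.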
| Constructor Summary | |
|---|---|
| FileListCrawler() | |
| Method Summary | |
|---|---|
| void | crawl(java.lang.String site): Identical to crawl(java.lang.String, java.lang.String[]) except that all parsers are used. |
| void | crawl(java.lang.String site, java.lang.String[] parsers): Retrieves the given document from the local file system. |
| Crawler | getCrawler() |
| Parser[] | getParser(): Retrieves the parsers associated with this crawler. |
| boolean | isCaching(): Whether the crawler caches the document or re-acquires it for each parser. |
| boolean | isSpidering(): Whether this crawler follows links. |
| void | set(Parser[] parser): Establishes the parsers this crawler uses to process the documents it retrieves. |
| void | setCaching(boolean b): Sets whether the crawler caches the document or reloads it for each individual parser. |
| void | setCrawler(Crawler c) |
| void | setProxy(boolean proxy): Has no effect on this crawler; there is no proxy for files. |
| void | setSpidering(boolean s): Sets whether links to newly discovered documents are also read. |
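The relationship between the two crawl methods, where crawl(site) behaves like crawl(site, parsers) with every parser selected, can be sketched as below. This rests on an assumption: the documentation describes the String[] argument only as an "index of parsers", so the sketch treats each string as the identifier of a registered parser. ParserSelectionSketch, register, and demo are hypothetical names, and to keep it short each parser receives the file name instead of the file's contents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: parsers are registered under string identifiers (an assumption;
// the real crawl(String, String[]) documents its second argument only as an "index of parsers").
final class ParserSelectionSketch {
    // LinkedHashMap keeps registration order, so "all parsers" runs in a predictable order.
    private final Map<String, Consumer<String>> parsers = new LinkedHashMap<>();

    void register(String id, Consumer<String> parser) {
        parsers.put(id, parser);
    }

    // Mirrors crawl(site): identical to the two-argument form except that every parser is used.
    void crawl(String site) {
        crawl(site, parsers.keySet().toArray(new String[0]));
    }

    // Mirrors crawl(site, parsers): only the selected parsers see the document.
    // Unknown identifiers are skipped here; the real class may behave differently.
    void crawl(String site, String[] selected) {
        for (String id : selected) {
            Consumer<String> parser = parsers.get(id);
            if (parser != null) {
                parser.accept(site);
            }
        }
    }

    // Usage example: records which parsers ran for each call.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        ParserSelectionSketch crawler = new ParserSelectionSketch();
        crawler.register("html", site -> log.add("html:" + site));
        crawler.register("text", site -> log.add("text:" + site));
        crawler.crawl("/tmp/a.txt", new String[] {"text"});
        crawler.crawl("/tmp/b.txt");
        return log;
    }
}
```

demo() returns [text:/tmp/a.txt, html:/tmp/b.txt, text:/tmp/b.txt]: the first call runs only the selected parser, while the one-argument call runs every registered parser in registration order. No files are touched, since the paths are only passed along as strings.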
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected boolean cache
| Constructor Detail |
|---|
public FileListCrawler()
| Method Detail |
|---|
public void crawl(java.lang.String site)
Specified by: crawl in interface Crawler
Parameters: site - site to be crawled
Throws:
java.net.MalformedURLException - if the site URL is invalid
java.io.IOException - if an error occurs during retrieval
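public void crawl(java.lang.String site, java.lang.String[] parsers)
Specified by: crawl in interface Crawler
Parameters:
site - absolute file name of the file on the local file system
parsers - index of parsers to parse this site
See Also: nz.ac.waikato.mcennis.arm.crawler.Crawler#crawl(java.lang.String)
public void set(Parser[] parser)
Specified by: set in interface Crawler
Parameters: parser - array of parsing objects the crawler uses to process fetched documents
See Also: nz.ac.waikato.mcennis.arm.crawler.Crawler#set(nz.ac.waikato.mcennis.arm.parser.Parser[])
public Parser[] getParser()
Specified by: getParser in interface Crawler
public void setProxy(boolean proxy)
Specified by: setProxy in interface Crawler
Parameters: proxy - not used or read
public void setCaching(boolean b)
Description copied from interface: Crawler
Sets whether the crawler caches the document or reloads it for each individual parser.
Specified by: setCaching in interface Crawler
Parameters: b - whether caching should occur
public boolean isCaching()
Description copied from interface: Crawler
Whether the crawler caches the document or re-acquires it for each parser.
Specified by: isCaching in interface Crawler
public void setSpidering(boolean s)
Description copied from interface: Crawler
Sets whether links to newly discovered documents are also read.
Specified by: setSpidering in interface Crawler
public boolean isSpidering()
Description copied from interface: Crawler
Whether this crawler follows links.
Specified by: isSpidering in interface Crawler
public Crawler getCrawler()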
public void setCrawler(Crawler c)
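The spidering behavior described for setSpidering and isSpidering, where links found in a document are read as well, can be sketched as a breadth-first traversal with a visited set. This is an illustration under assumptions, not the library's code: SpideringSketch is a hypothetical name, and the links map stands in for whatever link extraction the real parsers perform.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of spidering: links discovered in a document are queued and read too,
// and a visited set ensures each document is read only once even when links form cycles.
final class SpideringSketch {

    // links maps each document to the links a parser would discover in it (an assumption
    // standing in for real link extraction). Returns documents in the order they are read.
    static List<String> crawl(String start, boolean spidering, Map<String, List<String>> links) {
        List<String> readOrder = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        visited.add(start);
        while (!queue.isEmpty()) {
            String doc = queue.poll();
            readOrder.add(doc); // "read" the document
            if (!spidering) {
                break; // without spidering, only the requested document is read
            }
            for (String link : links.getOrDefault(doc, List.of())) {
                if (visited.add(link)) { // add returns false for documents already seen
                    queue.add(link);
                }
            }
        }
        return readOrder;
    }
}
```

With links {a: [b, c], b: [a, c], c: []}, crawl("a", true, links) returns [a, b, c], and the link from b back to a is not followed twice; crawl("a", false, links) returns only [a].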