java.lang.Object
  nz.ac.waikato.mcennis.rat.crawler.FileListCrawler
public class FileListCrawler
Crawler designed to parse files from the local file system. Utilizes native file access mechanisms.
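As a rough illustration of what a file-based crawl involves, the sketch below reads a local file once and hands its bytes to each of a set of parsers. `LocalFileCrawlSketch`, `SimpleParser`, and `crawlFile` are hypothetical stand-ins introduced for this example; they are not the RAT `Crawler` or `Parser` types, and the way FileListCrawler actually reads files is not shown on this page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class LocalFileCrawlSketch {

    /** Hypothetical stand-in for a parser; not the library's Parser interface. */
    interface SimpleParser {
        void parse(byte[] content, String site);
    }

    /**
     * Reads the file named by a path and passes its bytes to every parser.
     * Throws IOException if the file cannot be read.
     */
    static void crawlFile(String site, List<SimpleParser> parsers) throws IOException {
        Path path = Paths.get(site);
        byte[] content = Files.readAllBytes(path); // single read, shared by all parsers
        for (SimpleParser parser : parsers) {
            parser.parse(content, site);
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a temporary file so the example runs on any machine.
        Path tmp = Files.createTempFile("crawl-sketch", ".txt");
        Files.write(tmp, "hello crawler".getBytes(StandardCharsets.UTF_8));

        List<SimpleParser> parsers = new ArrayList<>();
        parsers.add((content, site) ->
                System.out.println(site + ": " + content.length + " bytes"));

        crawlFile(tmp.toAbsolutePath().toString(), parsers);
        Files.delete(tmp);
    }
}
```

Reading the bytes once and sharing them is only one possible design; a crawler with caching turned off might instead re-read the file for each parser.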
| Constructor Summary |
|---|
| FileListCrawler() |
| Method Summary | | |
|---|---|---|
| void | block(java.lang.String site) | Passes the given URL (as a string) to the filter object via the CrawlerFilter.load(String site) method. |
| void | block(java.lang.String site, Properties props) | Passes the given URL (as a string) to the filter object via the CrawlerFilter.load(String site, Properties props) method. |
| void | crawl(java.lang.String site) | Identical to crawl(String, Properties) except that all parsers are used. |
| void | crawl(java.lang.String site, Properties parsers) | Retrieves the given document from the local filesystem. TODO: Finish parameters |
| Crawler | getCrawler() | Returns a reference to the crawler used when spidering is enabled. |
| CrawlerFilter | getFilter() | Returns the current CrawlerFilter set for this crawler. |
| Parser[] | getParsers() | Returns an array of Parser objects used by this crawler to parse pages. |
| Properties | getProperties() | Returns the parsers that are associated with this crawler. |
| java.lang.String | getProxyHost() | Returns the string to be used to determine the proxy host for this connection. |
| java.lang.String | getProxyPort() | Returns the string describing the port the proxy is operating on. |
| java.lang.String | getProxyType() | Returns the string the crawler will use to set the system property determining the proxy type. |
| boolean | isCaching() | Reports whether the crawler caches the page or re-acquires it for each parser. |
| boolean | isSpidering() | Reports whether this crawler follows links. |
| void | set(Properties parser) | Takes the current array of parsers and creates a copy of them utilizing the duplicate method on each parser. |
| void | setCaching(boolean b) | Sets caching on or off. |
| void | setCrawler(Crawler c) | Sets the crawler used when spidering. |
| void | setFilter(CrawlerFilter filter) | Sets the filter used to determine whether or not a given URL should be added to the list of URLs to be crawled. |
| void | setParsers(Parser[] parsers) | Sets the parsers available to this crawler. |
| void | setProxy(boolean proxy) | Sets or unsets whether the crawler should go through a proxy to access the internet. |
| void | setProxyHost(java.lang.String proxyHost) | Sets the string to be used to determine the proxy host for this connection. |
| void | setProxyPort(java.lang.String proxyPort) | Sets the string describing the port the proxy is operating on. |
| void | setProxyType(java.lang.String proxyType) | Sets the string the crawler will use to set the system property determining the proxy type. |
| void | setSpidering(boolean s) | Sets whether links to newly discovered documents should also be read. This also sets the value of the 'Spidering' parameter. |
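The proxy accessors above are described as holding strings used to set system properties. A minimal sketch of how such values might be applied with the standard Java networking properties follows. The mapping from a proxy type to `http.proxyHost`/`http.proxyPort` or `socksProxyHost`/`socksProxyPort`, the `"socks"` type string, and the `ProxySettingsSketch` class itself are assumptions made for illustration, not documented FileListCrawler behavior.

```java
import java.util.Properties;

public class ProxySettingsSketch {

    /**
     * Builds the system-property entries for a proxy.
     * Assumption: type "socks" selects the SOCKS properties; anything else selects HTTP.
     */
    static Properties proxyProperties(String type, String host, String port) {
        Properties props = new Properties();
        if ("socks".equalsIgnoreCase(type)) {
            props.setProperty("socksProxyHost", host);
            props.setProperty("socksProxyPort", port);
        } else {
            props.setProperty("http.proxyHost", host);
            props.setProperty("http.proxyPort", port);
        }
        return props;
    }

    /** Copies the entries into the JVM-wide system properties. */
    static void apply(Properties props) {
        for (String name : props.stringPropertyNames()) {
            System.setProperty(name, props.getProperty(name));
        }
    }

    public static void main(String[] args) {
        // Host and port are placeholder values for the example.
        Properties props = proxyProperties("http", "proxy.example.org", "8080");
        apply(props);
        System.out.println(System.getProperty("http.proxyHost"));
    }
}
```

Because system properties are JVM-wide, settings applied this way affect every connection in the process, not only those made by one crawler instance.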
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public FileListCrawler()
| Method Detail |
|---|
public void crawl(java.lang.String site)
Specified by: crawl in interface Crawler
Parameters:
site - site to be crawled
Throws:
java.net.MalformedURLException - if site URL is invalid
java.io.IOException - error occurs during retrieval

public void crawl(java.lang.String site,
Properties parsers)
Specified by: crawl in interface Crawler
Parameters:
site - absolute file name of the file on the local file system
See Also:
nz.ac.waikato.mcennis.arm.crawler.Crawler#crawl(java.lang.String)

public java.lang.String getProxyHost()

public void setProxyHost(java.lang.String proxyHost)
Parameters:
proxyHost - proxy host descriptor

public void set(Properties parser)
Specified by: set in interface Crawler
Parameters:
parser - Array of parsers to be duplicated and set for parsing documents

public java.lang.String getProxyPort()

public void setProxyPort(java.lang.String proxyPort)
Parameters:
proxyPort - port of the proxy

public java.lang.String getProxyType()

public void setProxyType(java.lang.String proxyType)
Parameters:
proxyType - type of the proxy

public Crawler getCrawler()
Description copied from interface: Crawler
Specified by: getCrawler in interface Crawler

public void setCrawler(Crawler c)
Specified by: setCrawler in interface Crawler
Parameters:
c - crawler to use for spidering

public Properties getProperties()
Specified by: getProperties in interface Crawler

public void setProxy(boolean proxy)
Specified by: setProxy in interface Crawler
Parameters:
proxy - Should a proxy be used to access the internet

public void setCaching(boolean b)
Specified by: setCaching in interface Crawler
Parameters:
b - should caching be enabled

public boolean isCaching()
Description copied from interface: Crawler
Specified by: isCaching in interface Crawler

public void setSpidering(boolean s)
Description copied from interface: Crawler
Specified by: setSpidering in interface Crawler

public boolean isSpidering()
Description copied from interface: Crawler
Specified by: isSpidering in interface Crawler

public void setFilter(CrawlerFilter filter)
Description copied from interface: Crawler
Specified by: setFilter in interface Crawler
Parameters:
filter - Function to determine whether a URL should be parsed or not.

public CrawlerFilter getFilter()
Description copied from interface: Crawler
Specified by: getFilter in interface Crawler

public Parser[] getParsers()
Description copied from interface: Crawler
Specified by: getParsers in interface Crawler

public void setParsers(Parser[] parsers)
Description copied from interface: Crawler
Specified by: setParsers in interface Crawler
Parameters:
parsers - Parsers that can be called in a given crawler.

public void block(java.lang.String site)
Description copied from interface: Crawler
Specified by: block in interface Crawler
Parameters:
site - URL to pass to the filter without passing to the parsers.
See Also:
nz.ac.waikato.mcennis.rat.crawler.filter.CrawlerFilter.load(String site)

public void block(java.lang.String site,
Properties props)
Description copied from interface: Crawler
Specified by: block in interface Crawler
Parameters:
site - URL to pass to the filter without passing to the parsers.
props - Properties object defining parameters associated with parsing the site.
See Also:
nz.ac.waikato.mcennis.rat.crawler.filter.CrawlerFilter.load(String site, Properties props)
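
To illustrate how `block` could interact with a filter, the sketch below uses a hypothetical `SeenSetFilter` whose `load` records a site as already handled, so that a later check rejects it. Only the `load(String site)` name comes from this page; `shouldCrawl`, `wouldCrawl`, the seen-set semantics, and the class names are assumptions for the example and are not the RAT `CrawlerFilter` API.

```java
import java.util.HashSet;
import java.util.Set;

public class BlockSketch {

    /** Hypothetical filter: remembers sites and rejects ones already loaded. */
    static class SeenSetFilter {
        private final Set<String> seen = new HashSet<>();

        /** Records the site without parsing it. */
        void load(String site) {
            seen.add(site);
        }

        /** Assumed check: a site is crawled only if it has not been loaded. */
        boolean shouldCrawl(String site) {
            return !seen.contains(site);
        }
    }

    private final SeenSetFilter filter = new SeenSetFilter();

    /** Mirrors the documented block(String): hand the site to the filter only. */
    void block(String site) {
        filter.load(site);
    }

    /** Hypothetical helper that asks the filter whether a site would be crawled. */
    boolean wouldCrawl(String site) {
        return filter.shouldCrawl(site);
    }

    public static void main(String[] args) {
        BlockSketch crawler = new BlockSketch();
        crawler.block("/data/skip-me.xml"); // placeholder path
        System.out.println(crawler.wouldCrawl("/data/skip-me.xml"));
        System.out.println(crawler.wouldCrawl("/data/read-me.xml"));
    }
}
```

Under these assumptions, blocking a site before a crawl starts is a way to mark it as visited without spending time parsing it.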