nz.ac.waikato.mcennis.rat.crawler
Class FileListCrawler

java.lang.Object
  extended by nz.ac.waikato.mcennis.rat.crawler.FileListCrawler
All Implemented Interfaces:
Crawler

public class FileListCrawler
extends java.lang.Object
implements Crawler

Crawler designed to parse files from the local file system. Utilizes native file access mechanisms.
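
Below is a minimal usage sketch, not part of the generated documentation. It assumes the Parser interface can be imported from nz.ac.waikato.mcennis.rat.parser (adjust the import if the interface lives elsewhere) and that a Parser implementation is supplied by the caller; only methods documented on this page are used.

import java.io.IOException;

import nz.ac.waikato.mcennis.rat.crawler.FileListCrawler;
import nz.ac.waikato.mcennis.rat.parser.Parser; // assumed location of the Parser interface; adjust if it differs

public class FileListCrawlerUsage {

    // The Parser array must be supplied by the caller; constructing a concrete
    // Parser implementation is outside the scope of this page.
    public static void parseLocalFile(Parser[] parsers, String absolutePath) throws IOException {
        FileListCrawler crawler = new FileListCrawler();
        crawler.set(parsers);        // register the parsers that will process each document
        crawler.setCaching(true);    // read the file once and hand a cached copy to every parser
        crawler.crawl(absolutePath); // parse the file with all of the registered parsers
    }
}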


Field Summary
protected  boolean cache
          Perform caching so that each parser gets a cached copy of the original.
 
Constructor Summary
FileListCrawler()
           
 
Method Summary
 void crawl(java.lang.String site)
          Identical to crawl(java.lang.String, java.lang.String[]) except that all registered parsers are used
 void crawl(java.lang.String site, java.lang.String[] parsers)
          Retrieves the given document from the local filesystem
 Crawler getCrawler()
           
 Parser[] getParser()
          Retrieves the parsers that are associated with this crawler
 boolean isCaching()
          Is the crawler caching the page or is it re-acquiring the page for each parser.
 boolean isSpidering()
          Is this crawler following links
 void set(Parser[] parser)
          Establishes the parsers to be used by this crawler when parsing retrieved documents.
 void setCaching(boolean b)
          Set whether or not the crawler should cache the web page or reload it for each individual parser.
 void setCrawler(Crawler c)
           
 void setProxy(boolean proxy)
          This has no effect on this crawler - there is no proxy for files
 void setSpidering(boolean s)
          Should links to newly discovered documents also be read
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

cache

protected boolean cache
Perform caching so that each parser gets a cached copy of the original. FIXME: caching is always performed regardless of the caching flag's value

Constructor Detail

FileListCrawler

public FileListCrawler()
Method Detail

crawl

public void crawl(java.lang.String site)
Identical to crawl(java.lang.String, java.lang.String[]) except that all registered parsers are used

Specified by:
crawl in interface Crawler
Parameters:
site - site to be crawled
Throws:
java.net.MalformedURLException - if the site URL is invalid
java.io.IOException - if an error occurs during retrieval
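
As a hedged illustration only (not from the original documentation): because this overload runs every registered parser, it is convenient for walking a directory of local files. The directory handling below is a placeholder sketch.

import java.io.File;
import java.io.IOException;

import nz.ac.waikato.mcennis.rat.crawler.FileListCrawler;

public class CrawlDirectoryExample {

    // Crawl every regular file in a directory. The crawler is assumed to already
    // have its parsers registered via set(Parser[]).
    public static void crawlDirectory(FileListCrawler crawler, File directory) throws IOException {
        File[] entries = directory.listFiles();
        if (entries == null) {
            return; // not a directory, or listing failed
        }
        for (File entry : entries) {
            if (entry.isFile()) {
                crawler.crawl(entry.getAbsolutePath());
            }
        }
    }
}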

crawl

public void crawl(java.lang.String site,
                  java.lang.String[] parsers)
Retrieves the given document from the local filesystem

Specified by:
crawl in interface Crawler
Parameters:
site - absolute file name of the file on the local file system
parsers - index of parsers to parse this site
See Also:
nz.ac.waikato.mcennis.arm.crawler.Crawler#crawl(java.lang.String)
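
The documentation describes the second argument only as an "index of parsers", so the sketch below is an assumption-laden illustration: it treats the strings as identifiers selecting a subset of the registered parsers, and the identifiers shown are placeholders.

import java.io.IOException;

import nz.ac.waikato.mcennis.rat.crawler.FileListCrawler;

public class SelectedParsersExample {

    // Hypothetical: run only some of the registered parsers on one file.
    // "parserIdA" and "parserIdB" are placeholders; consult the Crawler and
    // Parser documentation for the identifiers this argument actually expects.
    public static void crawlWithSubset(FileListCrawler crawler, String absolutePath) throws IOException {
        String[] selected = {"parserIdA", "parserIdB"};
        crawler.crawl(absolutePath, selected);
    }
}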

set

public void set(Parser[] parser)
Establishes the parsers to be used by this crawler when parsing retrieved documents.

Specified by:
set in interface Crawler
Parameters:
parser - Array of parsing objects to be utilized by the crawler to process documents fetched
See Also:
nz.ac.waikato.mcennis.arm.crawler.Crawler#set(nz.ac.waikato.mcennis.arm.parser.Parser[])

getParser

public Parser[] getParser()
Retrieves the parsers that are associated with this crawler

Specified by:
getParser in interface Crawler
Returns:
returns an array of Parsers utilized to parse files

setProxy

public void setProxy(boolean proxy)
This has no effect on this crawler - there is no proxy for files

Specified by:
setProxy in interface Crawler
Parameters:
proxy - not utilized or read

setCaching

public void setCaching(boolean b)
Description copied from interface: Crawler
Set whether or not the crawler should cache the web page or reload it for each individual parser. This is a trade-off between the memory needed to load potentially large files and the cost of continually reloading the web page.

Specified by:
setCaching in interface Crawler
Parameters:
b - should caching occur
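
A sketch of one way to act on the trade-off described above; the 10 MB threshold is illustrative only, and note the FIXME on the cache field, which suggests the flag may currently be ignored.

import nz.ac.waikato.mcennis.rat.crawler.FileListCrawler;

public class CachingChoiceExample {

    // Hypothetical policy: cache small files so each parser reuses one in-memory
    // copy, but re-read large files per parser to limit memory use.
    public static void chooseCaching(FileListCrawler crawler, long fileSizeBytes) {
        boolean cacheIt = fileSizeBytes < 10L * 1024 * 1024; // 10 MB threshold is a placeholder
        crawler.setCaching(cacheIt);
    }
}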

isCaching

public boolean isCaching()
Description copied from interface: Crawler
Is the crawler caching the page or is it re-acquiring the page for each parser.

Specified by:
isCaching in interface Crawler
Returns:
is caching enabled

setSpidering

public void setSpidering(boolean s)
Description copied from interface: Crawler
Should links to newly discovered documents also be read

Specified by:
setSpidering in interface Crawler

isSpidering

public boolean isSpidering()
Description copied from interface: Crawler
Is this crawler following links

Specified by:
isSpidering in interface Crawler
Returns:
true if the crawler follows links, false otherwise
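
A brief sketch of toggling link-following (not from the original documentation); whether links are actually discovered in local files depends on the registered parsers.

import nz.ac.waikato.mcennis.rat.crawler.FileListCrawler;

public class SpideringExample {

    public static void enableSpidering(FileListCrawler crawler) {
        crawler.setSpidering(true); // also read documents linked from parsed files
        System.out.println("spidering enabled: " + crawler.isSpidering());
    }
}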

getCrawler

public Crawler getCrawler()

setCrawler

public void setCrawler(Crawler c)