nz.ac.waikato.mcennis.rat.crawler
Class WebCrawler.Spider

java.lang.Object
  extended by nz.ac.waikato.mcennis.rat.crawler.CrawlerBase
      extended by nz.ac.waikato.mcennis.rat.crawler.WebCrawler.Spider
All Implemented Interfaces:
java.lang.Runnable, Crawler
Enclosing class:
WebCrawler

public class WebCrawler.Spider
extends CrawlerBase
implements java.lang.Runnable

Helper class used for crawler threads. Crawls its sites in order. Each crawler receives an equal number of sites to crawl, regardless of current load.
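The equal-share assignment described above can be illustrated with a minimal sketch. This is not the WebCrawler implementation: the class name EqualSplitSketch, the distribute method, and the round-robin dealing order are all assumptions made for illustration. The source only states that each crawler gets an equal number of sites without regard to load.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from the RAT library.
public class EqualSplitSketch {

    /**
     * Deals sites out round-robin so every worker queue ends up with the
     * same number of sites (differing by at most one), ignoring how busy
     * each worker currently is.
     */
    public static List<List<String>> distribute(List<String> sites, int workers) {
        List<List<String>> queues = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            queues.add(new ArrayList<>());
        }
        for (int i = 0; i < sites.size(); i++) {
            // Site i always goes to worker (i mod workers).
            queues.get(i % workers).add(sites.get(i));
        }
        return queues;
    }

    public static void main(String[] args) {
        List<String> sites = Arrays.asList(
                "http://a.example/", "http://b.example/",
                "http://c.example/", "http://d.example/",
                "http://e.example/");
        List<List<String>> queues = distribute(sites, 2);
        for (int i = 0; i < queues.size(); i++) {
            System.out.println("worker " + i + ": " + queues.get(i));
        }
    }
}
```

A static split like this is simple and predictable, but a worker that draws slow sites can fall behind while others sit idle, which is the trade-off the class description points at.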


Field Summary
 
Fields inherited from class nz.ac.waikato.mcennis.rat.crawler.CrawlerBase
cache, parser, proxy, spider
 
Constructor Summary
WebCrawler.Spider(WebCrawler p)
          Base constructor that stores a reference to the parent in each thread
 
Method Summary
protected  void add(WebCrawler.SiteReference site)
          Adds a new site to be crawled by this thread.
protected  void doParse(byte[] raw_data, java.lang.String[] parsers)
          Helper function separated from public parse to allow easy overloading.
protected  boolean isEmpty()
          Reports whether this thread is idle and waiting for more sites to crawl.
 boolean isRunning()
           
 void run()
          Starts the thread executing, parsing web sites in its queue until it receives a stop request.
 
Methods inherited from class nz.ac.waikato.mcennis.rat.crawler.CrawlerBase
crawl, crawl, getParser, getProxyHost, getProxyPort, getProxyType, isCaching, isSpidering, set, setCaching, setProxy, setProxyHost, setProxyPort, setProxyType, setSpidering
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WebCrawler.Spider

public WebCrawler.Spider(WebCrawler p)
Base constructor that stores a reference to the parent in each thread

Parameters:
p - parent which allows communication between thread and the parent object.
Method Detail

add

protected void add(WebCrawler.SiteReference site)
Adds a new site to be crawled by this thread.

Parameters:
site - Site to be crawled

isEmpty

protected boolean isEmpty()
Reports whether this thread is idle and waiting for more sites to crawl.


run

public void run()
Starts the thread executing, parsing web sites in its queue until it receives a stop request.

Specified by:
run in interface java.lang.Runnable
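
The queue-until-stopped behaviour of run() can be sketched as follows. This is an illustration under stated assumptions, not the Spider source: the class QueueWorkerSketch, the requestStop and crawlAll methods, the use of a BlockingQueue, and the String site type (standing in for WebCrawler.SiteReference) are all hypothetical. "Crawling" a site is reduced to recording its name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a queue-driven worker; names are not from the RAT library.
public class QueueWorkerSketch implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> visited = new ArrayList<>();
    private volatile boolean stopRequested = false;
    private volatile boolean running = false;

    public void add(String site) { queue.add(site); }
    public boolean isEmpty() { return queue.isEmpty(); }
    public boolean isRunning() { return running; }
    public void requestStop() { stopRequested = true; }
    public synchronized List<String> visited() { return new ArrayList<>(visited); }

    @Override
    public void run() {
        running = true;
        try {
            // Take sites in FIFO order until a stop is requested.
            while (!stopRequested) {
                String site = queue.poll(50, TimeUnit.MILLISECONDS);
                if (site != null) {
                    synchronized (this) { visited.add(site); } // stand-in for the real crawl
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } finally {
            running = false;
        }
    }

    /** Runs one worker over the given sites and returns them in visit order. */
    public static List<String> crawlAll(List<String> sites) {
        QueueWorkerSketch worker = new QueueWorkerSketch();
        for (String s : sites) {
            worker.add(s);
        }
        Thread t = new Thread(worker);
        t.start();
        try {
            while (!worker.isEmpty()) {
                Thread.sleep(10); // wait for the queue to drain
            }
            worker.requestStop();
            t.join();
        } catch (InterruptedException e) {
            worker.requestStop();
            Thread.currentThread().interrupt();
        }
        return worker.visited();
    }

    public static void main(String[] args) {
        System.out.println(crawlAll(Arrays.asList("http://a.example/", "http://b.example/")));
    }
}
```

The stop flag is only checked between sites, so a site that has already been taken from the queue is always finished before the loop exits.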

isRunning

public boolean isRunning()

doParse

protected void doParse(byte[] raw_data,
                       java.lang.String[] parsers)
                throws java.io.IOException,
                       java.lang.Exception
Description copied from class: CrawlerBase
Helper function separated from public parse to allow easy overloading. Parses the given data into a byte array and passes a copy to every parser.

Overrides:
doParse in class CrawlerBase
Throws:
java.io.IOException
java.lang.Exception
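
The "passes a copy to every parser" behaviour described for doParse can be sketched as below. This is a hedged illustration, not the CrawlerBase code: the class ParseFanOutSketch, the fanOut method, and the name-to-handler registry are assumptions, and how the real library resolves the names in the parsers array is not stated in this page.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: the registry and method names are not from the RAT library.
public class ParseFanOutSketch {

    /**
     * Hands each named parser its own copy of the raw bytes, so a parser
     * that modifies its input cannot affect the others or the caller.
     */
    public static void fanOut(byte[] rawData, String[] parserNames,
                              Map<String, Consumer<byte[]>> registry) {
        for (String name : parserNames) {
            Consumer<byte[]> parser = registry.get(name);
            if (parser == null) {
                continue; // unknown names are skipped in this sketch
            }
            parser.accept(Arrays.copyOf(rawData, rawData.length));
        }
    }

    public static void main(String[] args) {
        Map<String, Consumer<byte[]>> registry = new LinkedHashMap<>();
        registry.put("length", b -> System.out.println("length parser saw " + b.length + " bytes"));
        registry.put("text", b -> System.out.println("text parser saw: " + new String(b)));
        fanOut("<html></html>".getBytes(), new String[] {"length", "text"}, registry);
    }
}
```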