nz.ac.waikato.mcennis.rat.crawler.filter
Class BlockPreviousSite

java.lang.Object
  extended by nz.ac.waikato.mcennis.rat.crawler.filter.BlockPreviousSite
All Implemented Interfaces:
CrawlerFilter

public class BlockPreviousSite
extends java.lang.Object
implements CrawlerFilter

Refuses to parse any previously seen site, identifying sites by their URL alone.
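The sketch below is a minimal illustration of the idea. It is not the toolkit's source. The hypothetical `SeenSiteFilter` keeps a `HashSet` of URL strings. `check` returns true only for URLs it has not yet recorded, and `load`/`add` record a URL. It assumes that `check` does not record the URL itself, and it leaves out the `CrawlerFilter` interface and the parameter overloads.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of a "block previously seen sites" filter.
 * NOT the toolkit's BlockPreviousSite; it only mirrors the documented idea
 * that sites are identified by their URL string alone.
 */
class SeenSiteFilter {
    // URLs already submitted to the filter; exact string match only.
    private final Set<String> seen = new HashSet<>();

    /** True if the URL has not been recorded yet, i.e. it should be retrieved. */
    boolean check(String site) {
        return !seen.contains(site);
    }

    /** Records the URL as seen without retrieving it (assumed behaviour). */
    void load(String site) {
        seen.add(site);
    }

    /** Synonym for load(String), as documented for add. */
    void add(String site) {
        load(site);
    }
}
```

A crawler loop would call `check` before fetching a URL and `load` after fetching it. Every later `check` for the same string then returns false.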


Constructor Summary
BlockPreviousSite()
           
 
Method Summary
 void add(java.lang.String site)
          Synonym for load(String site).
 void build(java.util.HashSet list, boolean not)
          Builds a new filter.
 boolean check(java.lang.String site)
          Should the URL this string represents be retrieved?
 boolean check(java.lang.String site, Properties parameters)
          Should the URL this string represents be retrieved, given the parameters provided?
 void load(java.lang.String site)
          Submit the given site to the filter chain without retrieving it.
 void load(java.lang.String site, Properties parameters)
          Submit the given site-parameter combination to the filter chain without retrieving it.
 BlockPreviousSite prototype()
          Creates a new default instance of this class that shares no data with the original except static variables.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BlockPreviousSite

public BlockPreviousSite()
Method Detail

check

public boolean check(java.lang.String site)
Description copied from interface: CrawlerFilter
Should the URL this string represents be retrieved?

Specified by:
check in interface CrawlerFilter
Parameters:
site - URL of the site to be retrieved
Returns:
true if the site should be retrieved, false otherwise

check

public boolean check(java.lang.String site,
                     Properties parameters)
Description copied from interface: CrawlerFilter
Should the URL this string represents be retrieved, given the parameters provided?

Specified by:
check in interface CrawlerFilter
Parameters:
site - URL to be retrieved
parameters - parameters governing the retrieval
Returns:
true if the site should be retrieved, false otherwise

build

public void build(java.util.HashSet list,
                  boolean not)
Builds a new filter. If list is null, the existing HashSet is kept; otherwise, the provided list replaces it. If not is true, only sites contained in the list are parsed.

Parameters:
list - HashSet of site URLs; these sites are skipped unless not is true
not - if true, invert the filter so that only sites in the list are parsed and all other sites are skipped
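The list-plus-inversion behaviour of build can be sketched as follows. The sketch uses a hypothetical `ListSiteFilter` class and a typed `Set<String>` in place of the raw `HashSet` in the documented signature. Passing a null list keeps the current set, which is an assumption about the intended behaviour.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the documented build(list, not) semantics.
 * Assumption: a null list keeps the current set; a non-null list replaces it.
 */
class ListSiteFilter {
    private Set<String> list = new HashSet<>();
    // false: listed sites are skipped; true: only listed sites are parsed.
    private boolean not = false;

    void build(Set<String> newList, boolean not) {
        if (newList != null) {
            // Defensive copy so later changes to the caller's set do not leak in.
            this.list = new HashSet<>(newList);
        }
        this.not = not;
    }

    /** True if the site should be retrieved under the current configuration. */
    boolean check(String site) {
        boolean listed = list.contains(site);
        return not ? listed : !listed;
    }
}
```

The defensive copy is a design choice made for this sketch. The reference does not say whether the toolkit copies the caller's set or aliases it.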

add

public void add(java.lang.String site)
Synonym for load(String site).

Parameters:
site - site to be skipped

load

public void load(java.lang.String site)
Description copied from interface: CrawlerFilter
Submit the given site to the filter chain without retrieving it.

Specified by:
load in interface CrawlerFilter
Parameters:
site - URL to be added

load

public void load(java.lang.String site,
                 Properties parameters)
Description copied from interface: CrawlerFilter
Submit the given site-parameter combination to the filter chain without retrieving it.

Specified by:
load in interface CrawlerFilter
Parameters:
site - URL to be added
parameters - parameters associated with the site
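The class description says sites are refused "by their URL alone." One plausible reading is that the parameter overloads of check and load simply ignore their parameters. The hypothetical `UrlOnlyFilter` below illustrates that reading. It uses `Map<String, String>` as a stand-in for the toolkit's `Properties` type, which is an assumption and not the real type.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical URL-only filter: the parameter overloads accept parameters
 * but ignore them, so the URL string is the only key.
 * Map<String, String> is a stand-in for the toolkit's Properties type.
 */
class UrlOnlyFilter {
    private final Set<String> seen = new HashSet<>();

    boolean check(String site) {
        return !seen.contains(site);
    }

    // Parameters are deliberately ignored: only the URL decides.
    boolean check(String site, Map<String, String> parameters) {
        return check(site);
    }

    void load(String site) {
        seen.add(site);
    }

    // Parameters are deliberately ignored: the URL alone is recorded.
    void load(String site, Map<String, String> parameters) {
        load(site);
    }
}
```

Under this reading, a URL loaded with one set of parameters is blocked whatever parameters are later supplied to check.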

prototype

public BlockPreviousSite prototype()
Description copied from interface: CrawlerFilter
Creates a new default instance of this class that shares no data with the original except static variables.

Specified by:
prototype in interface CrawlerFilter
Returns:
new filter of the same class as the parent
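The prototype contract produces a fresh filter of the same class that shares no instance data with the original. The sketch below illustrates it. `SimpleFilter` is a hypothetical, reduced stand-in for `CrawlerFilter`, and `PrototypeSeenFilter` is a hypothetical implementation. Neither is the toolkit's code.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the filter interface, reduced to this example. */
interface SimpleFilter {
    boolean check(String site);
    void load(String site);
    SimpleFilter prototype();
}

/** Hypothetical filter illustrating the prototype contract. */
class PrototypeSeenFilter implements SimpleFilter {
    // Per-instance state: never copied into prototypes.
    private final Set<String> seen = new HashSet<>();

    @Override
    public boolean check(String site) {
        return !seen.contains(site);
    }

    @Override
    public void load(String site) {
        seen.add(site);
    }

    // Covariant return type, mirroring "public BlockPreviousSite prototype()".
    @Override
    public PrototypeSeenFilter prototype() {
        return new PrototypeSeenFilter();
    }
}
```

Because the prototype starts with an empty set, URLs recorded by the original are not blocked by the copy.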
