public interface Crawler
Interface for accessing files via I/O. Designed to abstract away the difference between file and web access. Uses parser objects to parse the documents collected. Multiple parsers can be used; it is crawler-dependent whether all parsers are applied to all documents or only to a subset.
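As a rough illustration of the description above, the sketch below restates the interface using the signatures documented on this page and exercises it through a toy in-memory implementation. Several names are assumptions made for the example only: the `Parser.parse(String)` method (the `Parser` type is not documented here), the `InMemoryCrawler` and `CrawlerDemo` classes, the `put` helper, and the reading of the `parsers` argument as decimal indices into the parser array.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: the real Parser type is not documented on this page.
interface Parser {
    void parse(String document);
}

// Signatures as listed in this page's method summary.
interface Crawler {
    void crawl(String site) throws MalformedURLException, IOException;
    void crawl(String site, String[] parsers) throws MalformedURLException, IOException;
    Parser[] getParser();
    boolean isCaching();
    boolean isSpidering();
    void set(Parser[] parser);
    void setCaching(boolean b);
    void setProxy(boolean proxy);
    void setSpidering(boolean spider);
}

// Toy implementation: "fetches" documents from a map instead of disk or network.
class InMemoryCrawler implements Crawler {
    private final Map<String, String> documents = new HashMap<>();
    private Parser[] parsers = new Parser[0];
    private boolean caching;
    private boolean spidering;
    private boolean proxy; // stored only; this toy never touches the network

    void put(String site, String content) { documents.put(site, content); }

    private String fetch(String site) throws IOException {
        String content = documents.get(site);
        if (content == null) {
            throw new IOException("Cannot retrieve " + site);
        }
        return content;
    }

    // This crawler applies every registered parser to the document.
    @Override public void crawl(String site) throws IOException {
        String content = fetch(site);
        for (Parser p : parsers) {
            p.parse(content);
        }
    }

    // Assumption: each entry of 'selected' is a decimal index into the parser array.
    @Override public void crawl(String site, String[] selected) throws IOException {
        String content = fetch(site);
        for (String index : selected) {
            parsers[Integer.parseInt(index)].parse(content);
        }
    }

    @Override public Parser[] getParser() { return parsers; }
    @Override public boolean isCaching() { return caching; }
    @Override public boolean isSpidering() { return spidering; }
    @Override public void set(Parser[] parser) { this.parsers = parser; }
    @Override public void setCaching(boolean b) { this.caching = b; }
    @Override public void setProxy(boolean proxy) { this.proxy = proxy; }
    @Override public void setSpidering(boolean spider) { this.spidering = spider; }
}

class CrawlerDemo {
    // Registers two parsers, crawls once with all of them and once with a subset.
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        InMemoryCrawler crawler = new InMemoryCrawler();
        crawler.put("index.html", "hello");
        crawler.set(new Parser[] {
            doc -> seen.add("A:" + doc),
            doc -> seen.add("B:" + doc)
        });
        try {
            crawler.crawl("index.html");                     // both parsers run
            crawler.crawl("index.html", new String[] {"1"}); // only the second parser runs
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [A:hello, B:hello, B:hello]
    }
}
```

A real implementation would replace the map lookup with file or network access; the caller's code would stay the same, which is the abstraction the interface is aiming for.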
Method Summary | |
---|---|
void | crawl(java.lang.String site) - Fetches the site designated by site. |
void | crawl(java.lang.String site, java.lang.String[] parsers) - Fetches the site designated by site, using only the parsers indicated by parsers. |
Parser[] | getParser() - Returns an array of Parser objects used by this crawler to parse pages. |
boolean | isCaching() - Whether the crawler caches the page or re-acquires it for each parser. |
boolean | isSpidering() - Whether this crawler follows links. |
void | set(Parser[] parser) - Sets the parsers that this crawler uses to interpret the documents it fetches. |
void | setCaching(boolean b) - Sets whether the crawler should cache the web page or reload it for each individual parser. |
void | setProxy(boolean proxy) - Establishes whether a proxy is needed to access documents. |
void | setSpidering(boolean spider) - Sets whether links to newly discovered documents should also be read. |
Method Detail |
---|
void crawl(java.lang.String site) throws java.net.MalformedURLException, java.io.IOException
Parameters:
site - Name of the document to be fetched.
Throws:
java.net.MalformedURLException - If the site to crawl is not a valid document. Only thrown if the underlying crawler retrieves documents via HTTP or a similar protocol.
java.io.IOException - If there is a problem retrieving the document to be processed.

void crawl(java.lang.String site, java.lang.String[] parsers) throws java.net.MalformedURLException, java.io.IOException
Parameters:
site - Name of the document to be fetched.
parsers - Index of the parsers to use for parsing this site.
Throws:
java.net.MalformedURLException - If the site to crawl is not a valid document. Only thrown if the underlying crawler retrieves documents via HTTP or a similar protocol.
java.io.IOException - If there is a problem retrieving the document to be processed.

void set(Parser[] parser)
Parameters:
parser - Array of parsing objects to be used by the crawler to process the documents fetched.

Parser[] getParser()

void setProxy(boolean proxy)
Parameters:
proxy - Whether or not a proxy is needed for accessing documents.

void setCaching(boolean b)
Parameters:
b - Whether caching should occur.

boolean isCaching()
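
The difference between caching and reloading for each parser can be sketched as follows. This is only one plausible way an implementation might honor the caching flag, not the behavior of any particular crawler; `CachingSketch`, `process`, and `demo` are hypothetical names, and the fetch is simulated by a `Supplier` rather than real I/O.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

class CachingSketch {
    // Hands the document to every parser and returns how many fetches were needed.
    static int process(Supplier<String> fetch, List<Consumer<String>> parsers, boolean caching) {
        int fetches = 0;
        boolean fetched = false;
        String document = null;
        for (Consumer<String> parser : parsers) {
            // With caching on, fetch only once; with caching off, re-acquire per parser.
            if (!caching || !fetched) {
                document = fetch.get();
                fetched = true;
                fetches++;
            }
            parser.accept(document);
        }
        return fetches;
    }

    // Runs three no-op parsers over a simulated document.
    static int demo(boolean caching) {
        List<Consumer<String>> parsers = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            parsers.add(doc -> { });
        }
        return process(() -> "<html></html>", parsers, caching);
    }

    public static void main(String[] args) {
        System.out.println(demo(true));  // prints 1
        System.out.println(demo(false)); // prints 3
    }
}
```

With three parsers, caching trades one retained copy of the page for two avoided retrievals; reloading may be preferable when the document is too large to hold or changes between reads.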

void setSpidering(boolean spider)
Parameters:
spider - Whether links to newly discovered documents should also be read.

boolean isSpidering()
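
Both crawl methods declare java.net.MalformedURLException and java.io.IOException. Because MalformedURLException is a subclass of IOException, a caller that wants to tell the two cases apart has to catch it first. The sketch below shows that ordering; the static `crawl` method is a hypothetical stand-in for a Crawler implementation, and its validity check (looking for `://`) and its failure rule (names ending in `/missing`) are invented purely for the example.

```java
import java.io.IOException;
import java.net.MalformedURLException;

class CrawlErrorHandling {
    // Hypothetical stand-in for Crawler.crawl(String); no real I/O is performed.
    static void crawl(String site) throws MalformedURLException, IOException {
        if (!site.contains("://")) {
            throw new MalformedURLException("no protocol: " + site);
        }
        if (site.endsWith("/missing")) {
            throw new IOException("could not retrieve " + site);
        }
    }

    static String tryCrawl(String site) {
        try {
            crawl(site);
            return "ok";
        } catch (MalformedURLException e) {
            // Must come before IOException, which is its superclass.
            return "bad name";
        } catch (IOException e) {
            return "retrieval failed";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCrawl("http://example.org/page"));    // prints ok
        System.out.println(tryCrawl("not a url"));                  // prints bad name
        System.out.println(tryCrawl("http://example.org/missing")); // prints retrieval failed
    }
}
```

Reversing the two catch clauses would not compile, since the MalformedURLException clause would be unreachable.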