java.lang.Object
  nz.ac.waikato.mcennis.rat.parser.AbstractParser
    nz.ac.waikato.mcennis.rat.parser.BaseHTMLParser
public class BaseHTMLParser
Class for transforming WebPage data into bag-of-words format.
Constructor Summary | |
---|---|
BaseHTMLParser() |
Method Summary | |
---|---|
Parser | duplicate(): Create an exact copy of this object. |
ParsedObject | get(): Return histogram object. FIX: currently returns null. |
void | parse(java.io.InputStream data, Crawler crawler, Properties properties): Parse a data stream while spidering for more pages. |
void | parse(java.io.InputStream data, Properties properties): Parse the document into bag-of-words format. |
protected void | processLinks(java.lang.String content, Crawler crawler): Creates a new URL from anchor text to crawl. |
void | set(ParsedObject o): Set the parsed object to be loaded. |
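
The bag-of-words transformation named in the parse(java.io.InputStream, Properties) summary can be illustrated without the RAT classes. The following is a minimal sketch under stated assumptions, not the BaseHTMLParser implementation: the class BagOfWordsSketch and the method countWords are hypothetical names, and the tag-stripping regex and the tokenization rule (lower-case, split on non-alphanumeric characters) are illustrative choices that the documentation above does not specify.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of nz.ac.waikato.mcennis.rat.parser.
class BagOfWordsSketch {

    // Reads an HTML stream and returns a word -> count histogram.
    static Map<String, Integer> countWords(InputStream data) {
        String html;
        try {
            // Assumption: the page is UTF-8 encoded.
            html = new String(data.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Crude tag removal; a real HTML parser would be more robust.
        String text = html.replaceAll("<[^>]*>", " ").toLowerCase(Locale.ROOT);

        // TreeMap keeps the words sorted so the output order is deterministic.
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Rock music and more rock</p></body></html>";
        InputStream in = new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8));
        System.out.println(countWords(in)); // prints {and=1, more=1, music=1, rock=2}
    }
}
```

Word order is discarded and only the counts survive, which is what makes the result a histogram of the kind get() is described as returning.
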
Methods inherited from class nz.ac.waikato.mcennis.rat.parser.AbstractParser |
---|
check, check, getName, getParameter, getParameter, init, setName |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface nz.ac.waikato.mcennis.rat.parser.Parser |
---|
check, check, getName, getParameter, getParameter, init, setName |
Constructor Detail |
---|
public BaseHTMLParser()
Method Detail |
---|
public void parse(java.io.InputStream data, Properties properties)
Parse the document into bag-of-words format.
parse in interface Parser
parse in class AbstractParser
data - data stream to be parsed
public void parse(java.io.InputStream data, Crawler crawler, Properties properties)
Parse a data stream while spidering for more pages.
parse in interface Parser
parse in class AbstractParser
data - stream to be parsed
crawler - crawler for crawling new pages
protected void processLinks(java.lang.String content, Crawler crawler)
Creates a new URL from anchor text to crawl.
content - line containing the anchor text
crawler - crawler to crawl the new URLs
public Parser duplicate()
Create an exact copy of this object.
duplicate in interface Parser
duplicate in class AbstractParser
public ParsedObject get()
Return histogram object. FIX: currently returns null.
get in interface Parser
get in class AbstractParser
public void set(ParsedObject o)
Set the parsed object to be loaded.
set in interface Parser
set in class AbstractParser
o - object to be loaded
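
The link-handling step described for processLinks (turning anchor text found in a line of content into a URL to crawl) can be sketched as follows. This is an assumption-laden illustration rather than the library's code: LinkExtractionSketch and extractLinks are hypothetical names, the regular expression is a simplification that only handles quoted href attributes, and the resolved URIs are returned in a list instead of being handed to a Crawler, whose interface is not documented here.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of nz.ac.waikato.mcennis.rat.parser.
class LinkExtractionSketch {

    // Matches <a ... href="..."> or <a ... href='...'>, ignoring case.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns every anchor target in content, resolved against the page's base URI.
    static List<URI> extractLinks(String content, URI base) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(content);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1).trim()));
            } catch (IllegalArgumentException e) {
                // Malformed href values (for example, ones containing spaces) are skipped.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.org/artists/index.html");
        String line = "<a href=\"/about.html\">About</a> <A HREF='page2.html'>Next</A>";
        for (URI link : extractLinks(line, base)) {
            System.out.println(link);
        }
        // prints http://example.org/about.html
        // prints http://example.org/artists/page2.html
    }
}
```

In a crawler-backed version, each resolved URI would be passed to the crawler in place of the list append.
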