java.lang.Object
  nz.ac.waikato.mcennis.rat.parser.AbstractParser
    nz.ac.waikato.mcennis.rat.parser.BaseHTMLParser
public class BaseHTMLParser
Class for transforming WebPage data into bag-of-words format.
| Constructor Summary | |
|---|---|
| BaseHTMLParser() | |
| Method Summary | |
|---|---|
| Parser | duplicate(): Create an exact copy of this object. |
| ParsedObject | get(): Return histogram object. FIX: currently returns null. |
| void | parse(java.io.InputStream data, Crawler crawler, Properties properties): Parse a data stream while spidering for more pages. |
| void | parse(java.io.InputStream data, Properties properties): Parse the document into bag-of-words format. |
| protected void | processLinks(java.lang.String content, Crawler crawler): Creates a new URL from anchor text to crawl. |
| void | set(ParsedObject o): Set the parsed object to be loaded. |
| Methods inherited from class nz.ac.waikato.mcennis.rat.parser.AbstractParser |
|---|
| check, check, getName, getParameter, getParameter, init, setName |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Methods inherited from interface nz.ac.waikato.mcennis.rat.parser.Parser |
|---|
| check, check, getName, getParameter, getParameter, init, setName |
| Constructor Detail |
|---|
public BaseHTMLParser()
| Method Detail |
|---|
public void parse(java.io.InputStream data,
                  Properties properties)
parse in interface Parser
parse in class AbstractParser
data - data stream to be parsed
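
To illustrate what "bag-of-words format" means for an HTML stream, here is a minimal standalone sketch. It is not the BaseHTMLParser implementation and does not use the RAT library; the class name BagOfWordsSketch, the method count, the regex-based tag stripping, and the UTF-8 decoding are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration only; not part of the RAT library. */
class BagOfWordsSketch {

    /** Reads an HTML stream, strips tags, and counts lower-cased words. */
    static Map<String, Integer> count(InputStream data) {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(data, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append(' ');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Crude tag removal (assumption): anything between '<' and '>' becomes a space.
        String text = html.toString().replaceAll("<[^>]*>", " ");
        Map<String, Integer> histogram = new TreeMap<>();
        // Split on runs of characters that are neither letters nor digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                histogram.merge(token, 1, Integer::sum);
            }
        }
        return histogram;
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Hello world, hello parser.</p></body></html>";
        InputStream in = new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8));
        System.out.println(count(in)); // prints {hello=2, parser=1, world=1}
    }
}
```

Word order and markup are discarded; only the per-word counts survive, which is the histogram that get() is documented to return.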

public void parse(java.io.InputStream data,
                  Crawler crawler,
                  Properties properties)
parse in interface Parser
parse in class AbstractParser
data - stream to be parsed
crawler - crawler for crawling new pages
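
Spidering depends on finding anchor targets in the page and turning them into URLs to crawl. The sketch below shows one way that step could look. It is an assumption-laden illustration, not the library's processLinks: the class LinkExtractionSketch, the method extractLinks, the base-URL parameter, and the regular expression are hypothetical, and the result is returned as a list rather than handed to a Crawler, whose API is not shown on this page.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the RAT library. */
class LinkExtractionSketch {

    // Matches href="..." or href='...' inside an <a ...> tag, ignoring case.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])([^\"'>]*)\\1",
            Pattern.CASE_INSENSITIVE);

    /** Returns every anchor target in content, resolved against base. */
    static List<URI> extractLinks(String content, URI base) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(content);
        while (m.find()) {
            try {
                // Relative targets such as "page2.html" become absolute URIs.
                links.add(base.resolve(m.group(2).trim()));
            } catch (IllegalArgumentException e) {
                // Skip href values that are not syntactically valid URIs.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.org/docs/index.html");
        String line = "<p><a href=\"page2.html\">next</a> "
                + "<A HREF='http://example.com/'>ext</A></p>";
        // prints [http://example.org/docs/page2.html, http://example.com/]
        System.out.println(extractLinks(line, base));
    }
}
```

A regular expression is enough for a sketch but misses unquoted attributes and malformed markup; a real crawler would more likely rely on an HTML parser.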

protected void processLinks(java.lang.String content,
                            Crawler crawler)
content - line containing the anchor text
crawler - crawler to crawl the new URLs

public Parser duplicate()
duplicate in interface Parser
duplicate in class AbstractParser

public ParsedObject get()
get in interface Parser
get in class AbstractParser

public void set(ParsedObject o)
set in interface Parser
set in class AbstractParser
o - object to be loaded