java.lang.Object
  nz.ac.waikato.mcennis.rat.parser.BaseHTMLParser
public class BaseHTMLParser
Class for transforming WebPage data into bag-of-words format.
| Constructor Summary |
|---|
| BaseHTMLParser() |
| Method Summary | | |
|---|---|---|
| Parser | duplicate() | Create an exact copy of this object. |
| ParsedObject | get() | Return histogram object. FIX: currently returns null. |
| java.lang.String | getName() | |
| void | parse(java.io.InputStream data) | Parse the document into bag-of-words format. |
| void | parse(java.io.InputStream data, Crawler crawler) | Parse a data stream while spidering for more pages. |
| protected void | processLinks(java.lang.String content, Crawler crawler) | Creates a new URL from anchor text to crawl. |
| void | set(ParsedObject o) | Set the parsed object to be loaded. |
| void | setName(java.lang.String name) | Give this parser an id that should be globally unique. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public BaseHTMLParser()
| Method Detail |
|---|
public void parse(java.io.InputStream data)
Specified by: parse in interface Parser
Parameters:
data - data stream to be parsed
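
To illustrate what "bag-of-words format" means for a page read from an InputStream, here is a minimal, self-contained sketch. It is not the BaseHTMLParser implementation: the class name BagOfWordsSketch and the helper bagOfWords are hypothetical, the tag stripping is a crude regular expression, and UTF-8 input is assumed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class BagOfWordsSketch {

    // Hypothetical helper (not part of the documented API): reads an HTML
    // stream and returns a word -> count histogram.
    public static Map<String, Integer> bagOfWords(InputStream data) throws IOException {
        StringBuilder html = new StringBuilder();
        // Assumption: the stream is UTF-8 encoded text.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(data, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append(' ');
            }
        }
        // Crude tag removal: anything between '<' and '>' becomes a space.
        String text = html.toString().replaceAll("<[^>]*>", " ");

        // Sorted map so the printed histogram has a stable order.
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><p>The cat saw the other cat.</p></body></html>";
        InputStream in = new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8));
        System.out.println(bagOfWords(in)); // prints {cat=2, other=1, saw=1, the=2}
    }
}
```

Word order is discarded and only the per-word counts survive, which is the defining property of a bag-of-words representation.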
public void parse(java.io.InputStream data, Crawler crawler)
Specified by: parse in interface Parser
Parameters:
data - stream to be parsed
crawler - crawler for crawling new pages
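protected void processLinks(java.lang.String content, Crawler crawler)
Parameters:
content - line containing the anchor text
crawler - crawler to crawl the new URLs

public Parser duplicate()
Description copied from interface: Parser
Create an exact copy of this object.
Specified by: duplicate in interface Parser

public ParsedObject get()
Specified by: get in interface Parser

public void set(ParsedObject o)
Description copied from interface: Parser
Set the parsed object to be loaded.
Specified by: set in interface Parser
Parameters:
o - object to be loaded

public void setName(java.lang.String name)
Description copied from interface: Parser
Give this parser an id that should be globally unique.
Specified by: setName in interface Parser
Parameters:
name - id for this parser

public java.lang.String getName()
Specified by: getName in interface Parser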
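
The processLinks step, which turns anchor text in a line of content into new URLs to crawl, could look roughly like the sketch below. This is an assumption-laden illustration and not the library's code: LinkExtractionSketch, extractLinks, and the base-URI parameter are hypothetical, the regular expression handles only simple quoted href attributes, and the hand-off to a Crawler is omitted because the Crawler API is not described on this page.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractionSketch {

    // Matches href="..." or href='...' inside an <a ...> tag.
    // A simplification: robust HTML handling needs a real parser.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns every link found in 'content',
    // resolved against 'base' so relative links become absolute.
    public static List<URI> extractLinks(String content, URI base) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(content);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1)));
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not syntactically valid URIs.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/docs/index.html");
        String line = "<p><a href=\"page2.html\">next</a> "
                + "<a href='http://example.org/'>other</a></p>";
        for (URI link : extractLinks(line, base)) {
            // A real crawler would queue each link here.
            System.out.println(link);
        }
    }
}
```

Resolving against a base URI matters because anchors on crawled pages are often relative; in the usage above, page2.html resolves to http://example.com/docs/page2.html.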