java.lang.Object
  nz.ac.waikato.mcennis.rat.parser.BaseHTMLParser
public class BaseHTMLParser extends java.lang.Object implements Parser
Class for transforming WebPage data into bag-of-words format.
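To make "bag-of-words format" concrete, here is a minimal sketch of turning HTML text into a term-frequency histogram. It is not the BaseHTMLParser implementation: the class name BagOfWordsSketch, the regex-based tag stripping, and the tokenisation rule are all assumptions chosen for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the RAT library.
final class BagOfWordsSketch {

    // Builds a word -> count map from an HTML fragment.
    static Map<String, Integer> histogram(String html) {
        // Crude tag removal: replace anything tag-shaped with a space so
        // words on either side of a tag stay separated.
        String text = html.replaceAll("<[^>]*>", " ").toLowerCase(Locale.ROOT);

        Map<String, Integer> counts = new TreeMap<>();
        // Split on runs of characters that are not ASCII letters or digits.
        for (String token : text.split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {cat=2, other=1, saw=1, the=2}
        System.out.println(histogram("<p>The cat saw the <b>other</b> cat.</p>"));
    }
}
```

Word order is discarded and only the counts survive, which is the defining property of a bag-of-words representation.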
Constructor Summary |
---|
BaseHTMLParser() |
Method Summary | |
---|---|
Parser | duplicate(): Create an exact copy of this object. |
ParsedObject | get(): Return the histogram object. FIX: currently returns null. |
java.lang.String | getName() |
void | parse(java.io.InputStream data): Parse the document into bag-of-words format. |
void | parse(java.io.InputStream data, Crawler crawler): Parse a data stream while spidering for more pages. |
protected void | processLinks(java.lang.String content, Crawler crawler): Create a new URL from anchor text to crawl. |
void | set(ParsedObject o): Set the parsed object to be loaded. |
void | setName(java.lang.String name): Give this parser an id that should be globally unique. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
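The methods listed above suggest a simple calling pattern: name the parser, parse a stream, then ask for the result. The sketch below illustrates that pattern with stand-in types. The nested Parser and ParsedObject interfaces are hypothetical reductions that only mirror the signatures shown on this page (the Crawler overload is left out), and StubParser is a toy, not BaseHTMLParser. Because get() is documented as currently returning null, the caller checks for null before using the result.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; the real types live in the RAT library.
final class ParserLifecycleSketch {

    interface ParsedObject {}

    // Assumed shape, copied from the method summary; the real interface may differ.
    interface Parser {
        void parse(InputStream data);
        ParsedObject get();
        void set(ParsedObject o);
        Parser duplicate();
        String getName();
        void setName(String name);
    }

    static final class StubParser implements Parser {
        private String name;
        private ParsedObject target;
        private int bytesSeen;

        @Override public void parse(InputStream data) {
            try {
                // Consume the stream; no real parsing happens in this stub.
                bytesSeen = data.readAllBytes().length;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // Mirrors the documented FIX note: the result is not available yet.
        @Override public ParsedObject get() { return null; }
        @Override public void set(ParsedObject o) { target = o; }
        @Override public Parser duplicate() {
            StubParser copy = new StubParser();
            copy.name = name;
            copy.target = target;
            return copy;
        }
        @Override public String getName() { return name; }
        @Override public void setName(String name) { this.name = name; }
        int bytesSeen() { return bytesSeen; }
    }

    // Parses the stream and reports whether a result came back.
    static String describe(Parser p, InputStream in) {
        p.parse(in);
        ParsedObject result = p.get();
        return p.getName() + ": " + (result == null ? "no result" : "result available");
    }

    public static void main(String[] args) {
        Parser parser = new StubParser();
        parser.setName("html-1");
        InputStream page = new ByteArrayInputStream("<p>hi</p>".getBytes(StandardCharsets.UTF_8));
        System.out.println(describe(parser, page)); // prints "html-1: no result"
    }
}
```

In this stub, duplicate() returns a separate instance carrying the same name and target, which is one plausible reading of "exact copy"; the page does not say whether the real copy is deep or shallow.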
Constructor Detail |
---|
public BaseHTMLParser()
Method Detail |
---|
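public void parse(java.io.InputStream data)
Parse the document into bag-of-words format.
Specified by: parse in interface Parser
Parameters:
data - data stream to be parsed

public void parse(java.io.InputStream data, Crawler crawler)
Parse a data stream while spidering for more pages.
Specified by: parse in interface Parser
Parameters:
data - stream to be parsed
crawler - crawler for crawling new pages

protected void processLinks(java.lang.String content, Crawler crawler)
Creates a new URL from anchor text to crawl.
Parameters:
content - line containing the anchor text
crawler - crawler to crawl the new URLs

public Parser duplicate()
Description copied from interface: Parser
Create an exact copy of this object.
Specified by: duplicate in interface Parser

public ParsedObject get()
Return the histogram object. FIX: currently returns null.
Specified by: get in interface Parser

public void set(ParsedObject o)
Description copied from interface: Parser
Set the parsed object to be loaded.
Specified by: set in interface Parser
Parameters:
o - object to be loaded

public void setName(java.lang.String name)
Description copied from interface: Parser
Give this parser an id that should be globally unique.
Specified by: setName in interface Parser
Parameters:
name - id for this parser

public java.lang.String getName()
Specified by: getName in interface Parser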
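
As an illustration of what processLinks is described as doing (building a URL to crawl from a line that contains an anchor), the following sketch pulls href values out of a line and resolves them against a base address. The class LinkExtractionSketch, the regular expression, and the base-URI parameter are assumptions; this page does not show how BaseHTMLParser finds links or what the Crawler interface accepts, so the hand-off to a crawler is left out.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the RAT library.
final class LinkExtractionSketch {

    // Matches href="..." or href='...' inside an <a ...> tag.
    // A regex is a rough stand-in for a real HTML parser.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns every link found in the line, resolved against the page address.
    static List<URI> extractLinks(String line, URI base) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(line);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(2).trim()));
            } catch (IllegalArgumentException e) {
                // Skip href values that are not syntactically valid URIs.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.org/docs/index.html");
        String line = "<a href=\"page2.html\">next</a> and <A HREF='/about'>about</A>";
        for (URI link : extractLinks(line, base)) {
            System.out.println(link);
        }
    }
}
```

Resolving against the page's own address is what lets relative links such as page2.html become absolute URLs that a crawler could fetch.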