nz.ac.waikato.mcennis.rat.dataAquisition
Class CrawlLiveJournal
java.lang.Object
nz.ac.waikato.mcennis.rat.graph.model.ModelShell
nz.ac.waikato.mcennis.rat.dataAquisition.CrawlLiveJournal
- All Implemented Interfaces:
- Component, DataAquisition, Model
public class CrawlLiveJournal
- extends ModelShell
- implements DataAquisition
This class enables parsing and spidering of the LiveJournal site.
Parameters:
- Parameter AlgorithmClass:
- Key for this data acquisition algorithm.
- String Object
- max: 1
- min: 1
- Typically not altered
- Default: Crawl LiveJournal
- Parameter Name:
- Globally unique name for this data acquisition algorithm.
- String Object
- max: 1
- min: 1
- Default: Crawl LiveJournal
- Parameter Category:
- Category of data acquisition algorithm.
- String Object
- max: 1
- min: 1
- Typically not altered
- Default: Web Crawler
- Parameter CrawlerClass:
- Class of web crawler data acquisition algorithm.
- String Object
- max: 1
- min: 1
- Typically not altered
- Default: Web Crawler
- Parameter URLPrefix:
- Prefix to use when constructing a LiveJournal URL from a username.
- String Object
- max: 1
- min: 1
- Default: http://
- Parameter URLSuffix:
- Suffix to use when constructing a LiveJournal URL from a username.
- String Object
- max: 1
- min: 1
- Default: .livejournal.com/data/foaf
- Parameter DownloadDirectory:
- Directory where the downloaded FOAF files are to be stored.
- File Object
- max: 1
- min: 1
- Must be a valid directory
- No default provided
- Parameter UseProxy:
- Is a proxy required for Internet access?
- Boolean Object
- max: 1
- min: 1
- Default: false
- Parameter ProxyUser:
- Username to use if the proxy requires one.
- String Object
- max: 0
- min: 1
- Default: dm75
- Parameter ProxyPassword:
- Password to use if the proxy requires one. Note that it is not encrypted if parameters are serialized.
- String Object
- max: 0
- min: 1
- No Default
- Parameter ProxyLocation:
- If needed, where is the proxy?
- String Object
- max: 0
- min: 1
- Default: proxy.cs.waikato.ac.nz
- Parameter ProxyType:
- If needed, what protocol (by integer) is this proxy using?
- String Object
- max: 1
- min: 1
- Default: 4
- Parameter ProxyPort:
- If needed, what port is this proxy running on?
- String Object
- max: 0
- min: 1
- Default: 80
- Parameter CrawlerFilter:
- Filter for deciding whether or not to retrieve a particular page.
- CrawlerFilter Object
- max: 1
- min: 1
- Default: And{SiteBlocking,MaxDepth(3),StopCount(10000)}
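The URLPrefix and URLSuffix parameters above describe how a FOAF address is formed from a username. The following is a minimal sketch of that concatenation, assuming the username is simply placed between the two values. The class `FoafUrlSketch` and its `build` method are hypothetical names used for illustration and are not part of RAT.

```java
// Hypothetical helper, not part of RAT: illustrates how URLPrefix and
// URLSuffix could wrap a username to form the address of a FOAF document.
final class FoafUrlSketch {
    // Default values copied from the parameter list above.
    static final String DEFAULT_PREFIX = "http://";
    static final String DEFAULT_SUFFIX = ".livejournal.com/data/foaf";

    // Assumption: the URL is the plain concatenation prefix + username + suffix.
    static String build(String prefix, String username, String suffix) {
        if (username == null || username.isEmpty()) {
            throw new IllegalArgumentException("username must be non-empty");
        }
        return prefix + username + suffix;
    }

    public static void main(String[] args) {
        // "exampleuser" is a placeholder username.
        System.out.println(build(DEFAULT_PREFIX, "exampleuser", DEFAULT_SUFFIX));
    }
}
```

Changing either parameter, for example to point at a mirror or a local test server, would only change the two strings passed to `build`.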
Constructor Summary
CrawlLiveJournal()
- Creates a new instance of CrawlLiveJournal
Method Summary
void cancel()
- Cancels the run once the next user has been processed.
Graph get()
- Obtain a reference to the graph this object holds
java.util.List<IODescriptor> getInputType()
- The input type describes all the different kinds of graph objects that are utilized (and hence required) by this object.
java.util.List<IODescriptor> getOutputType()
- The output type describes all the different kinds of graph objects that are created during the execution of this algorithm.
Properties getParameter()
- List of all parameters this component accepts.
Parameter getParameter(java.lang.String param)
- Returns the specific parameter identified by its key-name.
void init(Properties map)
- Initializes this object.
CrawlLiveJournal prototype()
- All Components implement the prototype pattern.
void set(Graph g)
- Set the graph to be populated by this object
void start()
- Creates the crawler and proxies and starts spidering FOAF descriptions.
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface nz.ac.waikato.mcennis.rat.graph.model.Model
- addListener
CrawlLiveJournal
public CrawlLiveJournal()
- Creates a new instance of CrawlLiveJournal
start
public void start()
- Creates the crawler and proxies and starts spidering FOAF descriptions.
- Specified by:
start
in interface DataAquisition
cancel
public void cancel()
- Cancels the run once the next user has been processed.
FIXME: currently unimplemented
- Specified by:
cancel
in interface DataAquisition
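Since cancel() is documented as stopping only after a user has been fully processed, one way to picture the intended behaviour is a flag that the crawl loop checks between users. The sketch below is an illustration under that assumption and is not the RAT implementation (which the note above marks as unimplemented). The class `CancellableCrawlSketch` and all of its members are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch of cooperative cancellation; not taken from RAT.
final class CancellableCrawlSketch {
    // volatile so that a cancel() call from another thread is seen by the loop.
    private volatile boolean cancelled = false;
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> processed = new ArrayList<>();

    CancellableCrawlSketch(List<String> seedUsers) {
        pending.addAll(seedUsers);
    }

    // Requests a stop; the user currently being handled is still finished.
    void cancel() {
        cancelled = true;
    }

    // Processes users one at a time, checking the flag only between users.
    void start(Consumer<String> perUser) {
        while (!cancelled && !pending.isEmpty()) {
            String user = pending.remove();
            perUser.accept(user); // stand-in for downloading and parsing a FOAF file
            processed.add(user);
        }
    }

    List<String> processedUsers() {
        return processed;
    }
}
```

Because the flag is only read at the top of the loop, a cancel request issued while a user is being handled takes effect after that user is recorded, which matches the "end of the next user" wording.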
getInputType
public java.util.List<IODescriptor> getInputType()
- Description copied from interface:
Component
- The input type describes all the different kinds of graph objects that
are utilized (and hence required) by this object. This result is only
guaranteed to be fixed if structural parameters are not modified. This
is an empty list if there is no input.
- Specified by:
getInputType
in interface Component
- Returns:
- IODescriptor list for this component
- See Also:
IODescriptor
getOutputType
public java.util.List<IODescriptor> getOutputType()
- Description copied from interface:
Component
- The output type describes all the different kinds of graph objects that
are created during the execution of this algorithm. The result is only
guaranteed to be fixed if structural parameters are not modified. This is
an empty list if there is no output.
- Specified by:
getOutputType
in interface Component
- Returns:
- IODescriptor list for this component
- See Also:
IODescriptor
getParameter
public Properties getParameter()
- Description copied from interface:
Component
- List of all parameters this component accepts. Each parameter also has a
distinct key-name used when initializing the object using the init method.
If there are no parameters, null is returned.
- Specified by:
getParameter
in interface Component
- Returns:
- read-only array of Parameters
getParameter
public Parameter getParameter(java.lang.String param)
- Description copied from interface:
Component
- Returns the specific parameter identified by its key-name. If no
parameter is found with this key-name, null is returned.
- Specified by:
getParameter
in interface Component
- Parameters:
param
- key-name of the parameter
- Returns:
- named parameter
init
public void init(Properties map)
- Initializes this object.
Input/Output:
- Input: None
- Output:
- User
- Type: Actor
- Description: LiveJournal user
- Knows
- Type: Link
- Description: Asynchronous relationship between users
- foaf:title
- Type: Actor Property
- Mode: User
- Description: String of Salutation of the user
- foaf:phone
- Type: Actor Property
- Mode: User
- Description: String of phone number of the user
- foaf:gender
- Type: Actor Property
- Mode: User
- Description: String of sex of the user
- ya:country
- Type: Actor Property
- Mode: User
- Description: String of country of the user
- ya:city
- Type: Actor Property
- Mode: User
- Description: String of the city of the user
- foaf:dateOfBirth
- Type: Actor Property
- Mode: User
- Description: String of birthday of the user
- foaf:msnChatID
- Type: Actor Property
- Mode: User
- Description: String of MSN online chat ID of the user
- foaf:aimChatID
- Type: Actor Property
- Mode: User
- Description: String of AIM online chat ID of the user
- ya:bio
- Type: Actor Property
- Mode: User
- Description: String of self-published description of the user
- interest
- Type: Actor Property
- Mode: User
- Description: String of a free-form tag declaring a hobby or an interest in a topic
- Specified by:
init
in interface Component
- Parameters:
map
- properties to load - null is permitted.
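The outputs listed above include User actors and Knows links derived from FOAF descriptions. As a rough illustration of where the Knows links could come from, the sketch below reads `foaf:knows` entries from a FOAF document using the JDK's DOM parser. The sample document, the use of `foaf:nick` as the user identifier, and the class `FoafKnowsSketch` are all assumptions made for this example; it does not reproduce RAT's own parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not RAT code: lists the users a FOAF document says its owner knows.
final class FoafKnowsSketch {
    static final String FOAF_NS = "http://xmlns.com/foaf/0.1/";

    // Hand-written sample for illustration; not a captured LiveJournal response.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
        + " xmlns:foaf=\"http://xmlns.com/foaf/0.1/\">"
        + "<foaf:Person><foaf:nick>alice</foaf:nick>"
        + "<foaf:knows><foaf:Person><foaf:nick>bob</foaf:nick></foaf:Person></foaf:knows>"
        + "<foaf:knows><foaf:Person><foaf:nick>carol</foaf:nick></foaf:Person></foaf:knows>"
        + "</foaf:Person></rdf:RDF>";

    // Returns the nick found inside each foaf:knows element, in document order.
    static List<String> knownNicks(String foafXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the namespace-based lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(foafXml)));
            List<String> nicks = new ArrayList<>();
            NodeList knows = doc.getElementsByTagNameNS(FOAF_NS, "knows");
            for (int i = 0; i < knows.getLength(); i++) {
                Element link = (Element) knows.item(i);
                NodeList nested = link.getElementsByTagNameNS(FOAF_NS, "nick");
                if (nested.getLength() > 0) {
                    nicks.add(nested.item(0).getTextContent().trim());
                }
            }
            return nicks;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse FOAF document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(knownNicks(SAMPLE));
    }
}
```

In a crawler, each returned nick would correspond to one directed Knows link from the document's owner, and would also be a candidate for the next page to fetch, subject to the CrawlerFilter.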
set
public void set(Graph g)
- Description copied from interface:
DataAquisition
- Set the graph to be populated by this object
- Specified by:
set
in interface DataAquisition
- Parameters:
g
- graph to be created for analysis
get
public Graph get()
- Description copied from interface:
DataAquisition
- Obtain a reference to the graph this object holds
- Specified by:
get
in interface DataAquisition
- Returns:
- graph created by this object
prototype
public CrawlLiveJournal prototype()
- Description copied from interface:
Component
- All Components implement the prototype pattern. The new instance shares no resources with
the original other than static resources of the class.
- Specified by:
prototype
in interface Component
- Returns:
- default-parameter version of the same class as the original.
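The contract described for prototype() (a fresh object of the same class, with default parameters and no shared non-static state) can be illustrated with a small stand-alone sketch. The interface `ComponentSketch` and the class `CrawlerSketch` are hypothetical stand-ins and do not reproduce RAT's Component or CrawlLiveJournal types. The two default values are taken from the parameter list at the top of this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Component interface; only prototype() is modelled.
interface ComponentSketch {
    ComponentSketch prototype();
}

// Hypothetical component holding a per-instance parameter map.
final class CrawlerSketch implements ComponentSketch {
    private final Map<String, String> parameters = new LinkedHashMap<>();

    CrawlerSketch() {
        // Defaults as documented for URLPrefix and URLSuffix.
        parameters.put("URLPrefix", "http://");
        parameters.put("URLSuffix", ".livejournal.com/data/foaf");
    }

    void setParameter(String key, String value) {
        parameters.put(key, value);
    }

    String getParameterValue(String key) {
        return parameters.get(key);
    }

    // Returns a new instance with default parameters; nothing is copied
    // from this object, so the two instances share no mutable state.
    @Override
    public CrawlerSketch prototype() {
        return new CrawlerSketch();
    }

    public static void main(String[] args) {
        CrawlerSketch original = new CrawlerSketch();
        original.setParameter("URLPrefix", "https://");
        CrawlerSketch fresh = original.prototype();
        System.out.println(original.getParameterValue("URLPrefix"));
        System.out.println(fresh.getParameterValue("URLPrefix"));
    }
}
```

Note that this differs from a clone: changes made to the original's parameters are deliberately not carried over to the new instance.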