nz.ac.waikato.mcennis.rat.dataAquisition
Class CrawlLastFM
java.lang.Object
nz.ac.waikato.mcennis.rat.graph.model.ModelShell
nz.ac.waikato.mcennis.rat.dataAquisition.CrawlLastFM
- All Implemented Interfaces:
- Component, DataAquisition, Model
Deprecated.
public class CrawlLastFM
- extends ModelShell
- implements DataAquisition
Class for parsing the LastFM web services with a multi-threaded parser.
NOTE: this class is deprecated because LastFM has discontinued this protocol.
Files are stored in the following directory structure:
<directory>/<UserName>/Profile.xml
<directory>/<UserName>/Tags.xml
<directory>/<UserName>/Friends.xml
<directory>/<UserName>/Neighbours.xml
<directory>/<UserName>/topArtists.xml
<directory>/ArtistDirectory/<ArtistName>.xml
<directory>/<UserName>/<ArtistName>.xml
Profile.xml contains demographic information about a user.
Tags.xml contains the top 50 tags this user has used.
Friends.xml contains all users this user has declared as a friend.
Neighbours.xml contains the top 50 users sufficiently similar to this user.
topArtists.xml contains the top 50 artists this user has listened to, ranked by playcount.
<ArtistName>.xml are the tags used to describe this artist. In a user
directory, these are the tags applied to this artist by that user only.
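The layout above can be sketched as a small path helper. This is only an illustration: `LastFmLayout` and its method names are hypothetical and not part of the RAT API, and the file names are copied from the listing above.

```java
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper that mirrors the documented layout; not a RAT class. */
final class LastFmLayout {
    // Per-user file names, as listed in the class description.
    static final List<String> USER_FILES = List.of(
        "Profile.xml", "Tags.xml", "Friends.xml", "Neighbours.xml", "topArtists.xml");

    private LastFmLayout() {}

    /** Resolves <directory>/<UserName>/<fileName>. */
    static Path userFile(Path root, String userName, String fileName) {
        return root.resolve(userName).resolve(fileName);
    }

    /** Resolves <directory>/ArtistDirectory/<ArtistName>.xml (tags on the artist in general). */
    static Path artistFile(Path root, String artistName) {
        return root.resolve("ArtistDirectory").resolve(artistName + ".xml");
    }

    /** Resolves <directory>/<UserName>/<ArtistName>.xml (tags applied by that user only). */
    static Path userArtistFile(Path root, String userName, String artistName) {
        return root.resolve(userName).resolve(artistName + ".xml");
    }
}
```

For example, `LastFmLayout.userFile(root, "alice", "Profile.xml")` resolves to `<root>/alice/Profile.xml`, where `root` plays the role of the DownloadDirectory parameter.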
Parameters for this algorithm:
- Parameter AlgorithmClass:
- Human-readable name for this kind of data acquisition algorithm. Used as the key for creating a new DataAquisition object with the DataAquisitionFactory.
- String Object
- max: 1
- min: 1
- Typically not altered
- Default: Crawl LastFM
- Parameter Name:
- Globally unique name for this instance of the Crawl LastFM data acquisition algorithm.
- String Object
- max: 1
- min: 1
- Default: Crawl LastFM
- Parameter Category:
- Describes the type of data acquisition algorithm. Used as the key for creating a new DataAquisition object with the DataAquisitionFactory.
- String Object
- max: 1
- min: 1
- Typically not altered
- Default: Web Crawler
- Parameter DownloadDirectory:
- Gives the root directory where documents should be downloaded to
- File Object
- max: 1
- min: 1
- Must be a valid directory
- No default provided
- Parameter UsernameFile:
- Location of the seed list of LastFM usernames
- File Object
- max: 1
- min: 1
- Must be a valid file
- No default provided
- Parameter StopCount:
- Total number of users to parse before terminating
- Integer Object
- max: 1
- min: 1
- Must be greater than or equal to 0
- Default: 10000
- Parameter ThreadDelay:
- Delay between retrieving web pages
- Long Object
- max: 1
- min: 1
- Must be greater than or equal to 0
- Default: 0
- Parameter UseProxy:
- Does Internet access use a proxy server?
- Boolean Object
- max: 1
- min: 1
- Default: false
- Parameter ProxyUser:
- What username should be used if the proxy requires a username
- String Object
- max: 0
- min: 1
- Default: dm75
- Parameter ProxyPassword:
- What password should be used if the proxy requires a password. This is not encrypted if parameters are serialized.
- String Object
- max: 0
- min: 1
- No default provided
- Parameter ProxyLocation:
- What internet address is the Internet proxy located at
- String Object
- max: 0
- min: 1
- Default: proxy.cs.waikato.ac.nz
- Parameter ProxyType:
- What protocol (by Integer) is this proxy using.
- Integer Object
- max: 0
- min: 1
- Default: 4
- Parameter ProxyPort:
- Which port is the proxy expecting Internet connection requests
- Integer Object
- max: 0
- min: 1
- Must not be empty
- Default: 80
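As a rough illustration of the parameter list, the sketch below collects the documented defaults in a plain `java.util.Map` and checks the stated constraints. It is an assumption-laden stand-in: RAT's own `Properties` class is not shown on this page, so `CrawlLastFmDefaults` and its methods are hypothetical, and the site-specific proxy defaults are left out.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for the documented defaults; not RAT's Properties class. */
final class CrawlLastFmDefaults {
    private CrawlLastFmDefaults() {}

    /** Defaults copied from the parameter list; parameters without a default are omitted. */
    static Map<String, Object> defaults() {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("AlgorithmClass", "Crawl LastFM");
        p.put("Name", "Crawl LastFM");
        p.put("Category", "Web Crawler");
        p.put("StopCount", 10000);   // Integer
        p.put("ThreadDelay", 0L);    // Long
        p.put("UseProxy", false);    // Boolean
        p.put("ProxyType", 4);
        p.put("ProxyPort", 80);
        return p;
    }

    /**
     * Returns an error message for the first violated constraint, or null if none.
     * Only type and range are checked; whether the files exist on disk is not.
     */
    static String validate(Map<String, Object> p) {
        for (String key : new String[] {"DownloadDirectory", "UsernameFile"}) {
            if (!(p.get(key) instanceof File)) {
                return key + " must be set to a File";
            }
        }
        Object stop = p.get("StopCount");
        if (!(stop instanceof Integer) || ((Integer) stop) < 0) {
            return "StopCount must be an Integer >= 0";
        }
        Object delay = p.get("ThreadDelay");
        if (!(delay instanceof Long) || ((Long) delay) < 0) {
            return "ThreadDelay must be a Long >= 0";
        }
        return null;
    }
}
```

Because DownloadDirectory and UsernameFile have no default, `validate(defaults())` reports an error until both are supplied.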
Constructor Summary
CrawlLastFM()
- Deprecated. Constructor for Crawl LastFM.
Method Summary
void cancel()
- Deprecated. Stop all collection at the end of the next entity (file, web-page, etc.)
Graph get()
- Deprecated. Obtain a reference to the graph this object holds
java.util.List<IODescriptor> getInputType()
- Deprecated. The input type describes all the different kinds of graph objects that are utilized (and hence required) by this object.
java.util.List<IODescriptor> getOutputType()
- Deprecated. The output type describes all the different kinds of graph objects that are created during the execution of this algorithm.
Properties getParameter()
- Deprecated. List of all parameters this component accepts.
Parameter getParameter(java.lang.String param)
- Deprecated. Returns the specific parameter identified by its key-name.
void init(Properties map)
- Deprecated. Initializes the data acquisition algorithm.
protected boolean isArtistParsed(java.io.File userDirectory)
- Deprecated.
protected void parseArtists(java.io.File artistDirectory)
- Deprecated.
CrawlLastFM prototype()
- Deprecated. All Components implement the prototype pattern.
void set(Graph g)
- Deprecated. Set the graph to be populated by this object
void start()
- Deprecated. Begin executing the data acquisition module
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface nz.ac.waikato.mcennis.rat.graph.model.Model
- addListener
CrawlLastFM
public CrawlLastFM()
- Deprecated.
- Constructor for Crawl LastFM. Its parameters are those listed in the class description.
start
public void start()
- Deprecated.
- Description copied from interface:
DataAquisition
- Begin executing the data acquisition module
- Specified by:
start
in interface DataAquisition
isArtistParsed
protected boolean isArtistParsed(java.io.File userDirectory)
- Deprecated.
parseArtists
protected void parseArtists(java.io.File artistDirectory)
- Deprecated.
set
public void set(Graph g)
- Deprecated.
- Description copied from interface:
DataAquisition
- Set the graph to be populated by this object
- Specified by:
set
in interface DataAquisition
- Parameters:
g
- graph to be created for analysis
get
public Graph get()
- Deprecated.
- Description copied from interface:
DataAquisition
- Obtain a reference to the graph this object holds
- Specified by:
get
in interface DataAquisition
- Returns:
- graph created by this object
cancel
public void cancel()
- Deprecated.
- Description copied from interface:
DataAquisition
- Stop all collection at the end of the next entity (file, web-page, etc.)
- Specified by:
cancel
in interface DataAquisition
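The "stop at the end of the next entity" behaviour described for cancel() is a form of cooperative cancellation. The following is a minimal sketch of that idea under stated assumptions, not CrawlLastFM's actual implementation: `CancellableCrawl` is a hypothetical class, the `fetch` callback stands in for downloading one user's files, and `stopCount` and `delayMillis` only mimic the StopCount and ThreadDelay parameters.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical worker showing cancellation at entity boundaries. */
final class CancellableCrawl {
    // Set from any thread; read by the crawling thread between entities.
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    // Written only by the crawling thread; volatile so other threads see progress.
    private volatile int processed = 0;

    /** Requests a stop; the entity currently being fetched is allowed to finish. */
    void cancel() {
        cancelled.set(true);
    }

    int processed() {
        return processed;
    }

    /** Fetches users in order until cancelled, stopCount is reached, or the list ends. */
    void run(List<String> users, int stopCount, long delayMillis, Consumer<String> fetch) {
        for (String user : users) {
            // The flag is only checked here, never in the middle of a fetch.
            if (cancelled.get() || processed >= stopCount) {
                break;
            }
            fetch.accept(user);
            processed = processed + 1;
            if (delayMillis > 0) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop
                    break;
                }
            }
        }
    }
}
```

Checking the flag only between entities means a cancelled crawl never leaves a half-processed user behind, at the cost of waiting for the current fetch to complete.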
getOutputType
public java.util.List<IODescriptor> getOutputType()
- Deprecated.
- Description copied from interface:
Component
- The output type describes all the different kinds of graph objects that
are created during the execution of this algorithm. The result is only
guaranteed to be fixed if structural parameters are not modified. This is
an empty list if there is no output.
- Specified by:
getOutputType
in interface Component
- Returns:
- list of IODescriptor objects for this component
- See Also:
IODescriptor
getInputType
public java.util.List<IODescriptor> getInputType()
- Deprecated.
- Description copied from interface:
Component
- The input type describes all the different kinds of graph objects that
are utilized (and hence required) by this object. This result is only
guaranteed to be fixed if structural parameters are not modified. This
is an empty list if there is no input.
- Specified by:
getInputType
in interface Component
- Returns:
- list of IODescriptor objects for this component
- See Also:
IODescriptor
getParameter
public Properties getParameter()
- Deprecated.
- Description copied from interface:
Component
- List of all parameters this component accepts. Each parameter also has a
distinct key-name used when initializing the object using the init method.
If there are no parameters, null is returned.
- Specified by:
getParameter
in interface Component
- Returns:
- read-only array of Parameters
getParameter
public Parameter getParameter(java.lang.String param)
- Deprecated.
- Description copied from interface:
Component
- Returns the specific parameter identified by its key-name. If no
parameter is found with this key-name, null is returned.
- Specified by:
getParameter
in interface Component
- Parameters:
param
- key-name of the parameter
- Returns:
- named parameter
init
public void init(Properties map)
- Deprecated.
- Initializes the data acquisition algorithm. There are no inputs. The
outputs are as follows:
- input: None
- output:
- user
- Type: Actor
- Description: LastFM user account
- Age
- Type: Actor Property
- Mode: user
- Description: Age of the LastFM user
- Avatar
- Type: Actor Property
- Mode: user
- Description: Avatar
- Country
- Type: Actor Property
- Mode: user
- Description: LastFM user's country of origin
- Gender
- Type: Actor Property
- Mode: user
- Description: Sex of the LastFM user
- Icon
- Type: Actor Property
- Mode: user
- Description: LastFM user's picture
- MBox
- Type: Actor Property
- Mode: user
- Description: Email address of the LastFM user
- PlayCount
- Type: Actor Property
- Mode: user
- Description: Total number of times the LastFM user has reported to LastFM that they listened to a piece of music.
- name
- Type: Actor Property
- Mode: user
- Description: Real name of the user
- Cluster ID
- Type: Actor Property
- Mode: user
- Description: Automated group assignment of the LastFM user
- User ID
- Type: Actor Property
- Mode: user
- Description: LastFM username
- Artist
- Type: Actor
- Description: LastFM artist
- Tag
- Type: Actor
- Description: Freeform metadata tag
- URL
- Type: Actor Property
- Mode: tag
- Description: LastFM homepage of this tag
- ArtistTag
- Type: Link
- Description: Link representing the assigning of a metadata element to an artist in general
- UserTag
- Type: Link
- Description: Link representing assignment of a tag to some artist by this user
- UserArtistTag
- Type: Link
- Description: Link representing metadata tags given by a user to an artist
- Tags
- Type: Link Property
- Mode: UserArtistTags
- Description: List of Strings representing the tags used by a user for a particular artist
- ListensTo
- Type: Link
- Mode: user
- Description: Link representing the number of times a user has listened to an artist.
- Similarity
- Type: Link
- Description: Link for similarity between nodes
- Friend
- Type: Link
- Description: Link representing a friendship declaration between users
- Specified by:
init
in interface Component
- Parameters:
map
- properties to load - null is permitted.
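To make the output listing more concrete, here is a small sketch of the kind of graph it describes: actors grouped by mode ("user", "Artist", "Tag"), actor properties, and typed links. All class names are hypothetical stand-ins, since RAT's own Graph, Actor and Link classes are not shown on this page, and the numeric `strength` field is an assumption used here to carry a play count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical actor: a mode such as "user" plus an ID and named properties. */
final class SketchActor {
    final String mode;
    final String id;
    final Map<String, Object> properties = new LinkedHashMap<>();

    SketchActor(String mode, String id) {
        this.mode = mode;
        this.id = id;
    }
}

/** Hypothetical link: a relation such as "ListensTo" or "Friend" between two actors. */
final class SketchLink {
    final String relation;
    final SketchActor source;
    final SketchActor destination;
    final double strength; // assumption: e.g. play count for ListensTo

    SketchLink(String relation, SketchActor source, SketchActor destination, double strength) {
        this.relation = relation;
        this.source = source;
        this.destination = destination;
        this.strength = strength;
    }
}

/** Hypothetical container holding the actors and links produced by a crawl. */
final class SketchGraph {
    final List<SketchActor> actors = new ArrayList<>();
    final List<SketchLink> links = new ArrayList<>();

    SketchActor addActor(String mode, String id) {
        SketchActor actor = new SketchActor(mode, id);
        actors.add(actor);
        return actor;
    }

    SketchLink addLink(String relation, SketchActor from, SketchActor to, double strength) {
        SketchLink link = new SketchLink(relation, from, to, strength);
        links.add(link);
        return link;
    }

    /** Counts the links of one relation, e.g. all "Friend" declarations. */
    long countLinks(String relation) {
        return links.stream().filter(l -> l.relation.equals(relation)).count();
    }
}
```

In this sketch a user who has played an artist 42 times would be stored as two actors joined by a "ListensTo" link with strength 42.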
prototype
public CrawlLastFM prototype()
- Deprecated.
- Description copied from interface:
Component
- All Components implement the prototype pattern. The new instance shares no resources
with the original other than static resources of the class.
- Specified by:
prototype
in interface Component
- Returns:
- default-parameter version of the same class as the original.
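The prototype contract described here can be sketched as follows. This is a simplified illustration, not RAT code: `PrototypeComponent` and `SketchCrawler` are hypothetical names, and a plain map stands in for the component's parameters.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the prototype part of the Component interface. */
interface PrototypeComponent {
    PrototypeComponent prototype();
}

/** Hypothetical component whose prototype() returns a fresh, default-parameter instance. */
class SketchCrawler implements PrototypeComponent {
    // Static resource: may be shared between an original and its prototype.
    static final String CATEGORY = "Web Crawler";

    // Per-instance state: must never be shared with a prototype.
    private final Map<String, Object> parameters = new HashMap<>();

    SketchCrawler() {
        parameters.put("StopCount", 10000); // default taken from the parameter list
    }

    void setParameter(String key, Object value) {
        parameters.put(key, value);
    }

    Object getParameter(String key) {
        return parameters.get(key);
    }

    /** Covariant return type, as in CrawlLastFM.prototype(). */
    @Override
    public SketchCrawler prototype() {
        // A new object with defaults, not a copy of this instance's current settings.
        return new SketchCrawler();
    }
}
```

Changing StopCount on the original leaves the prototype at its default, which is what distinguishes this contract from a clone.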