Plugins to fetch information from websites
These plugins fetch information about items in the default collections from websites. A plugin fills the fields with all the information it could get. This page describes how to create such a plugin. Please keep the Coding conventions in mind when writing one.
Preparation
The easiest way to begin a new plugin is to copy an existing one. They are found in lib/gcstar/GCPlugins/GCxxx, where xxx is the kind of collection the plugin is for. As an example, plugins that fetch information for movies are in lib/gcstar/GCPlugins/GCfilms.
You can also start from the template provided with the GCstar sources: GCSiteTemplate.pm in the templates directory.
Give your file an explicit name. The first 2 letters must always be GC and the suffix must be .pm.
The first line contains something like this:
package GCPlugins::GCxxx::GCyyy;
Change yyy so that it matches your file name (without its suffix). A few lines below, there is this text:
package GCPlugins::GCxxx::GCPluginyyy;
Don’t change GCPlugin in the last part, but replace yyy with the same value as before.
Interface
Here is the list of methods your plugin should implement. It’s done in an object-oriented way, meaning that the first parameter of each method is always a reference to an object. This object is an instance of your package, and it is the one that has to do the work. The same instance may be reused throughout a user session, but there is no guarantee of that, so your package should be ready for either case. That means you are supposed to clear any internal values that should be reset between 2 fetches, and to avoid relying on values stored during a previous fetch.
new
- Parameters
- Package name.
- Returns
- A blessed reference to the created object (it should be a hash reference). You need to use the constructor of the base class, GCPlugins::GCxxx::GCxxxPluginsBase.
- Description
- This is the constructor of the plugin object. You may initialize here any internal values you need. You are also supposed to initialize a field (hasField) containing a reference to a hash whose keys are the names of the fields a search will return. These fields can be found in the .gcm file describing the collection, between the results tags (in collection/options/fields). The associated value is 1 when the plugin returns a value for this field during a search. It should be 0 if it doesn’t.
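As an illustration of the description above, here is a minimal sketch of a constructor. The plugin name GCPluginExample, the field names and the isTitle flag are assumptions made for the example, and the base class is replaced by a stand-in so that the snippet runs on its own. A real plugin would inherit from the base class shipped with GCstar (through its use base clause) instead.

```perl
use strict;
use warnings;

# Stand-in for the real base class, defined here only so that this
# sketch runs without GCstar. A real plugin gets it from GCstar.
{
    package GCPlugins::GCfilms::GCfilmsPluginsBase;

    sub new
    {
        my $proto = shift;
        my $class = ref($proto) || $proto;
        return bless({}, $class);
    }
}

{
    package GCPlugins::GCfilms::GCPluginExample;    # hypothetical plugin name

    # -norequire because the stand-in base class lives in this file
    use parent -norequire, 'GCPlugins::GCfilms::GCfilmsPluginsBase';

    sub new
    {
        my $proto = shift;
        my $class = ref($proto) || $proto;

        # Let the base class create the object, then bless it into our package
        my $self = $class->SUPER::new();
        bless($self, $class);

        # 1: the field is filled in search results, 0: it is not.
        # These field names are assumptions; check the .gcm file.
        $self->{hasField} = {
            title    => 1,
            date     => 1,
            director => 0,
            actors   => 0,
        };

        # Hypothetical internal state used later while parsing
        $self->{isTitle} = 0;

        return $self;
    }
}

my $plugin = GCPlugins::GCfilms::GCPluginExample->new;
print join(', ', sort keys %{$plugin->{hasField}}), "\n";
```

Keeping hasField next to the other initializations makes it easy to check that every key matches a field actually declared for the collection.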
getName
- Parameters
- None.
- Returns
- A string containing the plugin name.
- Description
- The name it returns is the one that will be displayed in the application. So it should be explicit enough and also unique to avoid confusion.
getAuthor
- Parameters
- None.
- Returns
- A string containing the author’s name.
- Description
- You can return your real name or a nickname here. It will be displayed in the application when the user selects a plugin.
getLang
- Parameters
- None.
- Returns
- A string containing the website language.
- Description
- This helps users know which plugins they can use. GCstar also uses this value internally to automatically select the plugins whose language matches the user’s. So this has to be a 2-letter language code. If the language is already supported by GCstar, make sure you use the same code as the one used for the translation.
getCharset
- Parameters
- None.
- Returns
- A string containing the website charset.
- Description
- You may find this information in the header of the pages the website contains. If you don’t implement this method, the default value will be ISO-8859-1.
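The four descriptive methods above are usually one-liners. The following sketch shows one possible set. The package name and all returned values are made up for the example, and the case of the language code is an assumption: copy whatever an existing plugin for the same language returns.

```perl
use strict;
use warnings;

{
    package GCPlugins::GCfilms::GCPluginExample;    # hypothetical plugin name

    # Bare constructor so that the sketch runs without GCstar
    sub new { return bless({}, shift); }

    # Name displayed in the application; keep it explicit and unique
    sub getName { return 'Example.com'; }

    # Real name or nickname of the plugin author
    sub getAuthor { return 'Jane Doe'; }

    # 2-letter code of the website language (case is an assumption)
    sub getLang { return 'EN'; }

    # Charset announced by the website pages
    sub getCharset { return 'UTF-8'; }
}

my $plugin = GCPlugins::GCfilms::GCPluginExample->new;
printf "%s by %s (%s, %s)\n",
    $plugin->getName, $plugin->getAuthor, $plugin->getLang, $plugin->getCharset;
```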
getSearchUrl
- Parameters
- Text to search.
- Returns
- URL of the page containing search results. Optionally, the POST parameters.
- Description
- This method should build the full URL of the page containing the results of the search for the user’s query. The value it gets has already been prepared to be directly used in a URL without any conversion (i.e. all special characters have been escaped).
sub getSearchUrl
{
    my ($self, $word) = @_;
    # URL of the results page, then the POST parameters as an array reference
    return ('http://www.example.com/search.php', ['query' => $word, 'type' => 'movies']);
}
In this example, the website receives 2 POST parameters, query and type, with the corresponding values.
getItemUrl
- Parameters
- The URL of an item or none.
- Returns
- Full URL of the page describing an item.
- Description
- This method can be called in 2 different contexts. During searches, the results may contain only a relative URL to the page containing the description. In that case, this method has to prepend the website address so that it returns a full URL. It is also called when a URL is dragged from a web browser and dropped onto GCstar. The application then calls this method with no parameter and tries to match the dropped URL against the one the plugin returns. That way it knows which plugin should be used to parse the page.
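The two contexts described above can be handled as in the following sketch. The package name and the www.example.com address are placeholders, and returning the website address when no parameter is given is an interpretation of the drag and drop case, not something confirmed against the GCstar sources.

```perl
use strict;
use warnings;

{
    package GCPlugins::GCfilms::GCPluginExample;    # hypothetical plugin name

    # Bare constructor so that the sketch runs without GCstar
    sub new { return bless({}, shift); }

    sub getItemUrl
    {
        my ($self, $url) = @_;
        my $site = 'http://www.example.com/';    # placeholder address

        # No parameter: return the website address so that a dropped
        # URL can be matched against it (assumed behaviour)
        return $site if !$url;

        # The results page already gave a full URL
        return $url if $url =~ m{^https?://}i;

        # Relative URL: prepend the website address
        $url =~ s{^/}{};
        return $site . $url;
    }
}

my $plugin = GCPlugins::GCfilms::GCPluginExample->new;
print $plugin->getItemUrl('/film/42.html'), "\n";
```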
preProcess
- Parameters
- Full content of the page.
- Returns
- Modified content of the page.
- Description
- Before parsing a page (see next section), you may want to make some changes to the content (such as removing unused parts or fixing some tag problems). This can be done in this method. You may also test $self->{parsingList} as described later.
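Here is a small sketch of what such a method could look like. Both clean-ups are invented for the example (they are not taken from an existing plugin), and the package name is hypothetical.

```perl
use strict;
use warnings;

{
    package GCPlugins::GCfilms::GCPluginExample;    # hypothetical plugin name

    # Bare constructor so that the sketch runs without GCstar
    sub new { return bless({}, shift); }

    sub preProcess
    {
        my ($self, $html) = @_;

        if ($self->{parsingList})
        {
            # Results page: remove script blocks, which are never parsed
            $html =~ s{<script\b.*?</script>}{}gis;
        }
        else
        {
            # Item page: turn line breaks into a separator so that a
            # multi-line value arrives as a single piece of text
            $html =~ s{<br\s*/?>}{, }gi;
        }

        return $html;
    }
}

my $plugin = GCPlugins::GCfilms::GCPluginExample->new;
$plugin->{parsingList} = 0;
print $plugin->preProcess('Alice<br>Bob<br />Carol'), "\n";
```

Keeping these substitutions in preProcess means the parsing methods can stay simple and only deal with clean markup.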
Parsing the pages
The plugins are event-based HTML parsers. That means they go through an HTML page and call some functions when certain events occur.
When a tag (such as <p> or <a href=...>) starts, the method called is start. When there is some textual content, the method called is text. When a tag ends, the method called is end. Refer to the documentation of HTML::Parser for more information, as it is the base package of your package (provided you didn’t remove the use base clause during preparation). We assume here that you have the reference to the current object in a variable $self.
Inside these methods, there are 2 main blocks depending on the value of $self->{parsingList}. If this is a true value, that means we are parsing a results page (the list of items that match a query). If this is a false value, we are parsing the information for a given item.
When parsing search results, you have to fill an array named $self->{itemsList}. Each item of this array is a reference to a hash. Each key of a hash is the name of a field (the same names as the ones in $self->{hasField}, initialized in the new method). The values are the ones that have been extracted from the parsed page.
When parsing an item description, the values have to be stored in $self->{curInfo}->{fieldName}, where fieldName is the same name as the one in the .gcm file.
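To make the two blocks more concrete, here is a sketch of the three handlers for an imaginary website. Everything about the markup is an assumption (results are links with class result, the item title is in an h1 tag), as are the plugin name, the inside flag and the url key. To keep the snippet runnable without GCstar or HTML::Parser, the handlers are called by hand at the end, in the order a parser would call them.

```perl
use strict;
use warnings;

{
    package GCPlugins::GCfilms::GCPluginExample;    # hypothetical plugin name

    # Bare constructor so that the sketch runs on its own. A real
    # plugin inherits these containers and its parser from GCstar.
    sub new { return bless({ itemsList => [], curInfo => {}, inside => '' }, shift); }

    sub start
    {
        my ($self, $tagname, $attr, $attrseq, $origtext) = @_;
        if ($self->{parsingList})
        {
            # Assumed markup: <a class="result" href="...">Title</a>
            if ($tagname eq 'a' && ($attr->{class} || '') eq 'result')
            {
                # New result; the url key is an assumption
                push @{$self->{itemsList}}, { url => $attr->{href} };
                $self->{inside} = 'title';
            }
        }
        elsif ($tagname eq 'h1')
        {
            # Assumed markup: <h1>Title</h1>
            $self->{inside} = 'title';
        }
    }

    sub text
    {
        my ($self, $origtext) = @_;
        return if $self->{inside} ne 'title';
        if ($self->{parsingList})
        {
            # Complete the result opened in start
            $self->{itemsList}[-1]{title} = $origtext;
        }
        else
        {
            $self->{curInfo}->{title} = $origtext;
        }
    }

    sub end
    {
        my ($self, $tagname) = @_;
        # Reset the state so nothing leaks into the next tag
        $self->{inside} = '';
    }
}

# Simulate the events for a results page, then for an item page
my $plugin = GCPlugins::GCfilms::GCPluginExample->new;
$plugin->{parsingList} = 1;
$plugin->start('a', { class => 'result', href => '/film/42.html' }, ['class', 'href'], '<a class="result" href="/film/42.html">');
$plugin->text('Some Film');
$plugin->end('a');

$plugin->{parsingList} = 0;
$plugin->start('h1', {}, [], '<h1>');
$plugin->text('Some Film');
$plugin->end('h1');
```

Resetting the inside flag in end is one simple way to follow the earlier advice about not keeping state between 2 fetches.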
Inform webmasters
While GCstar only performs the same operations a web browser would, it is more courteous to inform the webmaster of what you are doing. Look for the contact information on the website you are writing a plugin for, and send them an email about it. You may include a link to the page with Information for webmasters.
As it may not be obvious what GCstar is, you will probably have to stress that GCstar is only for personal use. Also, the users will always know where the information they fetch comes from. The goal of this application is in no way to hide which website is used, as these websites are doing useful and great work.
Should you have a problem using GCstar, you can open a bug report or request some support on GCstar forums.