Plugins to fetch information from websites

These plugins fetch information about items in the default collections from websites, and fill in the item's fields with all the information they can get. This page describes how to create such a plugin. Please follow the Coding conventions when writing one.

Preparation

The easiest way to begin a new plugin is to copy an existing one. Plugins are located in lib/gcstar/GCPlugins/GCxxx, where xxx is the kind of collection they concern. For example, plugins that fetch information for movies are in lib/gcstar/GCPlugins/GCfilms.

You could also use the template provided with the GCstar sources: GCSiteTemplate.pm in the templates directory.

Rename your file to something explicit. However, its first two letters must always be GC, and its suffix must be .pm.

The first line contains something like this:

package GCPlugins::GCxxx::GCyyy;

Change yyy so that GCyyy matches your file name (without its suffix). A few lines below, there is this line:

package GCPlugins::GCxxx::GCPluginyyy;

Keep GCPlugin in the last part unchanged, but replace yyy with the same value as before.
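The short Perl sketch below makes the naming rules concrete by deriving both package names from a plugin's path. The helper plugin_packages and the file GCExample.pm are hypothetical, and GCstar does not ship such a function. The helper only encodes the rules stated above: the file name starts with GC, ends in .pm, and sits in the GCxxx directory of its collection.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper (not part of GCstar): compute the two package names
# a plugin file should declare, following the naming rules above.
sub plugin_packages
{
    my ($path) = @_;
    # Split e.g. "lib/gcstar/GCPlugins/GCfilms/GCExample.pm"
    my ($name, $dir, $suffix) = fileparse($path, qr/\.pm/);
    die "plugin file must end in .pm\n"         unless $suffix eq '.pm';
    die "plugin file name must start with GC\n" unless $name =~ /^GC(\w+)$/;
    my $yyy = $1;
    # The last directory is the collection part (GCxxx)
    my ($collection) = $dir =~ m{(GC\w+)/?$}
        or die "cannot find the GCxxx directory\n";
    return ("GCPlugins::${collection}::GC$yyy",
            "GCPlugins::${collection}::GCPlugin$yyy");
}

my ($outer, $plugin) = plugin_packages('lib/gcstar/GCPlugins/GCfilms/GCExample.pm');
print "$outer\n$plugin\n";
```

For GCExample.pm in GCfilms, this gives GCPlugins::GCfilms::GCExample for the first line and GCPlugins::GCfilms::GCPluginExample for the plugin package.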

Interface

Here is the list of methods your plugin should implement. The plugin is object-oriented: the first parameter of each method is always a reference to an object, an instance of your package that does the work. The same instance may be reused during a user session, but there is no guarantee of that, so your package must work correctly in either case. That means you should clear any internal values that need to be reset between two fetches, and avoid keeping values from one fetch to the next.

new

Parameters
Package name.
Returns
A blessed reference to the created object (it should be a hash reference). You need to call the base class constructor, GCPlugins::GCxxx::GCxxxPluginsBase.
Description
This is the constructor of the plugin object. You may initialize any internal values you need here. You must also initialize a field named hasField containing a reference to a hash whose keys are the names of the fields a search returns. These fields are listed in the .gcm file describing the collection, between the results tags (in collection/options/fields). The associated value is 1 if the plugin returns a value for this field during a search, and 0 if it doesn't.
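Below is a minimal sketch of such a constructor for a hypothetical films plugin. So that it can run on its own, it defines a stand-in for GCPlugins::GCfilms::GCfilmsPluginsBase; a real plugin would inherit from the actual base class instead. The field names in hasField are examples only. Check them against the results fields of your collection's .gcm file.

```perl
use strict;
use warnings;

# Stand-in base class so the sketch runs alone; a real plugin inherits from
# the actual GCPlugins::GCfilms::GCfilmsPluginsBase shipped with GCstar.
package GCPlugins::GCfilms::GCfilmsPluginsBase;
sub new { my $class = shift; return bless {}, $class; }

# Hypothetical films plugin
package GCPlugins::GCfilms::GCPluginExample;
our @ISA = ('GCPlugins::GCfilms::GCfilmsPluginsBase');

sub new
{
    my $proto = shift;
    my $class = ref($proto) || $proto;
    # Let the base class constructor create the object
    my $self = $class->SUPER::new();
    bless($self, $class);

    # 1 = a search returns this field, 0 = it does not.
    # Example names; take the real ones from the results fields of the .gcm file.
    $self->{hasField} = {
        title    => 1,
        date     => 1,
        director => 1,
        actors   => 0,
    };
    return $self;
}

package main;
my $plugin = GCPlugins::GCfilms::GCPluginExample->new();
my @returned = grep { $plugin->{hasField}{$_} } sort keys %{ $plugin->{hasField} };
print "Fields returned by a search: @returned\n";
```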

getName

Parameters
None.
Returns
A string containing the plugin name.
Description
The name it returns is the one displayed in the application, so it should be explicit and unique to avoid confusion.

getAuthor

Parameters
None.
Returns
A string containing the author’s name.
Description
You can return your real name or a nickname here. It is displayed in the application when the user selects a plugin.
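Both getName and getAuthor simply return a constant string. In the sketch below, the plugin package, the site name and the author name are hypothetical placeholders.

```perl
use strict;
use warnings;

package GCPlugins::GCfilms::GCPluginExample;   # hypothetical plugin
sub new { return bless {}, shift }              # minimal constructor for this sketch

# Shown in the plugin list: explicit and unique
sub getName   { return 'Example Movie Site'; }

# Shown when the user selects this plugin (placeholder name)
sub getAuthor { return 'Jane Doe'; }

package main;
my $p = GCPlugins::GCfilms::GCPluginExample->new();
printf "%s by %s\n", $p->getName(), $p->getAuthor();
```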

getLang

Parameters
None.
Returns
A string containing the website language.
Description
This helps users know which plugins they can use. GCstar also uses this value internally to automatically select plugins in the user's language, so it has to be a two-letter language code. If the language is already supported by GCstar, make sure you use the same code as the one used for its translation.
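In a plugin, getLang usually just returns a constant. The sketch below assumes an upper-case code such as FR for a French website. Compare it with the code of the existing GCstar translation before choosing yours. The looks_like_lang_code helper is hypothetical and only illustrates the two-letter constraint.

```perl
use strict;
use warnings;

package GCPlugins::GCfilms::GCPluginExample;   # hypothetical plugin
sub new { return bless {}, shift }

# Two-letter code of the website's language (a French site in this sketch)
sub getLang { return 'FR'; }

package main;

# Hypothetical sanity check, not part of GCstar: exactly two letters
sub looks_like_lang_code { return $_[0] =~ /^[A-Za-z]{2}$/ ? 1 : 0; }

my $lang = GCPlugins::GCfilms::GCPluginExample->new()->getLang();
my $ok   = looks_like_lang_code($lang);
print "code $lang ", ($ok ? "looks valid" : "is not two letters"), "\n";
```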

getCharset

Parameters
None.
Returns
A string containing the website charset.
Description
You can usually find this information in the header of the website's pages. If you don't implement this method, the default value is ISO-8859-1.
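The sketch below shows a plugin that declares UTF-8 and a hypothetical helper, charset_of, which mimics the documented default for a plugin without getCharset. The helper illustrates the rule but is not GCstar's actual code. In particular, GCstar may provide the default through the base class rather than through a check like this one.

```perl
use strict;
use warnings;

package GCPlugins::GCfilms::GCPluginExample;   # hypothetical plugin
sub new { return bless {}, shift }
# Value read from the site's page header, e.g. <meta charset="UTF-8">
sub getCharset { return 'UTF-8'; }

package GCPlugins::GCfilms::GCPluginLegacy;    # hypothetical plugin without getCharset
sub new { return bless {}, shift }

package main;

# Hypothetical helper mirroring the documented rule (not GCstar's code):
# use the plugin's charset, or ISO-8859-1 when getCharset is not implemented
sub charset_of
{
    my ($plugin) = @_;
    return $plugin->can('getCharset') ? $plugin->getCharset() : 'ISO-8859-1';
}

print "Example: ", charset_of(GCPlugins::GCfilms::GCPluginExample->new()), "\n";
print "Legacy: ",  charset_of(GCPlugins::GCfilms::GCPluginLegacy->new()), "\n";
```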

getSearchUrl

Parameters
Text to search.
Returns
URL of the page containing the search results, optionally followed by the POST parameters.
Description
This method should build the full URL of the page containing the results of the search for the user’s query. The value it gets has already been prepared to be directly used in a URL without any conversion (i.e. all special characters have been escaped).
If the website uses the GET method for its search form, everything is contained in the URL. Some websites use the POST method instead; the parameters must then be returned along with the URL, as a reference to an array containing keys and values. Here is an example of getSearchUrl for a website that uses POST:

sub getSearchUrl
{
    my ($self, $word) = @_;
    # First value: the form's target URL; second: the POST parameters as key/value pairs
    return ('http://www.example.com/search.php', ['query' => $word, 'type' => 'movies']);
}

In this example, the website receives two parameters, query and type, with the corresponding values.
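For comparison, here is a sketch of the same method for a hypothetical website that uses GET, next to the POST version above. The loop at the end is a hypothetical caller that shows how the optional second return value tells the two cases apart. It is not GCstar's code.

```perl
use strict;
use warnings;

# Hypothetical plugin for a site whose search form uses GET
package GCPlugins::GCfilms::GCPluginGetExample;
sub new { return bless {}, shift }
sub getSearchUrl
{
    my ($self, $word) = @_;
    # Everything goes in the URL; $word is already escaped for URL use
    return "http://www.example.com/search.php?query=$word&type=movies";
}

# The POST example from above, as a second hypothetical plugin
package GCPlugins::GCfilms::GCPluginPostExample;
sub new { return bless {}, shift }
sub getSearchUrl
{
    my ($self, $word) = @_;
    return ('http://www.example.com/search.php', ['query' => $word, 'type' => 'movies']);
}

package main;

# Hypothetical caller: a second return value means "send these as POST data"
for my $plugin (GCPlugins::GCfilms::GCPluginGetExample->new(),
                GCPlugins::GCfilms::GCPluginPostExample->new())
{
    my ($url, $post) = $plugin->getSearchUrl('star%20wars');
    my $line = $post ? "POST $url with @$post" : "GET $url";
    print "$line\n";
}
```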

getItemUrl

Parameters
The URL of an item or none.
Returns
Full URL of the page describing an item.
Description
This method may be called in two different contexts. During searches, the results may contain only a relative URL to the page with the description; this method then has to prepend the website address so that it returns a full URL. It is also called when a URL is dragged and dropped from a web browser onto GCstar: the application calls this method with no parameter and compares the dropped URL with the one the plugin returns. That way it knows which plugin should be used to parse the page.
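Here is a sketch of getItemUrl that covers both contexts. It assumes a hypothetical website at http://www.example.com/ whose results pages use relative links such as /film.php?id=42.

```perl
use strict;
use warnings;

package GCPlugins::GCfilms::GCPluginExample;   # hypothetical plugin
sub new { return bless {}, shift }

sub getItemUrl
{
    my ($self, $url) = @_;
    my $site = 'http://www.example.com/';   # hypothetical website address
    # No parameter (drag and drop): return the site address so a dropped
    # URL can be matched against this plugin
    return $site unless defined $url && $url ne '';
    # Already a full URL: keep it unchanged
    return $url if $url =~ m{^https?://};
    # Relative URL from a results page: prepend the site address
    $url =~ s{^/}{};
    return $site . $url;
}

package main;
my $p = GCPlugins::GCfilms::GCPluginExample->new();
print "base: ", $p->getItemUrl(), "\n";
print "item: ", $p->getItemUrl('/film.php?id=42'), "\n";
```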

preProcess

Parameters
Full content of the page.
Returns
Modified content of the page.
Description
Before a page is parsed (see the next section), you may want to make some changes to its content, such as removing unused parts or fixing broken tags. This can be done in this method. You can also test $self->{parsingList}, described in the next section, to know which kind of page is being processed.
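The sketch of preProcess below assumes a hypothetical website whose results sit inside <div class="results"> and whose item pages contain invalid </br> tags. The constructor takes parsingList as an argument only so that the example can be run with both values.

```perl
use strict;
use warnings;

package GCPlugins::GCfilms::GCPluginExample;   # hypothetical plugin

# Constructor taking parsingList only so the sketch can try both page kinds
sub new
{
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

sub preProcess
{
    my ($self, $html) = @_;
    # Remove parts the parser never needs: comments and scripts
    $html =~ s{<!--.*?-->}{}gs;
    $html =~ s{<script\b.*?</script>}{}gis;
    if ($self->{parsingList}) {
        # Results page: drop everything before the (assumed) results block
        $html =~ s{^.*?(<div class="results">)}{$1}s;
    } else {
        # Item page: fix the (assumed) invalid </br> tags
        $html =~ s{</br>}{<br>}gi;
    }
    return $html;
}

package main;
my $list = GCPlugins::GCfilms::GCPluginExample->new(parsingList => 1);
print "list: ", $list->preProcess('<p>ad</p><!-- x --><div class="results"><script>y()</script>ok</div>'), "\n";
```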

Parsing the pages

The plugins are event-based HTML parsers: they go through an HTML page, and specific methods are called when events occur.

When a tag (such as <p> or <a href=...>) starts, the start method is called. When there is textual content, the text method is called. When a tag ends, the end method is called. Refer to the documentation of HTML::Parser for more information, as it is the base package of your package (provided you didn't remove the use base clause during preparation). In what follows, we assume the reference to the current object is in a variable named $self.

Inside these methods, there are two main blocks depending on the value of $self->{parsingList}. If it is true, we are parsing a results page (the list of items matching a query). If it is false, we are parsing the information for a given item.

When parsing search results, you have to fill an array named $self->{itemsList}. Each element of this array is a reference to a hash whose keys are field names (the same ones as in $self->{hasField}, initialized in the new method) and whose values are the ones extracted from the parsed page.
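The sketch below fills $self->{itemsList} for a hypothetical results page in which each result is a link such as <a class="result" href="/film.php?id=1">Alien</a>. The handler argument lists follow the method interface that HTML::Parser uses for subclasses. start receives the tag name, the attribute hash, the attribute order and the original text; end receives the tag name and the original text; text receives the text and a CDATA flag. Storing the link under a url key is an assumption of this sketch, so check an existing plugin for the key GCstar actually reads. The handlers are called by hand here so that the example runs without HTML::Parser.

```perl
use strict;
use warnings;

package GCPlugins::GCfilms::GCPluginExample;   # hypothetical plugin

sub new { return bless { parsingList => 1, itemsList => [] }, shift }

sub start
{
    my ($self, $tag, $attr, $attrseq, $origtext) = @_;
    return unless $self->{parsingList};
    # Assumed markup: each result is <a class="result" href="...">Title</a>
    if ($tag eq 'a' && ($attr->{class} || '') eq 'result') {
        push @{ $self->{itemsList} }, { url => $attr->{href}, title => '' };
        $self->{insideTitle} = 1;
    }
}

sub text
{
    my ($self, $text) = @_;
    return unless $self->{parsingList} && $self->{insideTitle};
    # Append to the title of the result currently being read
    $self->{itemsList}[-1]{title} .= $text;
}

sub end
{
    my ($self, $tag) = @_;
    $self->{insideTitle} = 0 if $tag eq 'a';
}

package main;

# Drive the handlers by hand; in GCstar, HTML::Parser calls them while
# walking through the page.
my $p = GCPlugins::GCfilms::GCPluginExample->new();
$p->start('a', { class => 'result', href => '/film.php?id=1' }, [qw(class href)], '');
$p->text('Alien');
$p->end('a', '</a>');
$p->start('a', { class => 'result', href => '/film.php?id=2' }, [qw(class href)], '');
$p->text('Aliens');
$p->end('a', '</a>');
print "$_->{title} ($_->{url})\n" for @{ $p->{itemsList} };
```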

When parsing an item description, the values have to be stored in $self->{curInfo}->{fieldName}, where fieldName is the same name as the one in the .gcm file.
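Here is a matching sketch for an item page. It assumes hypothetical markup in which the title is in an <h1> and the director in a <span class="director">. The field names title and director are examples, so use the names from your collection's .gcm file. The handlers are called by hand instead of by HTML::Parser so that the sketch runs on its own.

```perl
use strict;
use warnings;

package GCPlugins::GCfilms::GCPluginExample;   # hypothetical plugin

sub new { return bless { parsingList => 0, curInfo => {} }, shift }

sub start
{
    my ($self, $tag, $attr) = @_;
    return if $self->{parsingList};
    # Remember which field the upcoming text belongs to (assumed markup)
    if ($tag eq 'h1') {
        $self->{field} = 'title';
    } elsif ($tag eq 'span' && ($attr->{class} || '') eq 'director') {
        $self->{field} = 'director';
    }
}

sub text
{
    my ($self, $text) = @_;
    return if $self->{parsingList} || !$self->{field};
    $self->{curInfo}->{ $self->{field} } .= $text;
}

sub end
{
    my ($self, $tag) = @_;
    $self->{field} = '' if $tag eq 'h1' || $tag eq 'span';
}

package main;

# Handlers driven by hand, as HTML::Parser would for
# <h1>Alien</h1> <span class="director">Ridley Scott</span>
my $p = GCPlugins::GCfilms::GCPluginExample->new();
$p->start('h1', {}, [], '<h1>');
$p->text('Alien');
$p->end('h1', '</h1>');
$p->start('span', { class => 'director' }, ['class'], '<span class="director">');
$p->text('Ridley Scott');
$p->end('span', '</span>');
print "$_: $p->{curInfo}{$_}\n" for sort keys %{ $p->{curInfo} };
```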

Inform webmasters

While GCstar only performs the same operations a web browser would, it is polite to let the webmaster know what you are doing. Look for contact information on the website you are writing a plugin for, and send them an email about it. You may include a link to the page with Information for webmasters.

Since it may not be clear what GCstar is, you will probably have to insist that GCstar is for personal use only. Also, users always know which website they are fetching information from: the goal of this application is in no way to hide the websites it uses, as they do useful and great work.

 
en/websites_plugins.txt · Last modified: 20/11/2007 06:17 by Tian



Should you have a problem using GCstar, you can open a bug report or request some support on GCstar forums.