Despite the additional complexity of the RDF attributes, the methods for creating RSS 1.0 feeds are similar to those used to create RSS 0.9x feeds (discussed in Chapter 4).
The XML::RSS module we used in Chapter 4 also works for RSS 1.0, with a few changes to the scripts. Example 6-4 shows a sample script to produce the feed shown in Example 6-5.
#!/usr/local/bin/perl -w

use XML::RSS;

my $rss = new XML::RSS (version => '1.0');

$rss->channel(title       => "The Title of the Feed",
              link        => "http://www.oreilly.com/example/",
              description => "The description of the Feed",
              dc => {
                  date      => "2000-08-23T07:00+00:00",
                  subject   => "Linux Software",
                  creator   => "scoop@freshmeat.net",
                  publisher => "scoop@freshmeat.net",
                  rights    => "Copyright 1999, Freshmeat.net",
                  language  => "en-us",
              },
);

$rss->image(title => "Oreilly",
            url   => "http://meerkat.oreillynet.com/icons/meerkat-powered.jpg",
            link  => "http://www.oreilly.com/example/",
            dc => {
                creator => "G. Raphics (graphics at freshmeat.net)",
            },
);

$rss->textinput(title       => "Search",
                description => "Search the site",
                name        => "query",
                link        => "http://www.oreilly.com/example/search.cgi"
);

$rss->add_item(
    title       => "Example Entry 1",
    link        => "http://www.oreilly.com/example/entry1",
    description => 'blah blah',
    dc => {
        subject => "Software",
    },
);

$rss->add_item(
    title       => "Example Entry 2",
    link        => "http://www.oreilly.com/example/entry2",
    description => 'blah blah'
);

$rss->add_item(
    title       => "Example Entry 3",
    link        => "http://www.oreilly.com/example/entry3",
    description => 'blah blah'
);

$rss->save("example.rdf");
<?xml version="1.0"?>

<rdf:RDF
 xmlns="http://purl.org/rss/1.0/"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
>

<channel rdf:about="http://www.oreilly.com/example/example.rdf">
  <title>The Title of the Feed</title>
  <link>http://www.oreilly.com/example/</link>
  <description>The description of the Feed</description>
  <dc:language>en-us</dc:language>
  <dc:date>2000-08-23T07:00+00:00</dc:date>
  <dc:subject>Linux Software</dc:subject>
  <dc:creator>scoop@freshmeat.net</dc:creator>
  <dc:publisher>scoop@freshmeat.net</dc:publisher>
  <dc:rights>Copyright 1999, Freshmeat.net</dc:rights>
  <image rdf:resource="http://meerkat.oreillynet.com/icons/meerkat-powered.jpg" />
  <textinput rdf:resource="http://www.oreilly.com/example/search.cgi" />
  <items>
    <rdf:Seq>
      <rdf:li rdf:resource="http://www.oreilly.com/example/entry1" />
      <rdf:li rdf:resource="http://www.oreilly.com/example/entry2" />
      <rdf:li rdf:resource="http://www.oreilly.com/example/entry3" />
    </rdf:Seq>
  </items>
</channel>

<image rdf:about="http://meerkat.oreillynet.com/icons/meerkat-powered.jpg">
  <title>Oreilly</title>
  <url>http://meerkat.oreillynet.com/icons/meerkat-powered.jpg</url>
  <link>http://www.oreilly.com/example/</link>
  <dc:creator>G. Raphics (graphics at freshmeat.net)</dc:creator>
</image>

<textinput rdf:about="http://www.oreilly.com/example/search.cgi">
  <title>Search</title>
  <description>Search the site</description>
  <name>query</name>
  <link>http://www.oreilly.com/example/search.cgi</link>
</textinput>

<item rdf:about="http://www.oreilly.com/example/entry1">
  <title>Example Entry 1</title>
  <description>blah blah</description>
  <link>http://www.oreilly.com/example/entry1</link>
  <dc:subject>Software</dc:subject>
</item>

<item rdf:about="http://www.oreilly.com/example/entry2">
  <title>Example Entry 2</title>
  <description>blah blah</description>
  <link>http://www.oreilly.com/example/entry2</link>
</item>

<item rdf:about="http://www.oreilly.com/example/entry3">
  <title>Example Entry 3</title>
  <description>blah blah</description>
  <link>http://www.oreilly.com/example/entry3</link>
</item>

</rdf:RDF>
The differences between making RSS 0.91 and RSS 1.0 with XML::RSS are slight. Just make sure you declare the correct version, like so:
my $rss = new XML::RSS (version => '1.0');
The module takes care of itself, for the most part. Elements that belong to other namespaces must be grouped in a hash keyed by their namespace prefix. The script in Example 6-4 adds six Dublin Core elements to the channel section of the feed this way; two of them are shown here:
$rss->channel(title       => "The Title of the Feed",
              link        => "http://www.oreilly.com/example/",
              description => "The description of the Feed",
              dc => {
                  date    => "2000-08-23T07:00+00:00",
                  subject => "Linux Software",
              },
);
XML::RSS comes with built-in support for the Dublin Core, Syndication, and Taxonomy modules. You can easily add support for any other module:
$rss->add_module(prefix=>'my', uri=>'http://purl.org/my/rss/module/');
This line does two things. First, it makes the module add the correct namespace declaration to the root element of the document (in this case, xmlns:my="http://purl.org/my/rss/module/"; replace the prefix and the URI with the correct ones for your module). Second, it allows you to use the same syntax as the preceding Dublin Core example to add your elements to the feed:
$rss->channel(title       => "The Title of the Feed",
              link        => "http://www.oreilly.com/example/",
              description => "The description of the Feed",
              dc => {
                  date    => "2000-08-23T07:00+00:00",
                  subject => "Software",
              },
              my => {
                  element => 'value',
              },
);
The rest of the script is identical to the RSS 0.91 creation script using the same module.
To put what we've discussed to use, we can now create an RSS 1.0 feed from an external data source. One increasingly popular source, in addition to the Google API discussed in Chapter 4, is provided by Amazon.com, the online retailer. In July 2002, Amazon.com released a set of web services that allow developers to query its database using either SOAP or XML over HTTP (also known as REST).
We used SOAP in Chapter 4, so this time we'll look at XML over HTTP. By passing a correctly formed URL to the Amazon.com server, we receive in return an XML document containing details of the books for which we've searched. For example, passing the URL:
will return an XML document, which begins like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ProductInfo PUBLIC "-//Amazon.com//DTD Amazon Product Info//EN"
  "http://xml.amazon.com/schemas/dev-lite.dtd">
<ProductInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="http://xml.amazon.com/schemas/dev-lite.xsd">
  <Details url="http://www.amazon.com/exec/obidos/redirect?tag=webservices-20%26creative=D3PEW6MKWJIULE%26camp=2025%26link_code=xm2%26path=ASIN/0140547444">
    <Asin>0140547444</Asin>
    <ProductName>The Day the Teacher Went Bananas</ProductName>
    <Catalog>Book</Catalog>
    <Authors>
      <Author>James Howe</Author>
      <Author>Lillian Hoban</Author>
    </Authors>
    <ReleaseDate>August, 1992</ReleaseDate>
    <Manufacturer>Puffin</Manufacturer>
    <ImageUrlSmall>
      http://images.amazon.com/images/P/0140547444.01.THUMBZZZ.jpg
    </ImageUrlSmall>
    <ImageUrlMedium>
      http://images.amazon.com/images/P/0140547444.01.MZZZZZZZ.jpg
    </ImageUrlMedium>
    <ImageUrlLarge>
      http://images.amazon.com/images/P/0140547444.01.LZZZZZZZ.jpg
    </ImageUrlLarge>
    <ListPrice>$5.99</ListPrice>
    <OurPrice>$5.99</OurPrice>
    <UsedPrice>$3.35</UsedPrice>
  </Details>
  . . .
</ProductInfo>
Note that I have removed my developer's token from the URL (get your own from Amazon.com) and have truncated all but one Details element for the sake of space.
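A request URL of this kind is just a base address plus a handful of query parameters. The following is a minimal sketch in core Perl, using the same parameter names as the script in Example 6-6. The helper names escape_param and amazon_search_url are made up for this illustration, YOUR-TOKEN stands in for a real developer's token, and the escaping routine assumes its input is a byte string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode everything except the RFC 3986 "unreserved" characters.
# Assumption: the value is a byte string (for example, already UTF-8 encoded).
sub escape_param {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $value;
}

# Hypothetical helper: assemble a keyword-search URL from the parameters
# that the script in Example 6-6 sends to Amazon.com.
sub amazon_search_url {
    my ($token, $query) = @_;
    my @params = (
        [ 'v'             => '1.0' ],
        [ 't'             => 'webservices-20' ],
        [ 'dev-t'         => $token ],
        [ 'KeywordSearch' => $query ],
        [ 'mode'          => 'books' ],
        [ 'type'          => 'lite' ],
        [ 'page'          => '1' ],
        [ 'f'             => 'xml' ],
    );
    my $query_string = join '&',
        map { $_->[0] . '=' . escape_param($_->[1]) } @params;
    return 'http://xml.amazon.com/onca/xml?' . $query_string;
}

print amazon_search_url('YOUR-TOKEN', 'teacher bananas'), "\n";
```

Escaping the search term matters because spaces and ampersands in a query would otherwise break the URL or be read as extra parameters.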
Looking at the resultant XML, we can immediately see potential ways to map Amazon.com's ProductInfo document standard over to RSS 1.0. Here's the document again, with the values explained within:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ProductInfo PUBLIC "-//Amazon.com//DTD Amazon Product Info//EN"
  "http://xml.amazon.com/schemas/dev-lite.dtd">
<ProductInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="http://xml.amazon.com/schemas/dev-lite.xsd">
  <Details url="URL TO BOOK'S PAGE ON AMAZON.COM">
    <Asin>THE BOOK'S ASIN NUMBER</Asin>
    <ProductName>THE NAME OF THE BOOK</ProductName>
    <Catalog>THE CATALOG ON AMAZON THE SEARCH RESULT CAME FROM</Catalog>
    <Authors>
      <Author>AUTHOR NAME</Author>
      <Author>AUTHOR NAME</Author>
    </Authors>
    <ReleaseDate>RELEASE DATE (Month, Year)</ReleaseDate>
    <Manufacturer>THE NAME OF THE PUBLISHER</Manufacturer>
    <ImageUrlSmall>URL TO THUMBNAIL IMAGE</ImageUrlSmall>
    <ImageUrlMedium>URL TO MEDIUM SIZED IMAGE</ImageUrlMedium>
    <ImageUrlLarge>URL TO LARGE IMAGE</ImageUrlLarge>
    <ListPrice>THE LIST PRICE OF THE BOOK ($dd.cc)</ListPrice>
    <OurPrice>AMAZON'S PRICE OF THE BOOK ($dd.cc)</OurPrice>
    <UsedPrice>AMAZON'S PRICE FOR THE BOOK SECOND HAND ($dd.cc)</UsedPrice>
  </Details>
  . . .
</ProductInfo>
This document would map nicely to an RSS 1.0 document, like so:
<?xml version="1.0"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns="http://purl.org/rss/1.0/">

<channel rdf:about="SEARCH URL">
  <title>Amazon Search Results for XXX</title>
  <link>LINK TO WEBSITE VERSION OF SEARCH</link>
  <description>Amazon Search Results for XXX</description>
  <dc:publisher>Amazon Inc</dc:publisher>
  <items>
    <rdf:Seq>
      <rdf:li rdf:resource="URI OF FIRST BOOK" />
      <rdf:li rdf:resource="URI OF SECOND BOOK" />
      <rdf:li rdf:resource="URI OF THIRD BOOK" />
      . . .
    </rdf:Seq>
  </items>
</channel>

<item rdf:about="URI OF BOOK">
  <title>TITLE OF BOOK</title>
  <link>URL OF BOOK WITHIN AMAZON SITE</link>
  <description>TITLE OF THE BOOK, by AUTHOR, priced PRICE</description>
  <dc:publisher>THE NAME OF THE PUBLISHER</dc:publisher>
  <dc:creator>AUTHOR NAME</dc:creator>
  <dc:date>RELEASE DATE</dc:date>
</item>

</rdf:RDF>
You can see that with some imagination we can carry all the information over from the Amazon.com document to an RSS 1.0 equivalent. In this example, we have used the Dublin Core module to provide us with some more useful elements. Other data sources might require other modules. Chapter 7 goes into these in detail.
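As a rough illustration of this mapping, the sketch below turns one parsed Details record into the fields of an RSS 1.0 item. It is not part of the chapter's script: the helper name details_to_item is hypothetical, the input is assumed to be a Perl hash shaped like the Details element shown earlier, and OurPrice is an arbitrary choice for the PRICE placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map one Details record (a hash reference) to the
# title, link, description, and Dublin Core fields of an RSS 1.0 item.
sub details_to_item {
    my ($details) = @_;

    # Author may be a single string or a list, so normalize it to a list.
    my $authors = ($details->{Authors} || {})->{Author};
    my @authors = ref($authors) eq 'ARRAY' ? @$authors
                : defined($authors)        ? ($authors)
                :                            ();
    my $creator = join ', ', @authors;
    my $title   = $details->{ProductName};

    return (
        title       => $title,
        link        => $details->{url},
        # Assumption: use Amazon's own price for the PRICE placeholder.
        description => "$title, by $creator, priced $details->{OurPrice}",
        dc          => {
            publisher => $details->{Manufacturer},
            creator   => $creator,
            date      => $details->{ReleaseDate},
        },
    );
}

# Sample record built from the Details element shown earlier.
my %book = (
    url          => 'http://www.amazon.com/exec/obidos/redirect?tag=webservices-20%26creative=D3PEW6MKWJIULE%26camp=2025%26link_code=xm2%26path=ASIN/0140547444',
    ProductName  => 'The Day the Teacher Went Bananas',
    Authors      => { Author => [ 'James Howe', 'Lillian Hoban' ] },
    ReleaseDate  => 'August, 1992',
    Manufacturer => 'Puffin',
    OurPrice     => '$5.99',
);

my %item = details_to_item(\%book);
print "$item{title}\n$item{description}\n";
```

The returned list has the same shape as the arguments that add_item takes in the earlier examples, so it could be passed straight through.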
We can convert the Amazon.com XML feed into RSS 1.0 by loading the XML, parsing it, retrieving the data, and creating a new RSS document with the resulting variables.
In Chapter 4, we retrieved data from Google using a SOAP query. In this example, we'll leave out SOAP entirely and simply retrieve the XML as we would anything else on the Web. The script will take two parameters, as before: the developer's token and the search term.
We are also going to point to specific URLs on the Amazon.com site, so we need to work out how Amazon.com creates these. At the time of this writing, the search box on their site is an HTTP POST request; the parameters do not show up in the resulting URL. Thus, we can't make the link element of the channel correspond directly to the search that creates the feed. For simplicity's sake, we'll make it point to http://www.amazon.com.
The item link element can point directly to a URL, however. The ASIN number (an Amazon.com-specific coding, which for books is the same as the ISBN standard) we retrieve from the Amazon.com XML is used like this:
http://www.amazon.com/exec/obidos/ASIN/ASIN-NUMBER-HERE/
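In code, building that link is a one-line string operation. The helper name asin_link below is made up for this sketch, and the whitespace trimming is only a precaution against stray spaces or newlines left around a parsed value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the product page URL for a given ASIN,
# following the pattern shown above.
sub asin_link {
    my ($asin) = @_;
    $asin =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return "http://www.amazon.com/exec/obidos/ASIN/$asin/";
}

print asin_link('0140547444'), "\n";
```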
Furthermore, whereas the Perl SOAP module we used returns the result as an array, here we retrieve the XML directly. It needs to be parsed first to make the RSS creation easier. We'll use XML::Simple, which is, oddly enough, a simple XML parser.
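One XML::Simple behavior is worth keeping in mind when working with search results: by default, an element that appears once is returned as a plain value, while an element that repeats is returned as an array reference. A search that returns a single book, or a book with a single author, therefore produces a different data structure than one with several. The sketch below, which uses a cut-down sample document rather than a live response, shows how the ForceArray option keeps the structure consistent.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# A cut-down sample in the shape of the ProductInfo document shown earlier.
my $sample = <<'XML';
<ProductInfo>
  <Details url="http://www.amazon.com/exec/obidos/ASIN/0140547444">
    <Asin>0140547444</Asin>
    <ProductName>The Day the Teacher Went Bananas</ProductName>
    <Authors>
      <Author>James Howe</Author>
      <Author>Lillian Hoban</Author>
    </Authors>
  </Details>
</ProductInfo>
XML

# ForceArray makes Details and Author array references even when only
# one of each is present; KeyAttr => [] turns off attribute folding.
my $xml = XMLin($sample, ForceArray => ['Details', 'Author'], KeyAttr => []);

for my $details (@{ $xml->{Details} }) {
    my @authors = @{ $details->{Authors}{Author} || [] };
    print "$details->{ProductName} ($details->{Asin}): ",
        join(', ', @authors), "\n";
}
```

Without ForceArray, the loop over Details would fail on this sample, because the single Details element would be a hash reference rather than an array reference.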
Example 6-6 shows the entire source of amazonrss.cgi. The script is run from the browser and takes two parameters: q is the search term, and t is the Amazon.com developer's token. For example:
http://URL/amazonrss.cgi?q=QUERY&t=TOKEN
#!/usr/bin/perl -w

use strict;
use XML::RSS;
use LWP::Simple qw(get);
use XML::Simple;
use URI::Escape qw(uri_escape);
use CGI qw(:all);

# Set up the query term from the CGI input
my $query = param("q");
my $token = param("t");

# Run the search, escaping both values so they are safe inside a URL
my $result = get('http://xml.amazon.com/onca/xml?v=1.0&t=webservices-20&dev-t='
    . uri_escape($token)
    . '&KeywordSearch=' . uri_escape($query)
    . '&mode=books&type=lite&page=1&f=xml');
die "No response from Amazon.com\n" unless defined $result;

# Parse the XML. ForceArray keeps Details and Author as lists
# even when a search returns one book or a book has one author.
my $xml = XMLin($result, ForceArray => ['Details', 'Author'], KeyAttr => []);

# Create the new RSS object
my $rss = new XML::RSS (version => '1.0');

# Add in the RSS channel data
$rss->channel(
    title       => "Amazon Search for $query",
    link        => "http://www.amazon.com",
    description => "Amazon search results for $query",
    dc => {
        publisher => "Amazon.com",
    },
);

# Create each of the items
foreach my $element (@{ $xml->{'Details'} || [] }) {
    $rss->add_item(
        title => $element->{'ProductName'},
        link  => "http://www.amazon.com/exec/obidos/ASIN/" . $element->{'Asin'} . "/",
        dc => {
            # The book's own publisher, as in the mapping shown earlier
            publisher => $element->{'Manufacturer'},
            creator   => join(", ", @{ $element->{'Authors'}->{'Author'} || [] }),
            date      => $element->{'ReleaseDate'},
        },
    );
}

print "Content-type: application/rss+xml\n\n";
print $rss->as_string;