6.1 Walking Through an RSS 1.0 Document

At first glance, RSS 1.0 can look very complicated indeed. It isn't really, and breaking an example down into chunks can help a great deal, so that's what we're going to do now.

In sidebars throughout this section, we'll also examine RDF in XML syntax in general. An RSS 1.0 document is also a valid RDF document (though the reverse is not always true, and you must not forget that RDF has many different ways of being written).

Example 6-1 is a simple RSS 1.0 feed with one item, an image, and a text-input section. The first line includes the standard XML declaration, declaring the document's encoding to be UTF-8:

<?xml version="1.0" encoding="utf-8"?>

The root element, which follows the XML declaration, is also the place where we declare the additional namespaces used in the document, telling the parser that we are also going to be using the vocabularies represented by these URIs. The required line already declares the namespace for all the core elements of RSS 1.0 (the elements that appear without a colon) and the namespace for RDF:

<rdf:RDF xmlns="http://purl.org/rss/1.0/"

Namespaces are represented by URIs. Nothing special needs to be at the namespace's URI (though by convention there is usually some documentation about the module): the only requirement is that each namespace is identified by its own unique URI. The syntax of a namespace declaration is simple and can be read aloud for greater understanding. For example, the line xmlns:dc="http://purl.org/dc/elements/1.1/" is read as follows: "the XML namespace dc is associated with the URI http://purl.org/dc/elements/1.1/."
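To see what "associated with" means to a parser, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The feed is a hypothetical, trimmed-down fragment (the example.org URI and the publisher text are placeholders, and the function name expanded_tags is ours); only the namespace URIs are real. The parser replaces each prefix with its URI, so the prefix itself is just a local shorthand:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down feed; only the namespace URIs are real.
FEED = """<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.org/feed">
    <dc:publisher>Example Publisher</dc:publisher>
  </channel>
</rdf:RDF>"""

# The same feed with an arbitrary prefix in place of "dc".
FEED_OTHER_PREFIX = (
    FEED.replace("xmlns:dc=", "xmlns:dublin=")
        .replace("dc:publisher", "dublin:publisher")
)


def expanded_tags(xml_text):
    """Return every element name, in document order, as {namespace-URI}local."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


tags = expanded_tags(FEED)
for tag in tags:
    print(tag)
```

Both versions of the feed produce the same expanded names, which is why the choice of prefix is a matter of convention rather than meaning.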

Every namespace used in the RSS 1.0 document must be declared in the root element. For documents with many namespaces, this element can look very untidy, but a judicious application of spaces and new lines can make it easier to read.

Reformatting for Fun and Profit

RSS 1.0 documents can look absolutely forbidding. The first few lines can contain more colons, URIs, and weird acronyms than a bad day on the O'Reilly Network.

So, it can help a great deal if you load the suspect file into a text editor and have a good go at reformatting it before you try to work out what is going on. Once you've done this to a few feeds, you do get to see the underlying structure, and a good concept of RSS 1.0's inner workings will hopefully appear in your head.

Now let's look at the channel element:

<channel rdf:about="http://meerkat.oreillynet.com/?_fl=rss1.0">
  <description>Meerkat: An Open Wire Service</description>
  <dc:publisher>The O'Reilly Network</dc:publisher>
  <dc:creator>Rael Dornfest (mailto:rael@oreilly.com)</dc:creator>
  <dc:rights>Copyright &#169; 2000 O'Reilly &amp; Associates, Inc.</dc:rights>

In the first half of the channel element, we start to see the main differences in structure between RSS 0.9x and RSS 1.0. First, every top-level element (channel, item, image, textinput) has an rdf:about attribute. This denotes the URI of that resource in the scope of RDF.
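As a rough illustration of how rdf:about identifies each resource, the following sketch indexes the top-level elements of a cut-down feed by their URI. The feed text is abbreviated from the fragments in this section, and the helper name resources_by_uri is hypothetical:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Abbreviated feed: just enough to show two top-level resources.
FEED = """<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel rdf:about="http://meerkat.oreillynet.com/?_fl=rss1.0">
    <description>Meerkat: An Open Wire Service</description>
  </channel>
  <item rdf:about="http://c.moreover.com/click/here.pl?r123">
    <title>XML: A Disruptive Technology</title>
  </item>
</rdf:RDF>"""


def resources_by_uri(xml_text):
    """Map each top-level element's rdf:about URI to its local element name."""
    root = ET.fromstring(xml_text)
    about_key = "{%s}about" % RDF_NS
    result = {}
    for element in root:
        uri = element.get(about_key)
        if uri is not None:
            # Strip the "{namespace}" part to leave the bare element name.
            result[uri] = element.tag.split("}", 1)[-1]
    return result


resources = resources_by_uri(FEED)
for uri, name in resources.items():
    print(name, "is", uri)
```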

Second, we start to see subelements of the channel using namespaces. In this example, we see dc: and sy: (the Dublin Core and Syndication modules) in use.

Next comes a major departure from RSS 0.9x. The image and textinput elements of RSS 1.0 are not contained within the channel. Rather, channel contains a pointer to each of these objects, which sit elsewhere in the RSS 1.0 document, at the same level as channel. The pointers use RDF notation, in the form of the rdf:resource attribute:

<image rdf:resource="http://meerkat.oreillynet.com/icons/meerkat-powered.jpg" />
<textinput rdf:resource="http://meerkat.oreillynet.com" />
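One way to picture the pointer mechanism is to follow it by hand: read the rdf:resource URI inside channel, then look for the sibling element whose rdf:about carries the same URI. The sketch below does this with Python's standard library. The feed is abbreviated from this section's fragments, and follow_pointer is a hypothetical helper, not part of any RSS library:

```python
import xml.etree.ElementTree as ET

NS = {
    "rss": "http://purl.org/rss/1.0/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Abbreviated feed: the channel holds pointers, the objects are its siblings.
FEED = """<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel rdf:about="http://meerkat.oreillynet.com/?_fl=rss1.0">
    <image rdf:resource="http://meerkat.oreillynet.com/icons/meerkat-powered.jpg" />
    <textinput rdf:resource="http://meerkat.oreillynet.com" />
  </channel>
  <image rdf:about="http://meerkat.oreillynet.com/icons/meerkat-powered.jpg">
    <title>Meerkat Powered!</title>
  </image>
  <textinput rdf:about="http://meerkat.oreillynet.com">
    <title>Search Meerkat</title>
  </textinput>
</rdf:RDF>"""


def follow_pointer(root, name):
    """Return the title of the top-level <name> that channel/<name> points to."""
    pointer = root.find("rss:channel/rss:%s" % name, NS)
    if pointer is None:
        return None
    target_uri = pointer.get(RDF_RESOURCE)
    # Only direct children of the root are candidates, not the pointer itself.
    for candidate in root.findall("rss:%s" % name, NS):
        if candidate.get(RDF_ABOUT) == target_uri:
            return candidate.findtext("rss:title", namespaces=NS)
    return None


root = ET.fromstring(FEED)
print(follow_pointer(root, "image"))
print(follow_pointer(root, "textinput"))
```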

In the same way, within RSS 1.0 (unlike RSS 0.9x), channel does not contain any item elements. It does, however, contain an items element within which sits an RDF list of all the item elements that exist within the whole document. Again, these are simply pointers that provide the correct RDF descriptions:

      <rdf:li rdf:resource="http://c.moreover.com/click/here.pl?r123" />

Note that the channel element is closed here. Unlike RSS 0.9x, in RSS 1.0 channel does not encompass the entire document. Once it has defined its own metadata and pointed to the items, image, and textinput objects, its job is done.
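The relationship between the items list and the item elements can be sketched the same way. In the RSS 1.0 specification, the list inside items is an rdf:Seq of rdf:li pointers. The feed below is abbreviated, its second item is a made-up placeholder, and ordered_item_titles is a hypothetical helper. The item elements are deliberately written in a different order from the list to show that the list, not the document position, supplies the sequence:

```python
import xml.etree.ElementTree as ET

NS = {
    "rss": "http://purl.org/rss/1.0/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Abbreviated feed; the second item is a hypothetical placeholder.
FEED = """<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel rdf:about="http://meerkat.oreillynet.com/?_fl=rss1.0">
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://c.moreover.com/click/here.pl?r123" />
        <rdf:li rdf:resource="http://example.org/second-item" />
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/second-item">
    <title>A Second, Hypothetical Item</title>
  </item>
  <item rdf:about="http://c.moreover.com/click/here.pl?r123">
    <title>XML: A Disruptive Technology</title>
  </item>
</rdf:RDF>"""


def ordered_item_titles(root):
    """Return item titles in the order given by the channel's rdf:Seq."""
    # Items are children of the root element, not of channel.
    items_by_uri = {
        item.get(RDF_ABOUT): item for item in root.findall("rss:item", NS)
    }
    titles = []
    for li in root.findall("rss:channel/rss:items/rdf:Seq/rdf:li", NS):
        item = items_by_uri.get(li.get(RDF_RESOURCE))
        if item is not None:
            titles.append(item.findtext("rss:title", namespaces=NS))
    return titles


root = ET.fromstring(FEED)
titles = ordered_item_titles(root)
print(titles)
```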

The image, textinput, and item elements are similar to the RSS 0.9x equivalents, differing only in that they declare the rdf:about attribute, as previously discussed, and allow for additional namespaced subelements from the optional modules:

<image rdf:about="http://meerkat.oreillynet.com/icons/meerkat-powered.jpg">
  <title>Meerkat Powered!</title>
<textinput rdf:about="http://meerkat.oreillynet.com">
  <title>Search Meerkat</title>
  <description>Search Meerkat's RSS Database...</description>
<item rdf:about="http://c.moreover.com/click/here.pl?r123">
  <title>XML: A Disruptive Technology</title>
  <dc:description>This is the description of the article</dc:description>
  <dc:publisher>The O'Reilly Network</dc:publisher>
  <dc:creator>Simon St.Laurent (mailto:simonstl@simonstl.com)</dc:creator>
  <dc:rights>Copyright &#169; 2000 O'Reilly &amp; Associates, Inc.</dc:rights>
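Because module elements sit alongside the core elements, a consumer can read both in the same pass once it knows the namespace URIs. The following sketch assumes a feed containing only the item shown above, wrapped in a root element, and uses a hypothetical helper named item_metadata. Note that the parser also decodes the character reference and the entity in dc:rights:

```python
import xml.etree.ElementTree as ET

NS = {
    "rss": "http://purl.org/rss/1.0/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]

# The item from the walkthrough, wrapped in a root element so it parses alone.
FEED = """<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://c.moreover.com/click/here.pl?r123">
    <title>XML: A Disruptive Technology</title>
    <dc:publisher>The O'Reilly Network</dc:publisher>
    <dc:creator>Simon St.Laurent (mailto:simonstl@simonstl.com)</dc:creator>
    <dc:rights>Copyright &#169; 2000 O'Reilly &amp; Associates, Inc.</dc:rights>
  </item>
</rdf:RDF>"""


def item_metadata(xml_text):
    """Collect core and Dublin Core fields from the first item in the feed."""
    item = ET.fromstring(xml_text).find("rss:item", NS)
    return {
        "about": item.get(RDF_ABOUT),
        "title": item.findtext("rss:title", namespaces=NS),
        "publisher": item.findtext("dc:publisher", namespaces=NS),
        "creator": item.findtext("dc:creator", namespaces=NS),
        "rights": item.findtext("dc:rights", namespaces=NS),
    }


metadata = item_metadata(FEED)
for field, value in metadata.items():
    print(field + ":", value)
```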