13.3 A Detailed Look at the Specification

RSS is first and foremost valid RDF, so it requires the enclosing rdf:RDF element. The other required RSS-specific elements are channel (with its title, link, description, and items subelements) and one or more item elements. The remaining RSS elements are optional.

The RSS RDFS can be found at http://web.resource.org/rss/1.0/schema.rdf/. Examining this, you'll see that many of the RSS properties discussed in this section are actually subproperties of related DC properties: for example, the RSS title and description properties are subproperties of dc:title and dc:description, and link and url are subproperties of dc:identifier.

Certain features that RDF allows are restricted within RSS, primarily to simplify the tool builder's task. For instance, at the time of this writing, repeating properties (called subelements in the RSS specification), which RDF allows, are restricted in RSS. This restriction means that you can't list multiple instances of the same subelement within an item element, such as multiple DC subject entries (a naturally repeating element). However, the RSS Working Group is working toward removing this restriction, or at least toward having each RSS module writer explicitly specify where repeating properties are allowed.
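To make the restriction concrete, here is a sketch of an item that repeats dc:subject. It is legal RDF, but under the restriction just described it would not be accepted as RSS 1.0 at the time of writing. The URL and subject values are borrowed from this chapter's examples; pairing the two subjects on one item is purely illustrative, and namespace declarations are omitted:

```xml
<!-- Legal RDF, but restricted in RSS 1.0 at the time of writing:
     dc:subject appears twice within the same item -->
<item rdf:about="http://weblog.burningbird.net/archives/000472.php">
  <title>Serendipity, all over again</title>
  <link>http://weblog.burningbird.net/archives/000472.php</link>
  <description></description>
  <dc:subject>Virtual Neighborhood</dc:subject>
  <dc:subject>Technology</dc:subject>
</item>
```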

Another RSS-specific restriction is that each higher-level element must have an rdf:about attribute (as shown in Example 13-1 for the item and channel elements), and the URI contained in this attribute must follow URL naming conventions (that is, it must be an http, ftp, mailto, or similar URI). The remaining restrictions apply to the individual RSS elements, as discussed in the next several sections.
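The following sketch contrasts the two cases. The first rdf:about value is an http URL taken from this chapter's examples. The second, urn:example:posting:472, is a hypothetical identifier: it is a valid URI, and therefore acceptable to RDF, but it does not follow URL naming conventions, so it falls outside the RSS restriction just described. Namespace declarations are omitted:

```xml
<!-- Acceptable in RSS 1.0: rdf:about holds an http URL -->
<item rdf:about="http://weblog.burningbird.net/archives/000472.php">
  <title>Serendipity, all over again</title>
  <link>http://weblog.burningbird.net/archives/000472.php</link>
  <description></description>
</item>

<!-- Valid RDF, but not RSS 1.0: the hypothetical URN below
     does not follow URL naming conventions -->
<item rdf:about="urn:example:posting:472">
  <title>Serendipity, all over again</title>
  <link>http://weblog.burningbird.net/archives/000472.php</link>
  <description></description>
</item>
```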

13.3.1 channel

The channel element surrounds the data being described in the document. It's equivalent to an RDF typed node and features starting and ending tags. The only required attribute is rdf:about, containing the URL of the resource being described:

<?xml version="1.0"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
         xmlns="http://purl.org/rss/1.0/">

<channel rdf:about="http://weblog.burningbird.net/">
...
</channel>
...
</rdf:RDF>

You could extend the channel element with new attributes, and the element should still validate as both RDF and RSS. However, a better approach would be to check whether one of the RSS modules has the data elements you need to describe your data and to use that module instead. If not, you may want to consider submitting your own recommended modules (as described later in Section 13.4).
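For example, rather than inventing new attributes to say how often a channel is refreshed, you could use the Syndication module, whose sy namespace is already declared in the previous example. The sketch below assumes a once-a-day update; the values are illustrative and are not taken from the Burningbird feed:

```xml
<channel rdf:about="http://weblog.burningbird.net/">
  <title>Burningbird</title>
  <link>http://weblog.burningbird.net/</link>
  <description></description>
  <!-- Syndication module properties (illustrative values):
       the channel is updated once per daily period, with the
       update schedule calculated from the base date below -->
  <sy:updatePeriod>daily</sy:updatePeriod>
  <sy:updateFrequency>1</sy:updateFrequency>
  <sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>
  ...
</channel>
```

Because these properties come from a published module, any aggregator that understands the Syndication module can interpret them without knowing anything specific about this feed.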

13.3.2 title, link, and description

The title, link, and description elements are all required subelements of channel. The RSS specification has a recommended length for each: 40 characters or fewer for title, and 500 characters or fewer for link and description.

<?xml version="1.0"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
         xmlns="http://purl.org/rss/1.0/">

<channel rdf:about="http://weblog.burningbird.net/">
<title>Burningbird</title>
<link>http://weblog.burningbird.net/</link>
<description></description>
...
</channel>
...
</rdf:RDF>

The elements are required, but they may contain no data, as the description element in this example demonstrates.

All three elements contain PCDATA, which means the character data is parsed for entity references (e.g., &lt; for <), but the data cannot contain child elements. In other words, you cannot use markup in these elements. When someone suggested including XHTML or some other form of XML within the title element, the RSS Working Group strongly recommended against it.
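A minimal illustration, using a hypothetical title: to show a literal < or & in a title, escape it as an entity rather than embedding markup:

```xml
<!-- Allowed: < , > and & are written as entity references -->
<title>Using &lt;rdf:Seq&gt; &amp; other RDF containers</title>

<!-- Not allowed in RSS: the title contains a child element -->
<title>Using <em>rdf:Seq</em> in RSS</title>
```

The first title is delivered to an application as the plain text "Using <rdf:Seq> & other RDF containers"; the second is well-formed XML, but it places markup inside a PCDATA-only element.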

The link element doesn't necessarily repeat the URI of the resource being described. Instead, it contains the URL of the HTML page that renders the item being described (whether this is the channel, the image, or a specific item). In the vast majority of RSS uses, link duplicates the URI given in the rdf:about attribute. However, the two can differ. For instance, a URI for a site might be:

http://somesite.com

but the link to the actual material might be:

http://www.somesite.com/index.html

The site may choose to use a different URI to represent the site contents, reflecting the independence of the URI from the actual URL. Why? Perhaps the site wants to move the channel's pages at some point, linking them to:

http://channel.somesite.com/index.php

but the site URI remains the same, and therefore consistent.
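Put together, a channel for that hypothetical site could identify itself with one URI and send readers to a different URL. The somesite.com addresses come from the text above; the title and description are placeholders:

```xml
<!-- rdf:about: the stable URI that identifies the channel -->
<channel rdf:about="http://somesite.com">
  <title>Some Site</title>
  <!-- link: where the HTML rendering currently lives -->
  <link>http://www.somesite.com/index.html</link>
  <description>Placeholder channel for a site whose URI and URL differ</description>
  ...
</channel>
```

If the pages later move to http://channel.somesite.com/index.php, only the link value changes; rdf:about stays http://somesite.com, so any RDF statements made about the channel remain valid.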

In addition to being subelements of channel, title, link, and description are required subelements of item and textinput. The image element also requires title and link, but pairs them with url rather than description.

13.3.3 items

The items element contains an rdf:Seq container, which contains a reference to each item described in the RSS document:

<?xml version="1.0"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
         xmlns="http://purl.org/rss/1.0/">

<channel rdf:about="http://weblog.burningbird.net/">
<title>Burningbird</title>
<link>http://weblog.burningbird.net/</link>
<description></description>

<items>
<rdf:Seq>
<rdf:li rdf:resource="http://weblog.burningbird.net/archives/000472.php" />
<rdf:li rdf:resource="http://weblog.burningbird.net/archives/000471.php" />
</rdf:Seq>
</items>

</channel>
...
</rdf:RDF>

The rdf:Seq container is used for items to preserve the order in which the items are processed. News aggregators usually display news in reverse chronological order (latest news at the top of the list), and the sequence helps maintain this order.

The RSS specification requires at least one item to be listed in the items container. Although RSS 1.0 specifies no upper limit, it's a good idea to restrict the number to 15 or fewer to ensure backward compatibility with RSS 0.9x.

During recent discussions about the possibility of simplifying RSS 1.0, one area was specifically targeted: items. Why? Most RSS generation tools and aggregators have problems with the RDF container. The difficulty wasn't so much the concept of a container as having to list the items first within items and then process them again as individual item elements. However, without repeating properties, the container was the only way to manage multiple occurrences of the same element.

Several suggestions have been made to simplify the syntax, including supporting repeating properties and eliminating the use of rdf:Seq. Example 13-2 shows a different RSS 1.0 document; this one uses repeating properties instead of the container, demonstrating how these changes could alter the appearance of RSS 1.0 files.

Example 13-2. Simplified RDF/RSS syntax
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://purl.org/rss/1.0/">

<channel rdf:about="http://weblog.burningbird.net/">
<title>Burningbird</title>
<link>http://weblog.burningbird.net/</link>
<description></description>

<hasitem>
<item rdf:about="http://weblog.burningbird.net/archives/000514.php">
<link>http://weblog.burningbird.net/archives/000514.php</link> 
<title>Myths About RDF/RSS</title>
<description>Lots of discussion about the direction that RSS is going to take, 
which I think is good. However, the first thing that happens any time a 
conversation about RSS occurs is people start questioning the use of RDF within the...</description>
<dc:subject>Technology</dc:subject>
<dc:creator>shelley</dc:creator>
<dc:date>2002-09-06T00:53:16-06:00</dc:date>
</item>
</hasitem>
<hasitem>
<item rdf:about="http://weblog.burningbird.net/archives/000515.php">
<link>http://weblog.burningbird.net/archives/000515.php</link> 
<title>ThreadNeedle Status</title>
<description>I provided a status on ThreadNeedle at the QuickTopic discussion 
group. I wish I had toys for you to play with, but no such luck. To those who 
were counting on this technology...</description>
<dc:subject>Technology</dc:subject>
<dc:creator>shelley</dc:creator>
<dc:date>2002-09-06T00:19:28-06:00</dc:date>
</item>
</hasitem>
</channel>

</rdf:RDF>

I also created a small PHP program, shown in Example 13-3, to process the simplified RDF/RSS. Interestingly, the same code also worked with UserLand's RSS and with the original RSS 1.0. The point is that aggregators aren't the tools that have problems with RDF containers; it's the generation end where things get complicated.

Example 13-3. PHP program to process RSS 1.0, RSS 0.9x, and simplified RSS 1.0 content
<?php

// Shared state for the SAX-style callbacks below
$insideitem = false;   // true while parsing inside an <item> element
$tag = "";             // name of the current element within an item
$title = "";
$author = "";
$link = "";
$description = "";

// Called for each opening tag. By default xml_parser_create() folds
// element names to uppercase, so the comparisons use uppercase names.
function startElement($parser, $name, $attrs) {
    global $insideitem, $tag;
    if ($insideitem) {
        $tag = $name;
    } elseif ($name == "ITEM") {
        $insideitem = true;
    }
}

// Called for each closing tag; when an item ends, print it and reset state
function endElement($parser, $name) {
    global $insideitem, $tag, $title, $author, $link, $description;
    if ($name == "ITEM") {
        printf("<p><a href='%s'><span style='font-weight: bold'>%s</span></a>",
               htmlspecialchars(trim($link), ENT_QUOTES),
               htmlspecialchars(trim($title)));
        printf("<br />by %s", htmlspecialchars(trim($author)));
        printf("<br />Description: %s", htmlspecialchars(trim($description)));
        printf("</p>\n");
        $title = "";
        $author = "";
        $link = "";
        $description = "";
        $tag = "";
        $insideitem = false;
    }
}

// Accumulates character data for the item subelements of interest;
// one element's text can arrive in several chunks, hence the .= operator
function characterData($parser, $data) {
    global $insideitem, $tag, $title, $link, $author, $description;
    if ($insideitem) {
        switch ($tag) {
            case "TITLE":
                $title .= $data;
                break;
            case "DC:CREATOR":
                $author .= $data;
                break;
            case "LINK":
                $link .= $data;
                break;
            case "DESCRIPTION":
                $description .= $data;
                break;
        }
    }
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

// Opening a remote URL with fopen() requires allow_url_fopen to be enabled
$fp = fopen("http://weblog.burningbird.net/index.rdf", "r")
    or die("Error reading RSS data.");
while ($data = fread($fp, 4096)) {
    xml_parse($xml_parser, $data, feof($fp))
        or die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
}
fclose($fp);
xml_parser_free($xml_parser);

?>

At the time of this writing, the Working Group is still debating how to simplify RSS 1.0.

13.3.4 image

If there is an image associated with the HTML rendering of the item described in the channel, its URL and associated information are described in the image element with the required subelements: title, url, and link:

<image rdf:about="http://weblog.burningbird.net/mm/birdflame.gif">
  <title>Burningbird</title>
  <link>http://weblog.burningbird.net</link>
  <url>http://weblog.burningbird.net/mm/birdflame.gif</url>
</image>

In this RSS, the image described has a URI of http://weblog.burningbird.net/mm/birdflame.gif, a URL that's the same as the URI, a title of Burningbird (consider it equivalent to the alt attribute of an HTML img element), and a link to the page where the image is displayed.
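In RSS 1.0 the image element is not nested inside channel. It sits alongside channel within rdf:RDF, and channel refers to it with an rdf:resource attribute whose value matches the image's rdf:about. The trimmed sketch below reuses the Burningbird values; the single listed item and the closing "..." stand in for the rest of the document:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">

  <channel rdf:about="http://weblog.burningbird.net/">
    <title>Burningbird</title>
    <link>http://weblog.burningbird.net/</link>
    <description></description>
    <!-- reference to the image element that follows the channel -->
    <image rdf:resource="http://weblog.burningbird.net/mm/birdflame.gif" />
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://weblog.burningbird.net/archives/000472.php" />
      </rdf:Seq>
    </items>
  </channel>

  <image rdf:about="http://weblog.burningbird.net/mm/birdflame.gif">
    <title>Burningbird</title>
    <link>http://weblog.burningbird.net</link>
    <url>http://weblog.burningbird.net/mm/birdflame.gif</url>
  </image>
  ...
</rdf:RDF>
```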

13.3.5 textinput

The textinput element describes an XHTML text input form field associated in some way with the RSS, such as a form for subscribing to an RSS feed. Although the element is maintained for backward compatibility with RSS 0.9, the RSS Working Group is recommending that it be deprecated for RSS 1.0, a wise decision in my opinion.

The textinput element doesn't provide useful information about the item being described, and its meaning is overloaded, as the RSS specification itself notes. For instance, is the element used to describe a form for subscribing to a feed, or a form for searching? In addition, form elements for processing RSS data don't belong embedded within the data itself. This is equivalent to embedding an application form within the data that the form accesses in an Oracle database.

However, if you do see the textinput element used, it requires title, description, link, and name subelements. The name subelement is unique to textinput and contains the XHTML form element's name.
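If you do need to read or produce one, textinput follows the same pattern as image: it sits alongside channel, and channel refers to it with rdf:resource. The search URL, title, description, and field name below are hypothetical:

```xml
<channel rdf:about="http://weblog.burningbird.net/">
  ...
  <textinput rdf:resource="http://weblog.burningbird.net/search" />
</channel>

<!-- Hypothetical search form: name is the text input field's name,
     and link is the URL the form's contents are submitted to -->
<textinput rdf:about="http://weblog.burningbird.net/search">
  <title>Search Burningbird</title>
  <description>Search the weblog archives</description>
  <name>q</name>
  <link>http://weblog.burningbird.net/search</link>
</textinput>
```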

13.3.6 item

RSS provides a method for describing groups of related items; each item in an RSS document is described with an item element. This element is the key element of RSS, its heart and soul if you will.

The required subelements for item are title, description, and link. Additional elements can be added using RSS modules, but these three subelements must be present for the RSS to validate as RSS:

<item rdf:about="http://weblog.burningbird.net/archives/000472.php">
<title>Serendipity, all over again</title>
<description>When I wrote the previous posting, &quot;How Green is my 
Valley&quot;, I referenced both my old hometown, Kettle Falls,
Washington, and a posting by Loren, otherwise known as In a Dark Time. 
At the time that I read Loren's weblog,...
</description>
<link>http://weblog.burningbird.net/archives/000472.php</link>
<dc:subject>Virtual Neighborhood</dc:subject>
<dc:creator>shelley</dc:creator>
<dc:date>2002-08-23T15:07:57-06:00</dc:date>
</item>

As you can see, there really aren't that many core elements within the RSS specification, as the Working Group decided to keep the specification simple and allow additions through the use of modules, discussed next.