7.1 Module Status

According to the specification, "Modules are classified as Proposed until accepted as Standard by members of the RSS-DEV working group or a sub-membership thereof focused on the area addressed by the module."

Currently, there are only three modules classified as Standard (Dublin Core, Syndication, and Content) and at least 16 that are Proposed. The Proposed classification, however, should not stop you from using a module; it indicates only the lack of a schedule for voting on it, not a lack of merit. These modules may well be accepted as Standard in the future. With that in mind, here are the current modules, in alphabetical order.

mod_admin

The Administration module, written by Aaron Swartz and Ken Macleod, provides information on the feed's owner and the toolkit used to produce it. This helps the RSS user work with his provider to get things right, and it helps the RSS community at large to identify problems with certain systems.

Recommended Usage

It is good manners to include this module as a matter of course. The data is not dynamically created, so it can be included within a template and just left to do its job.

Namespace

The namespace prefix for this module is admin:, which should point to http://webns.net/mvcb/. Therefore, the root element of an RSS 1.0 document containing mod_admin should look like this:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
         xmlns="http://purl.org/rss/1.0/"
         xmlns:admin="http://webns.net/mvcb/">

Elements

The mod_admin elements occur as subelements of channel only. They consist of:

<admin:errorReportsTo rdf:resource="URI"/>: The URI is typically a mailto: URL for contacting the feed administrator to report technical errors.
<admin:generatorAgent rdf:resource="URI"/>: The URI is the home page of the software used to generate the feed. If possible, this should be a page that specifies a version number within the URI.

Example

Example 7-1. mod_admin in the channel element

<?xml version="1.0" encoding="utf-8"?> 
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
         xmlns="http://purl.org/rss/1.0/"
         xmlns:admin="http://webns.net/mvcb/"> 
  <channel rdf:about="http://rss.benhammersley.com/index.rdf">
    <title>Content Syndication with RSS</title>
    <link>http://rss.benhammersley.com</link>
    <description>Content Syndication with RSS, the blog</description>
    <admin:errorReportsTo rdf:resource="mailto:ben@benhammersley.com"/>
    <admin:generatorAgent rdf:resource="http://www.movabletype.org/?v=2.1"/>
...
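As a rough sketch of how a feed consumer might read these two elements, the following Python uses the standard library's xml.etree.ElementTree. The function name read_admin_info and the SAMPLE_FEED constant (a trimmed, closed copy of Example 7-1) are assumptions made for illustration; they are not part of the module.

```python
import xml.etree.ElementTree as ET

# Namespace URIs exactly as declared in Example 7-1.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "admin": "http://webns.net/mvcb/",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# A trimmed, closed copy of Example 7-1 so the sketch can run on its own.
SAMPLE_FEED = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://rss.benhammersley.com/index.rdf">
    <title>Content Syndication with RSS</title>
    <link>http://rss.benhammersley.com</link>
    <description>Content Syndication with RSS, the blog</description>
    <admin:errorReportsTo rdf:resource="mailto:ben@benhammersley.com"/>
    <admin:generatorAgent rdf:resource="http://www.movabletype.org/?v=2.1"/>
  </channel>
</rdf:RDF>
"""


def read_admin_info(feed_xml):
    """Return the two mod_admin URIs from a feed, or None where absent."""
    root = ET.fromstring(feed_xml)
    channel = root.find("rss:channel", NS)
    info = {"errorReportsTo": None, "generatorAgent": None}
    if channel is None:
        return info
    for name in info:
        element = channel.find("admin:" + name, NS)
        if element is not None:
            # The value lives in the rdf:resource attribute, not in text.
            info[name] = element.get(RDF_RESOURCE)
    return info


if __name__ == "__main__":
    print(read_admin_info(SAMPLE_FEED))
```

Because both values are attributes rather than element text, a parser that only collects text nodes will miss them; that is the main thing this sketch is meant to show.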

mod_aggregation

The Aggregation module plays a small but useful part in the life cycle of information passing through the Web. It allows news aggregators, such as Meerkat, Snewp, and so on (all covered in Chapter 12) to display the sources of their items. These services gather items from many other sources and group them by subject. mod_aggregation allows us to know where they originated.

This, of course, works over generations: as long as the mod_aggregation elements are respected, a Meerkat feed that uses a Snewp item from a Moreover feed that is itself an aggregation (for example) will still have the original source credited. As long as the mod_aggregation elements are left in place, the information is preserved. There is not, as yet, any feature for describing an aggregation history, however. You only know about the primary source.

Aggregators are the only people generating these elements; if you're building such a system, consider including them. The act of parsing such elements, however, is good for everyone. One can easily envisage an HTML representation of an RSS 1.0 feed with a "link via x" section. This is already done manually by many weblog owners, so why not include the feature in your RSS parsing scripts?

Namespace

mod_aggregation takes ag: as its prefix and http://purl.org/rss/1.0/modules/aggregation/ as its identifying URI. Therefore, an RSS 1.0 root element that uses it should look like this:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
         xmlns="http://purl.org/rss/1.0/"
         xmlns:ag="http://purl.org/rss/1.0/modules/aggregation/" >

Elements

mod_aggregation's elements are all subelements of item. There are three, and they are all mandatory if you are using the module:

ag:source: The name of the source of the item (no character limit).
ag:sourceURL: The URL of the source of the item (no character limit).
ag:timestamp: The time the item was published by the original source, in the ISO 8601 standard (ccyy-mm-ddThh:mm:ss+hh:mm).

Example

Example 7-2. mod_aggregation in action

<?xml version="1.0"?> 
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
         xmlns="http://purl.org/rss/1.0/"
         xmlns:ag="http://purl.org/rss/1.0/modules/aggregation/"
>   
   
  <channel rdf:about="http://meerkat.oreillynet.com/?_fl=rss1.0">
    <title>Meerkat</title>
    <link>http://meerkat.oreillynet.com</link>
    <description>Meerkat: An Open Wire Service</description>
  </channel>
   
  <items>
    <rdf:Seq>
      <rdf:li rdf:resource="http://c.moreover.com/click/here.pl?r123" />
    </rdf:Seq>
  </items>
   
  <item rdf:about="http://c.moreover.com/click/here.pl?r123" >
    <title>XML: A Disruptive Technology</title>
    <link>http://c.moreover.com/click/here.pl?r123</link>
    <description>
    XML is placing increasingly heavy loads on the existing technical
    infrastructure of the Internet.
    </description>
    <ag:source>XML.com</ag:source>
    <ag:sourceURL>http://www.xml.com</ag:sourceURL>
    <ag:timestamp>2000-01-01T12:00+00:00</ag:timestamp>
  </item>
</rdf:RDF>
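The "link via x" idea mentioned earlier could be sketched along the following lines in Python. This is only an illustration under stated assumptions: the function names link_via and parse_timestamp are invented here, the sample feed is a cut-down copy of Example 7-2, and the timestamp parser handles only the two layouts shown in this section (with and without seconds).

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from html import escape

NS = {
    "rss": "http://purl.org/rss/1.0/",
    "ag": "http://purl.org/rss/1.0/modules/aggregation/",
}

# Cut-down copy of Example 7-2 (channel and items list omitted for brevity).
SAMPLE_FEED = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:ag="http://purl.org/rss/1.0/modules/aggregation/">
  <item rdf:about="http://c.moreover.com/click/here.pl?r123">
    <title>XML: A Disruptive Technology</title>
    <link>http://c.moreover.com/click/here.pl?r123</link>
    <ag:source>XML.com</ag:source>
    <ag:sourceURL>http://www.xml.com</ag:sourceURL>
    <ag:timestamp>2000-01-01T12:00+00:00</ag:timestamp>
  </item>
</rdf:RDF>
"""


def link_via(item):
    """Build an HTML 'link via' fragment, or '' if the item is uncredited."""
    source = item.findtext("ag:source", default="", namespaces=NS).strip()
    url = item.findtext("ag:sourceURL", default="", namespaces=NS).strip()
    if not (source and url):
        return ""
    return 'link via <a href="%s">%s</a>' % (escape(url, quote=True),
                                             escape(source))


def parse_timestamp(text):
    """Parse an ag:timestamp value; returns None if it fits neither layout."""
    for layout in ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%dT%H:%M%z"):
        try:
            return datetime.strptime(text.strip(), layout)
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_FEED)
    for item in root.findall("rss:item", NS):
        stamp = item.findtext("ag:timestamp", default="", namespaces=NS)
        print(item.findtext("rss:title", namespaces=NS))
        print(link_via(item), parse_timestamp(stamp))
```

The source name and URL are escaped before being placed in HTML, since they come from a third-party feed and should not be trusted as markup.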

mod_annotation

mod_annotation is the smallest module. It consists of one element, which refers to a URL where a discussion of the item is being held. It might point to a discussion group, a commenting service, Usenet, an Annotea service, etc.

For sites that host such discussions, the addition of this module into the RSS feed should be simple and worthwhile. Weblogs, for example, might only need to point the element to the URL of the main entry page for a particular item.

If you want to parse this module into HTML, you should, as with many of these modules, have no problems simply assigning a separate div or span for the contents of the element, wrapping it within an <a href="URL">, and formatting it as you wish. This would probably only make sense if your parser is also taking notice of either the description element or the data provided by mod_content, simply because it is hard to have a discussion based solely on a headline.

Namespace

mod_annotation is identified by the namespace prefix annotate: and the URI http://purl.org/rss/1.0/modules/annotate/. Hence, the root element looks like this:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:annotate="http://purl.org/rss/1.0/modules/annotate/"
>

Element

There's only one element, a subelement of item, and here it is:

<annotate:reference rdf:resource="URL" />: The URL points to a discussion on the item.

However, this element can also take subelements of its own from the Dublin Core modules, mod_dublincore and mod_DCTerms. We'll cover these modules soon, but Example 7-4 will give you an idea.

Do you see how the namespaces system works? In Example 7-3, we have a feed using only the mod_annotation system. We've added one additional namespace and used the element correctly. In Example 7-4, we want to use another module to describe something in terms that the currently available elements cannot. So we decide upon mod_dublincore, add in the namespace declaration, and go ahead.

Also notice that in Example 7-3 annotate:reference is a one-line element, with a closing />, whereas in Example 7-4 annotate:reference contains the mod_dublincore elements before closing. This means that the mod_dublincore elements refer to the annotate:reference element, not to the item or channel. As we'll see, mod_dublincore can get addictive, and you might find yourself describing everything in your feed. This is not bad at all, but it may get confusing. By paying attention to which elements are within which, you can see what is happening.

Examples

Example 7-3. mod_annotation inside an item element

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:annotate="http://purl.org/rss/1.0/modules/annotate/"
>

<item rdf:about="http://www.example.com/item1">
    <title>RSS 0.9 or RSS 1.0...Discuss</title>
    <link>http://www.example.com/item1</link>
    <annotate:reference rdf:resource="http://www.example.com/discuss/item1"/>
</item>

Example 7-4. mod_annotation with additional mod_dublincore data

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:annotate="http://purl.org/rss/1.0/modules/annotate/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
>
   
. . .
   
   
<item rdf:about="http://www.example.com/item1">
    <title>RSS 0.9 or RSS 1.0...Discuss</title>
    <link>http://www.example.com/item1</link>
    <annotate:reference rdf:resource="http://www.example.com/discuss/item1">
      <dc:subject>XML</dc:subject>
      <dc:description>A discussion group on the subject in hand</dc:description>
    </annotate:reference>
</item>
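To make the nesting concrete, here is a minimal Python sketch that reads annotate:reference from each item and collects any Dublin Core elements found inside it. The function name discussion_links and the sample string are assumptions for illustration; the sample is the item from Example 7-4 wrapped in a closed root element so that it parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "annotate": "http://purl.org/rss/1.0/modules/annotate/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]
DC_PREFIX = "{%s}" % NS["dc"]

SAMPLE_FEED = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:annotate="http://purl.org/rss/1.0/modules/annotate/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://www.example.com/item1">
    <title>RSS 0.9 or RSS 1.0...Discuss</title>
    <link>http://www.example.com/item1</link>
    <annotate:reference rdf:resource="http://www.example.com/discuss/item1">
      <dc:subject>XML</dc:subject>
      <dc:description>A discussion group on the subject in hand</dc:description>
    </annotate:reference>
  </item>
</rdf:RDF>
"""


def discussion_links(feed_xml):
    """List each item's discussion URL plus metadata about the discussion."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.findall("rss:item", NS):
        reference = item.find("annotate:reference", NS)
        if reference is None:
            continue
        # Children of annotate:reference describe the discussion itself,
        # not the item, so they are kept in a separate dictionary.
        about_discussion = {
            child.tag[len(DC_PREFIX):]: (child.text or "").strip()
            for child in reference
            if child.tag.startswith(DC_PREFIX)
        }
        results.append({
            "item": item.get(RDF_ABOUT),
            "discussion": reference.get(RDF_RESOURCE),
            "about_discussion": about_discussion,
        })
    return results


if __name__ == "__main__":
    for entry in discussion_links(SAMPLE_FEED):
        print(entry)
```

For the one-line form shown in Example 7-3, the same function returns an empty about_discussion dictionary.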

mod_audio

mod_audio is the first of the RSS 1.0 modules we have seen that points at something other than a text page. It is specifically designed for the syndication of MP3 files (its elements match those of the ID3 tag standard), but it can be used for any audio format.

It was designed by Brian Aker, who also wrote the mp3 module for the Apache web server. That Apache module not only streams MP3s from a server, but also creates RSS playlists.

If you're syndicating audio, or pointing at feeds that are syndicating audio, this is a must. Also, consider using mod_streaming, the module for streaming.

Namespace

mod_audio uses the prefix audio: and is identified by the URI http://media.tangent.org/rss/1.0/. Hence:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns="http://purl.org/rss/1.0/"
          xmlns:audio="http://media.tangent.org/rss/1.0/" >

Elements

mod_audio elements are all subelements of item. None of them are mandatory to the module, but you should make an effort to include as many as possible per track.

audio:songname: The title of the song.
audio:artist: The name of the artist.
audio:album: The name of the album.
audio:year: The year of the track.
audio:comment: Any text comment on the track.
audio:genre: The genre of the track (should match genre_id).
audio:recording_time: The length of the track in seconds.
audio:bitrate: The bitrate of the track, in kbps.
audio:track: The number of the track on the album.
audio:genre_id: The genre ID number, as defined by the ID3 standard.
audio:price: The price of the track, if you're selling it.

Example

Example 7-5. An item using mod_audio

<item rdf:about="http://www.example.com/boyband.mp3" >
     <title>BoyBand's Latest Track!</title>
     <description>The latest track from the fab five.</description>
     <link>http://www.example.com/boyband.mp3</link>
     <audio:songname>One Likes to Get Funky</audio:songname>
     <audio:artist>BoyBand</audio:artist>
     <audio:album>Not Just Another</audio:album>
     <audio:year>2005</audio:year>
     <audio:genre>Top 40</audio:genre>
     <audio:genre_id>60</audio:genre_id>
</item>

Applications

It could be said that some of these elements are superfluous, since they can be replaced by other elements (for example, audio:songname could be replaced by title). This is true in many cases, but it is much neater to use a simple MP3 tag-reading script to generate the RSS and map ID3 elements across directly. There are many ID3 tag-reading libraries available, including Chris Nandor's MP3::Info for Perl.
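The direct mapping described above might look something like this in Python. This is a sketch only: the tags dictionary stands in for whatever an ID3-reading library returns, and its keys (title, artist, and so on) are hypothetical, as are the names FIELD_MAP and audio_item. A real tag reader will use its own field names.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
AUDIO_NS = "http://media.tangent.org/rss/1.0/"

# Ask ElementTree to serialize with the prefixes used in this chapter.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("", RSS_NS)
ET.register_namespace("audio", AUDIO_NS)

# Hypothetical tag-reader keys on the left, mod_audio elements on the right.
FIELD_MAP = [
    ("title", "songname"),
    ("artist", "artist"),
    ("album", "album"),
    ("year", "year"),
    ("comment", "comment"),
    ("genre", "genre"),
    ("genre_id", "genre_id"),
    ("track", "track"),
]


def audio_item(url, tags):
    """Build an RSS 1.0 item element for one audio file from its tags."""
    item = ET.Element("{%s}item" % RSS_NS, {"{%s}about" % RDF_NS: url})
    ET.SubElement(item, "{%s}title" % RSS_NS).text = str(tags.get("title", url))
    ET.SubElement(item, "{%s}link" % RSS_NS).text = url
    for key, element_name in FIELD_MAP:
        value = tags.get(key)
        if value in (None, ""):
            continue  # every mod_audio element is optional, so skip gaps
        child = ET.SubElement(item, "{%s}%s" % (AUDIO_NS, element_name))
        child.text = str(value)
    return item


if __name__ == "__main__":
    example_tags = {
        "title": "One Likes to Get Funky",
        "artist": "BoyBand",
        "album": "Not Just Another",
        "year": 2005,
        "genre": "Top 40",
        "genre_id": 60,
    }
    element = audio_item("http://www.example.com/boyband.mp3", example_tags)
    print(ET.tostring(element, encoding="unicode"))
```

Keeping the mapping in one table means that supporting another tag reader is mostly a matter of changing the left-hand column.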

mod_changedpage

mod_changedpage does for RSS 1.0 what the cloud element does for RSS 0.9x: it introduces a form of Publish and Subscribe. We'll discuss Publish and Subscribe in detail in Chapter 12, but basically it enables a system in which you can "subscribe" to a feed and be notified when something new is published.

mod_changedpage uses only one element, which points to a changedPage server. Users wishing to be told when the feed has updated send an HTTP POST request of a certain format to this server. Upon updating, this server sends a similar POST request back to the user. The user's client then knows about the update. Again, Chapter 12 examines this in detail.

Namespace

mod_changedpage takes the namespace prefix cp: and is identified by the URI http://my.theinfo.org/changed/1.0/rss/. Hence, its declaration looks like this:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
         xmlns="http://purl.org/rss/1.0/"  
         xmlns:cp="http://my.theinfo.org/changed/1.0/rss/">

Element

mod_changedpage takes only one element, a subelement of channel:

<cp:server rdf:resource="URL" />: The URL is the address of the changedPage server.

Example

Example 7-6. mod_changedpage in the channel

<?xml version="1.0" encoding="utf-8"?> 
   
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:cp="http://my.theinfo.org/changed/1.0/rss/"
>
   
<channel rdf:about="http://meerkat.oreillynet.com/?_fl=rss1.0">
  <title>Meerkat</title>
  <link>http://meerkat.oreillynet.com</link>
  <description>Meerkat: An Open Wire Service</description>
  <cp:server rdf:resource="http://example.org/changedPage" />
</channel>
...
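A client's first step is simply to discover the server address. The Python sketch below does only that; it deliberately does not attempt the subscription request itself, whose format is left to Chapter 12. The function name changed_page_server and the sample feed (a closed copy of Example 7-6) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "cp": "http://my.theinfo.org/changed/1.0/rss/",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Closed copy of Example 7-6 so that it parses on its own.
SAMPLE_FEED = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:cp="http://my.theinfo.org/changed/1.0/rss/">
  <channel rdf:about="http://meerkat.oreillynet.com/?_fl=rss1.0">
    <title>Meerkat</title>
    <link>http://meerkat.oreillynet.com</link>
    <description>Meerkat: An Open Wire Service</description>
    <cp:server rdf:resource="http://example.org/changedPage" />
  </channel>
</rdf:RDF>
"""


def changed_page_server(feed_xml):
    """Return the changedPage server URL for a feed, or None if absent."""
    root = ET.fromstring(feed_xml)
    server = root.find("rss:channel/cp:server", NS)
    if server is None:
        return None
    return server.get(RDF_RESOURCE)


if __name__ == "__main__":
    print(changed_page_server(SAMPLE_FEED))
```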

mod_company

mod_company allows RSS feeds to deliver business news metadata. Like mod_audio, this is another example of RSS 1.0 stretching the bounds of RSS functionality; this module could lead to RSS being used as a specialist business news vehicle rather than just a generalized list of links.

Namespace

mod_company takes the namespace prefix company: and is identified by the URI http://purl.org/rss/1.0/modules/company/. By now you'll realize that this means the root element of an RSS 1.0 document containing mod_company will resemble this:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns="http://purl.org/rss/1.0/"
          xmlns:company="http://purl.org/rss/1.0/modules/company/">

Elements

mod_company provides four elements, all of which are subelements of item. None of them are defined as mandatory, but there's little hassle and much reward in including all of them.

company:name: The name of the company.
company:symbol: The ticker symbol of the company's stock.
company:market: The abbreviation of the market in which the stock is traded.
company:category: The category of the company, expressed using the Taxonomy module. For more details, see mod_taxonomy later in this chapter.

Example

Example 7-7. mod_company being used within an item

<item rdf:about="http://www.example.com/financial_news/00001.html">
     <title>Cisco Stock moves either up or down!</title>
     <description>A brief story about a thing happening today</description>
     <link>http://www.example.com/financial_news/00001.html</link>
     <company:symbol>CSCO</company:symbol>
     <company:market>NASDAQ</company:market>
     <company:name>Cisco Systems Inc.</company:name>
     <company:category>
       <taxo:topic rdf:resource="http://dmoz.org/Computers/Data_Communications/Vendors/Manufacturers/" />
     </company:category>
</item>
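A consumer interested in business news could pull the four fields out of each item roughly as follows. This Python sketch makes a few assumptions: the function name company_info is invented, the sample is the item from Example 7-7 with namespace declarations added so it parses alone, and the taxo: namespace URI is the one used by mod_taxonomy, which this section does not show.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "company": "http://purl.org/rss/1.0/modules/company/",
    # Assumption: the mod_taxonomy namespace, not declared in this section.
    "taxo": "http://purl.org/rss/1.0/modules/taxonomy/",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

SAMPLE_ITEM = """\
<item xmlns="http://purl.org/rss/1.0/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:company="http://purl.org/rss/1.0/modules/company/"
      xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
      rdf:about="http://www.example.com/financial_news/00001.html">
  <title>Cisco Stock moves either up or down!</title>
  <description>A brief story about a thing happening today</description>
  <link>http://www.example.com/financial_news/00001.html</link>
  <company:symbol>CSCO</company:symbol>
  <company:market>NASDAQ</company:market>
  <company:name>Cisco Systems Inc.</company:name>
  <company:category>
    <taxo:topic rdf:resource="http://dmoz.org/Computers/Data_Communications/Vendors/Manufacturers/" />
  </company:category>
</item>
"""


def company_info(item):
    """Collect the mod_company fields of one item; missing ones are None."""
    info = {}
    for name in ("name", "symbol", "market"):
        text = item.findtext("company:" + name, namespaces=NS)
        info[name] = text.strip() if text else None
    # The category is a taxonomy topic, so its value is a URI attribute.
    topic = item.find("company:category/taxo:topic", NS)
    info["category"] = topic.get(RDF_RESOURCE) if topic is not None else None
    return info


if __name__ == "__main__":
    print(company_info(ET.fromstring(SAMPLE_ITEM)))
```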

mod_content

mod_content is perhaps the most misunderstood module of all. Its purpose is not only to allow for much richer content (the entire site, images and all, for example) to be included within an RSS 1.0 item, but also to give a complete RDF description of this content. Now, not only can we make RDF graphs from channel to item, but we can also make them from item to an image within an item. An RDF query of "Find all the feeds that point to articles accompanied by a picture of an elephant" can now be executed easily, as mod_content provides not just the content itself, but the relationship metadata as well. It can also be used to split the object to which an item points into smaller sections, from the standpoint of an RDF parser.

The syntax for this can look a little long-winded (RDF is rather verbose when written in XML), and because of this, mod_content feeds can often look scary. They're not really, and reformatting them in a text editor can give you an idea of what is happening. Despite this apparent complexity, it is one of the few modules to have been officially accepted by the rss-dev working group.

It must be noted that mod_content is not to be confused with the core specification's description subelement of item. Some RSS 1.0 feeds use description to contain the content the item represents. While this may be common practice with RSS 0.9x users, RSS 1.0 users may wish to do it properly. description is for a description of the content; mod_content is for the content itself.

Namespaces

mod_content is identified by the namespace prefix content: and the URI http://purl.org/rss/1.0/modules/content/. Hence, the root element looks like this:

<rdf:RDF  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns="http://purl.org/rss/1.0/"
          xmlns:content="http://purl.org/rss/1.0/modules/content/">

Elements

mod_content is slightly more complex than the other modules: it has a specific structure that must be followed. It consists of one element with various subelements that have important attributes of their own, some of which are mandatory, while others are not.

The first element, content:items, is a subelement of item. It consists of an rdf:Bag that contains as many content:item elements as needed, each enveloped in an rdf:li element, as shown in Example 7-8.

Example 7-8. The basic structure of mod_content items

<item>
...
<content:items>
<rdf:Bag>
  <rdf:li>
      <content:item rdf:about=""/>
  </rdf:li>
  <rdf:li>
      <content:item />
  </rdf:li>
</rdf:Bag>
</content:items>
</item>

Notice that one of the content:item elements in Example 7-8 has an rdf:about attribute, but the other does not. This difference is to show that if the content is available on the Web at a specific address, the rdf:about attribute contains the URI of the content, including any part of the content that is directly addressable (an image, for example). Hence, a deeper level of RDF relationship is declared.

Now, you will also notice that the content:item element in Example 7-8 is empty. This is not much use, so we'll look into filling it. Content, as you know, can come in many formats: plain text, HTML 4.0, XHTML 1.1, and so on. What you do with such content depends on its format, so mod_content needs to be able to describe the format. It does this with a content:format subelement.

This element takes one attribute, rdf:resource, which points to a URI that represents the format of the content. Basically, this attribute declares the namespace of the content. For example, for XHTML 1.0 Strict, the URI is http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict. The URI for HTML 4.0 is http://www.w3.org/TR/html4/. Further examples can be found in the RDDL natures document at http://www.rddl.org/natures/.

The content:format element is required. If you don't include it, you force anyone parsing your feed to guess your content's format.

Because you have declared the format of the content:item using an RDF declaration, you must now envelop the actual content inside an rdf:value element. Example 7-9 shows a simple version.

Example 7-9. A simple version of a mod_content item

<item>
...
<content:items>
  <rdf:Bag>
    <rdf:li>
      <content:item>
      <content:format rdf:resource="http://www.w3.org/TR/html4/" />
        <rdf:value>
          <![CDATA[<em>This is <strong>very</em> cool</strong>.]]>
        </rdf:value>
      </content:item>
    </rdf:li>
  </rdf:Bag>
</content:items>
</item>

Example 7-9 shows a single item containing a single content:item, which in turn contains a line of HTML 4.0 that reads "This is very cool." Note that the HTML content is encased in a CDATA section. As with all XML (see Appendix A for details), non-XML-compliant content must be wrapped away in this manner inside an RSS feed.

HTML, however, is not the only content type, and newer content types are fully XML-compliant. XHTML, for example, does not need to be wrapped away, as long as the parser is made aware that the contents of the rdf:value element can be treated accordingly. For this, we use rdf:value's optional attributes, rdf:parseType and xmlns. Example 7-10 shows the same content as Example 7-9, but reformatted into XHTML. Note the differences: the content:format URI has changed, a content:encoding element has been added, and rdf:value now carries attributes.

Example 7-10. A simple version of a mod_content item, with XHTML

<item>
...
<content:items>
  <rdf:Bag>
    <rdf:li>
      <content:item>
      <content:format rdf:resource="http://www.w3.org/1999/xhtml"/>
       <content:encoding rdf:resource="http://www.w3.org/TR/REC-xml#dt-wellformed" />
        <rdf:value rdf:parseType="Literal" xmlns="http://www.w3.org/1999/xhtml">
           <em>This is <strong>very</strong> </em> <strong>cool</strong>.
        </rdf:value>
      </content:item>
    </rdf:li>
  </rdf:Bag>
</content:items>
</item>

In Example 7-10, we've told the rdf:value element that its contents are both parsable as XML and in the namespace represented by the URI http://www.w3.org/1999/xhtml. We declare all of this to prevent RDF parsers from getting confused. We humans, of course, are anything but.

The content itself is now well-formed XML. To show this, we can include a new subelement of content:item, the optional content:encoding. Its rdf:resource attribute points to the URI that represents well-formed XML, http://www.w3.org/TR/REC-xml#dt-wellformed.

If no content:encoding is present, we assume that the content is plain character data, either enclosed in a CDATA section or with its markup characters escaped as entities such as &lt;.

In summary:

content:items: Contains a subelement of rdf:Bag.
rdf:Bag: Contains one or more subelements of rdf:li.
rdf:li: Contains a mandatory subelement of content:item.
content:item: Takes the mandatory subelements content:format and rdf:value, and the optional subelement content:encoding. content:item must take the attribute rdf:about="URI" if the object can be directly addressed.
content:format: Takes the attribute rdf:resource="URI", where the URI represents the format of the content.
rdf:value: Contains the actual content. If its content is well-formed XML, it must take the attributes rdf:parseType="Literal" and xmlns, the latter set to the namespace of the content (for example, xmlns="http://www.w3.org/1999/xhtml" for XHTML).
content:encoding: Takes the attribute rdf:resource="URI", where the URI represents the format in which the content is encoded.

Examples

Example 7-11. A fully mod_contented item

<item rdf:about="http://example.org/item/">
 <title>The Example Item</title>
 <link>http://example.org/item/</link>
 <description>I am an example item</description>
 <content:items>
  <rdf:Bag>
   
   <rdf:li>
    <content:item>
    <content:format rdf:resource="http://www.w3.org/1999/xhtml" />
    <content:encoding rdf:resource="http://www.w3.org/TR/REC-xml#dt-wellformed" />
     <rdf:value rdf:parseType="Literal" xmlns="http://www.w3.org/1999/xhtml">
      <em>This is a <strong>very cool</strong> example of mod_content</em>
     </rdf:value>
    </content:item>
   </rdf:li>
   
   <rdf:li>
    <content:item>
     <content:format rdf:resource="http://www.w3.org/TR/html4/" />
     <rdf:value>
      <![CDATA[You can include content in lots of formats. <a                
       href="http://www.oreillynet.com">links</a> too. ]]>
     </rdf:value>
    </content:item>
   </rdf:li>
   
  </rdf:Bag>
 </content:items>
</item>
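The rules summarized above can be turned into a small reader. The Python sketch below is one possible reading of them, not a reference implementation: the names ContentPart and content_parts are invented for illustration, and the sample is Example 7-11 with namespace declarations added to the item so that it parses on its own. Content marked as well-formed XML is handed back as a parsed element; everything else is treated as character data.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]
WELL_FORMED_XML = "http://www.w3.org/TR/REC-xml#dt-wellformed"

SAMPLE_ITEM = """\
<item xmlns="http://purl.org/rss/1.0/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:content="http://purl.org/rss/1.0/modules/content/"
      rdf:about="http://example.org/item/">
 <title>The Example Item</title>
 <content:items>
  <rdf:Bag>
   <rdf:li>
    <content:item>
     <content:format rdf:resource="http://www.w3.org/1999/xhtml" />
     <content:encoding rdf:resource="http://www.w3.org/TR/REC-xml#dt-wellformed" />
     <rdf:value rdf:parseType="Literal" xmlns="http://www.w3.org/1999/xhtml">
      <em>This is a <strong>very cool</strong> example of mod_content</em>
     </rdf:value>
    </content:item>
   </rdf:li>
   <rdf:li>
    <content:item>
     <content:format rdf:resource="http://www.w3.org/TR/html4/" />
     <rdf:value>
      <![CDATA[You can include content in lots of formats. <a
       href="http://www.oreillynet.com">links</a> too. ]]>
     </rdf:value>
    </content:item>
   </rdf:li>
  </rdf:Bag>
 </content:items>
</item>
"""


@dataclass
class ContentPart:
    format: Optional[str]      # URI from content:format
    encoding: Optional[str]    # URI from content:encoding, if present
    raw: Optional[str]         # character data (CDATA or escaped markup)
    tree: Optional[ET.Element]  # the rdf:value element when content is XML


def _resource(parent, path):
    element = parent.find(path, NS)
    return element.get(RDF_RESOURCE) if element is not None else None


def content_parts(item):
    """Return one ContentPart per content:item found inside an item."""
    parts = []
    for entry in item.findall("content:items/rdf:Bag/rdf:li/content:item", NS):
        value = entry.find("rdf:value", NS)
        if value is None:
            continue  # rdf:value is mandatory; skip malformed entries
        encoding = _resource(entry, "content:encoding")
        is_xml = encoding == WELL_FORMED_XML
        parts.append(ContentPart(
            format=_resource(entry, "content:format"),
            encoding=encoding,
            raw=None if is_xml else (value.text or "").strip(),
            tree=value if is_xml else None,
        ))
    return parts


if __name__ == "__main__":
    for part in content_parts(ET.fromstring(SAMPLE_ITEM)):
        print(part.format, part.encoding, part.raw is not None)
```

Switching on content:encoding rather than guessing from the text keeps the reader in line with the rule that content without an encoding is plain character data.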

It may either amuse or terrify you to realize that as content:item can contain any XML-formatted content, it can itself contain other RSS feeds. This might be of use for an RSS tutorial website, syndicating its lessons. Here, in Example 7-12, is an early version of this very section of this book, represented as an item, stopping right here to prevent a spiral of recursion.

Example 7-12. This page, formatted into an RSS 1.0 item

<item rdf:about="http://example.org/item/">
	<title>Examples</title>
	<description>The text of the first part of the Examples section of the mod_content 
	bit of chapter 7 of Content Syndication with XML and RSS</description>
	   
	<content:items>
	<rdf:Bag>
	   
	<rdf:li>
	<content:item>
	<content:format rdf:resource="http://www.w3.org/TR/html4/" />
	<rdf:value>
	<![CDATA[ <h2>Examples</h2>]]>
	</rdf:value>
	</content:item>
	</rdf:li>
	   
	<rdf:li>
	<content:item>
	<content:format rdf:resource="http://purl.org/rss/1.0/" />
	<content:encoding rdf:resource="http://www.w3.org/TR/REC-xml#dt-wellformed" />
	<rdf:value rdf:parseType="Literal" 
		   xmlns="http://purl.org/rss/1.0/"
		   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
		   xmlns:content="http://purl.org/rss/1.0/modules/content/">
	<item rdf:about="http://example.org/item/"><title>The Example Item</title>
	<link>http://example.org/item/</link><description>I am an example item</description>
	<content:items><rdf:Bag><rdf:li><content:item>
	<content:format rdf:resource="http://www.w3.org/1999/xhtml" />
	<content:encoding rdf:resource="http://www.w3.org/TR/REC-xml#dt-wellformed" />
	<rdf:value rdf:parseType="Literal" xmlns="http://www.w3.org/1999/xhtml"><em>
	This is a <strong>very cool</strong> example of mod_content</em></rdf:value></content:item>
	</rdf:li><rdf:li><content:item><content:format rdf:resource="http://www.w3.org/TR/html4/" />
	<rdf:value><![CDATA[You can include content in lots of formats. <a href="http://www.oreillynet.com">links</a> too. ]]>
	</rdf:value></content:item></rdf:li></rdf:Bag></content:items></item>
	</rdf:value>
	</content:item>
	</rdf:li>
	   
	<rdf:li>
	<content:item>
	<content:format rdf:resource="http://www.w3.org/TR/html4/" />
	<rdf:value><![CDATA[ <p><i> Example 7.12 A fully mod_contented &lt;item&gt;</i></p><p> 
	It may either amuse or terrify you to realize that 
as &lt;content:item&gt; can 
contain any XML-formatted content, it can itself contain other RSS feeds. 
This might be of use 
for a RSS tutorial website, syndicating its lessons. Here, in example 7.13, is this 
very section of this book, represented as an &lt;item&gt;, stopping right here to prevent a 
spiral of recursion.</p>]]>
</rdf:value>
</content:item>
</rdf:li>
</rdf:Bag>
</content:items>
</item>

mod_dublincore

The second of the Standard modules to be examined in this chapter, mod_dublincore is the most-used of all the RSS 1.0 modules. It allows an RSS 1.0 feed to express the additional metadata formalized by the Dublin Core Metadata Initiative. Chapter 5 discusses this initiative in detail, so let's move on to the details of the module itself.

Namespace

mod_dublincore is identified by the prefix dc: and the URI http://purl.org/dc/elements/1.1/. So, in the grand tradition, the root element appears as:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
>

Elements

mod_dublincore can be used in two ways: the simpler and the more RDF-based.

In either usage, mod_dublincore elements are entirely optional and can be applied to the channel, an item, an image, a textinput element, or all of them, as liberally as you wish, as long as the information you are relating makes sense. It is rather addictive, I must say, and I encourage you to put Dublin Core metadata all over your feeds. Here's what we can include:

dc:title

The title of the item.

dc:creator

The name of the creator of the item (i.e., a person, organization, or system). If the creator is a person, this information is customarily in the format Firstname Lastname (email@domain.com).

dc:subject

The subject of the item.

dc:description

A brief description of the item.

dc:publisher

The name of the publisher, either a person or an organization. If the publisher is a person, this information is customarily in the format Firstname Lastname (email@domain.com).

dc:contributor

The name of a contributor, customarily in the format Firstname Lastname (email@domain.com).

dc:date

The publishing date, in the W3CDTF format (e.g., 2000-01-01T12:00+00:00).

dc:type

The nature of the item, taken from the list of Dublin Core types at http://dublincore.org/documents/dcmi-type-vocabulary/:

Collection: A collection is an aggregation of items, described as a group; its parts can be described and navigated separately (for example, a weblog).
Dataset: A dataset is information encoded in a defined structure (for example, lists, tables, and databases), intended to be useful for direct machine processing.
Event: According to the official definition of the Dublin Core authors, an event is a nonpersistent, time-based occurrence. Examples include any exhibition, webcast, conference, workshop, open-day, performance, battle, trial, wedding, tea-party, conflagration, or orgy. The soon-to-be-described mod_event has a lot to do with this sort of thing.
Image: They are worth a thousand words, you know.
Interactive resource: The official Dublin Core definition of an interactive resource is "a resource which requires interaction from the user to be understood, executed, or experienced. For example - forms on web pages, applets, multimedia learning objects, chat services, virtual reality." In the RSS world, interactive resources could be either pointers to programs or the textinput element itself.
Service: Technically, a service is a system that provides one or more functions of value to the end user. Assuming that just providing information doesn't count, a service could be used to point to web applications or web services, as long as you create an RSS feed that provides the necessary details (using mod_content to syndicate WSDL files, for example).
Software: You know what software is. In this case, it is distinguished from an interactive resource by being downloadable, rather than run on a remote server.
Sound: Officially, a sound is a resource with content primarily intended to be rendered as audio.
Text: Plain text content.

dc:format

This differs from dc:type by a degree of sophistication. Whereas dc:type provides a top-level indication of the feed's nature, dc:format should point to the exact MIME type of the content itself.

dc:identifier

The identifier should be an unambiguous reference to the resource within a given context. So, in RSS 1.0 terms, this is the same as the item's rdf:about attribute.

dc:source

In RSS 1.0 terms, this element can do the same job as the ag:sourceURL of the mod_aggregation module. It should point to an unambiguous reference of the source of the item. Unlike the ag:sourceURL element, however, dc:source is not restricted to URLs. Any sufficiently unambiguous reference works (ISBN numbers, for example).

dc:language

The language in which the item is written, using the standard language code, as covered in Appendix B.

dc:relation

The URI of a related resource. See mod_DCTerms later in this chapter for more details.

dc:coverage

According to the Dublin Core authors, "Coverage will typically include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range), or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names [TGN]) and that, where appropriate, named places or time periods be used in preference to numeric identifiers such as sets of coordinates or date ranges."

dc:rights

This element should contain any copyright, copyleft, public domain, or similar declaration. The absence of this element does not imply anything whatsoever.

The more complex version of mod_dublincore adds RDF and the mod_taxonomy module to give a richer meaning to dc:subject. For example, dc:subject can be used simply like this:

<dc:subject>World Cup</dc:subject>

or combined with a definition of a topic, in a richer RDF version:

<dc:subject>
  <rdf:Description>
    <taxo:topic rdf:resource="http://dmoz.org/Sports/Soccer/" />
    <rdf:value>World Cup</rdf:value>
  </rdf:Description>
</dc:subject>

This not only defines the subject, but also provides it with a wider contextual meaning. In this example, we're saying the subject is "the World Cup of soccer" (or more correctly, we're saying that "this item is on the subject represented by the term 'World Cup' in the context provided by the URI http://dmoz.org/Sports/Soccer/"). After all, there is more than one World Cup. This approach is especially useful for describing homonyms, such as:

<dc:subject>
  <rdf:Description>
    <taxo:topic rdf:resource="http://dmoz.org/Business/Industries/Food_and_Related_Products/Beverages/Soft_Drinks" />
    <rdf:value>Coke</rdf:value>
  </rdf:Description>
</dc:subject>

as opposed to:

<dc:subject>
  <rdf:Description>
    <taxo:topic rdf:resource="http://dmoz.org/Health/Addictions/Substance_Abuse/Illegal_Drugs/" />
    <rdf:value>Coke</rdf:value>
  </rdf:Description>
</dc:subject>

Example

Example 7-13. An RSS 1.0 feed with mod_dublincore

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
>   
   
<channel rdf:about="http://meerkat.oreillynet.com/?_fl=rss1.0">
  <title>Meerkat</title>
  <link>http://meerkat.oreillynet.com</link>
  <description>Meerkat: An Open Wire Service</description>
  <dc:publisher>The O'Reilly Network</dc:publisher>
  <dc:creator>Rael Dornfest (mailto:rael@oreilly.com)</dc:creator>
  <dc:rights>Copyright &#169; 2000 O'Reilly &amp; Associates, Inc.</dc:rights>
  <dc:date>2000-01-01T12:00+00:00</dc:date>
  <dc:type>Interactive Resource</dc:type>
  <image rdf:resource="http://meerkat.oreillynet.com/icons/meerkat-powered.jpg" />
  <textinput rdf:resource="http://meerkat.oreillynet.com" />
   
  <items>
    <rdf:Seq>
      <rdf:li resource="http://c.moreover.com/click/here.pl?r123" />
    </rdf:Seq>
  </items>
   
</channel>
   
<image rdf:about="http://meerkat.oreillynet.com/icons/meerkat-powered.jpg">
  <title>Meerkat Powered!</title>
  <url>http://meerkat.oreillynet.com/icons/meerkat-powered.jpg</url>
  <link>http://meerkat.oreillynet.com</link>
  <dc:creator>Rael Dornfest (mailto:rael@oreilly.com)</dc:creator>
  <dc:type>Image</dc:type>
</image>
   
<textinput rdf:about="http://meerkat.oreillynet.com">
  <title>Search Meerkat</title>
  <description>Search Meerkat's RSS Database...</description>
  <name>s</name>
  <link>http://meerkat.oreillynet.com/</link>
</textinput>
   
<item rdf:about="http://c.moreover.com/click/here.pl?r123">
  <title>XML: A Disruptive Technology</title>
  <link>http://c.moreover.com/click/here.pl?r123</link>
  <dc:description>This is the description of the article</dc:description>
  <dc:publisher>The O'Reilly Network</dc:publisher>
  <dc:creator>Simon St.Laurent (mailto:simonstl@simonstl.com)</dc:creator>
  <dc:rights>Copyright &#169; 2000 O'Reilly &amp; Associates, Inc.</dc:rights>
  <dc:subject>XML</dc:subject>
</item>
</rdf:RDF>

mod_DCTerms

Once Dublin Core metadata has sunk its I-must-add-metadata-to-everything addictive nature into your very soul, you soon realize that the core terms are lacking in depth. For example, dc:relation means "is related to," but in what way? We don't know, unless we use mod_DCTerms.

mod_DCTerms introduces 28 new subelements to channel, item, image, and textinput, as appropriate. These subelements are related, within Dublin Core, to the core elements found within mod_dublincore, but mod_DCTerms does not express this relationship. For example, dcterms:created is actually a refinement of dc:date.

Namespace

mod_DCTerms takes the namespace prefix dcterms: and is identified by the URI http://purl.org/dc/terms/. So, the root element looks like this:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dcterms="http://purl.org/dc/terms/"
>

Elements

You have a lot to choose from with this module. As we've said, the elements can be subelements of channel, item, image, or textinput. Apply liberally and with gusto.

dcterms:alternative

An alternative title for the item. For example:

<title>Programming Perl</title>
<dc:title>Programming Perl</dc:title>
<dcterms:alternative>The Camel Book</dcterms:alternative>

dcterms:created

The date the object was created, in W3CDTF format (for example, YYYY-MM-DDThh:mm:ssTZD, where TZD is the time zone designator).

dcterms:issued

The date the object was first made available. This should be used, for backward compatibility, with dc:date, and it should contain the same value. Again, the date must be in W3CDTF format.

dcterms:modified

The date the content of the object last changed, in W3CDTF format. This can sit inside channel, item, or both.
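
The three date elements can be sketched together as follows. All of the dates and the item URL are invented for illustration; note that dc:date repeats the value of dcterms:issued, as recommended above, and that each value carries a time zone offset.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <item rdf:about="http://www.example.org/articles/1">
    <title>An Example Article</title>
    <link>http://www.example.org/articles/1</link>
    <!-- When the article was written -->
    <dcterms:created>2003-01-01T09:00:00+00:00</dcterms:created>
    <!-- When it was first published; dc:date repeats the value -->
    <dcterms:issued>2003-01-02T12:00:00+00:00</dcterms:issued>
    <dc:date>2003-01-02T12:00:00+00:00</dc:date>
    <!-- When the content last changed -->
    <dcterms:modified>2003-01-05T16:30:00+00:00</dcterms:modified>
  </item>
</rdf:RDF>
```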

dcterms:extent

The size of the document referred to by the section of the feed in which the element appears, in bytes.

dcterms:medium

The HTTP Content-Type of the object to which the parent element refers. The HTTP Content-Type is made up of the MIME type, followed optionally by the character set, denoted by the string ;charset=. For example:

<dcterms:medium>text/html; charset=UTF-8</dcterms:medium>
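
Taken together, dcterms:extent and dcterms:medium might describe an item like this. It is a sketch only: the URL and the size of 20480 bytes are assumed values.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <item rdf:about="http://www.example.org/articles/1">
    <title>An Example Article</title>
    <link>http://www.example.org/articles/1</link>
    <!-- Size of the linked document, in bytes (assumed value) -->
    <dcterms:extent>20480</dcterms:extent>
    <!-- MIME type, then the optional character set -->
    <dcterms:medium>text/html; charset=UTF-8</dcterms:medium>
  </item>
</rdf:RDF>
```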

Paired elements

Some of the mod_DCTerms elements come naturally in pairs. Because each pair describes a relationship between two separate items, it is important to remember that the following paired elements must work together:

dcterms:isVersionOf and dcterms:hasVersion

This pair of elements works together to point to different versions of an object. For example, you could use it to list versions in different languages or different formats. Their values should point to each other, should be URIs, and, for complete RDF compatibility, should be encased in an rdf:resource attribute. There is also nothing to stop you from providing further information about the version, via additional RDF markup, like so:

<dcterms:hasVersion rdf:resource="URI OF RESOURCE">
  <dc:title>TITLE OF OTHER VERSION</dc:title>
</dcterms:hasVersion>
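
To show the two halves of the pair pointing at each other, here is a sketch of an English article and its French translation. The URLs and titles are hypothetical, and the simple empty-element form of rdf:resource is used for both.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <!-- The original points forward to its translation -->
  <item rdf:about="http://www.example.org/articles/1.en">
    <title>An Example Article</title>
    <link>http://www.example.org/articles/1.en</link>
    <dc:language>en</dc:language>
    <dcterms:hasVersion rdf:resource="http://www.example.org/articles/1.fr" />
  </item>
  <!-- The translation points back to the original -->
  <item rdf:about="http://www.example.org/articles/1.fr">
    <title>Un article d'exemple</title>
    <link>http://www.example.org/articles/1.fr</link>
    <dc:language>fr</dc:language>
    <dcterms:isVersionOf rdf:resource="http://www.example.org/articles/1.en" />
  </item>
</rdf:RDF>
```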

dcterms:isReplacedBy and dcterms:replaces

Used to denote an item that points to a more recent version of the object in question. The syntax is the same as the dcterms:isVersionOf pair: it takes an rdf:resource attribute that points to the URI of the object in question.
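
A minimal sketch, assuming two hypothetical editions of the same guide: the older item declares what it is replaced by, and the newer item declares what it replaces.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <!-- The superseded edition -->
  <item rdf:about="http://www.example.org/guide/edition1">
    <title>An Example Guide, First Edition</title>
    <link>http://www.example.org/guide/edition1</link>
    <dcterms:isReplacedBy rdf:resource="http://www.example.org/guide/edition2" />
  </item>
  <!-- The current edition -->
  <item rdf:about="http://www.example.org/guide/edition2">
    <title>An Example Guide, Second Edition</title>
    <link>http://www.example.org/guide/edition2</link>
    <dcterms:replaces rdf:resource="http://www.example.org/guide/edition1" />
  </item>
</rdf:RDF>
```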

dcterms:isRequiredBy and dcterms:requires

Used to denote an object relationship in which, according to the Dublin Core specification, "the described resource requires the referenced resource to support its function, delivery, or coherence of content." As you might expect by now, this pair takes the attribute rdf:resource to denote the URI of the object to which you're pointing, and may be augmented by additional RDF.

dcterms:isPartOf and dcterms:hasPart

The mod_DCTerms elements have quite self-explanatory names, and this pair is no exception. It denotes objects that are subsections of other objects. It's the traditional syntax of an rdf:resource attribute, with the option of additional RDF within the element.

dcterms:isReferencedBy and dcterms:references

A pair in which one object refers to or cites the other. Its syntax is the usual drill: an rdf:resource attribute and some additional RDF if you're feeling generous.

dcterms:isFormatOf and dcterms:hasFormat

This final pair of elements denotes two objects that contain the same intellectual content but differ in format. For example, one object could be a color PDF and the other could be a Word document. The syntax is the same as the other paired elements, but with the additional recommendation that you include dc:format, dc:language, or another element that helps the end user tell the difference between the two separate versions. Also bear in mind that URIs must be unique, so anyone using content negotiation on their server must give a different URI for each format, whether or not it is actually necessary.
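
The PDF and Word case might be sketched like this, with dc:format added to each item so that a reader can tell the two apart. The URLs and titles are assumptions; note that each format has its own URI.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <!-- The Word document, which also exists as a PDF -->
  <item rdf:about="http://www.example.org/reports/1.doc">
    <title>An Example Report (Word)</title>
    <link>http://www.example.org/reports/1.doc</link>
    <dc:format>application/msword</dc:format>
    <dcterms:hasFormat rdf:resource="http://www.example.org/reports/1.pdf" />
  </item>
  <!-- The PDF, which points back to the Word document -->
  <item rdf:about="http://www.example.org/reports/1.pdf">
    <title>An Example Report (PDF)</title>
    <link>http://www.example.org/reports/1.pdf</link>
    <dc:format>application/pdf</dc:format>
    <dcterms:isFormatOf rdf:resource="http://www.example.org/reports/1.doc" />
  </item>
</rdf:RDF>
```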

Using DCSV values

There are three mod_DCTerms elements that take a special syntax to denote a timespan. This syntax, Dublin Core Structured Values (DCSV), represents complex values together in one simple string. It takes the following format (all the attributes are optional):

name=ASSOCIATED NAME; start=START TIME; end=END TIME; scheme=W3C-DTF;

dcterms:temporal

This element denotes any timespan of the item's subject matter. For example:

<dcterms:temporal>
name=World War 2; start=1939; end=1945; scheme=W3C-DTF;
</dcterms:temporal>

dcterms:valid

This denotes the timespan during which the item's content is valid. For example:

<dcterms:valid>start=2003-01-01; end=2003-02-01; scheme=W3C-DTF;</dcterms:valid>
