11.2 Case Study: mod_Book

My wife and I are currently planning on moving from London, England, to Sweden. To that effect, much of the contents of our home is already in storage, and most of this is books. We sent 86 tea chests full of books to the warehouse, and we still have plenty more to go.

Many people really like our books, many people like to borrow them, and for many reasons it would be quite cool to be able to put the details of books we have into an RSS feed. When we unpack the books, we will most likely scan their barcodes and order our library (we're geeky like that), so we will have all sorts of data available.

So, the challenge is to design an RSS module for both 1.0 and 2.0 that can deal with books.

11.2.1 What Do We Know?

The first thing to think about is precisely what knowledge we already have about the thing we are trying to describe. With books, we know a great deal:

  • The title

  • The author

  • The publisher

  • The ISBN number

  • The subject

  • The date of publication

  • The content itself

There are also, alas, things that we might think we know, but which we in fact do not. In the case of books, unless we are dealing with a specific edition in a specific place at a specific time, we do not know the number of pages, the price, the printer, the paper quality, or how critics received it. We might think we do (after all, I bought most of these books, and I can touch them and pick them up), but for the sake of sharable data these are not universally useful values. They will change with time and are not internationally sharable. Remember that once it has left your machine, the data you create (in this case, each item) is lost to you. As the first author, it is your responsibility to create it in such a way that it retains its value for as long as possible with as wide an audience as possible.

So, rule 1 of module design is: decide what data you know, and what data you do not know.

11.2.2 Can We Express This Data Already?

Rule 2 of module design is: if possible, use another module's element to deliver the same information.

This is another key point. It is much less work to leverage the efforts of others, and when many people have spent time introducing Dublin Core support to desktop readers, for example, we should reward them by using Dublin Core as much as possible. Module elements need to be created only if there is no suitable alternative already in the wild.

So, to reexamine our data:

The title

Titles can be written within the core title element of either 1.0 or 2.0, or within the dc:title element of the Dublin Core module. One should always strive to use the core namespace first, so title it is.

The author

Here we have the first core split between 1.0 and 2.0. In 2.0, we can use the core author element. There is no such thing in 1.0, so we are forced to use the dc:creator element of Dublin Core. Because one should always strive to use the core namespace first, RSS 2.0 users should use author. But because we want to have as simple a module specification as possible, we might like to use the same element in both module versions. One way of doing this would be to import the RSS 2.0 namespace into the 1.0 feed and use author in both. However, this cannot be done. RSS 2.0's root namespace is "". We can't import that, as we don't have a namespace URI to point to. We could possibly use the URL of the 2.0 specification document as the URI, declare xmlns:rss2="http://backend.userland.com/rss", and then use rss2:author, but because the URI is different, technically this does not refer to the same vocabulary as the one used in RSS 2.0. As we will see, using the same element, even if it is in a slightly different syntax, is very useful indeed for the authors of RSS applications. So, for the sake of simplicity, I'm opting for dc:creator. We also have the option of using dc:contributor to denote a contributor.
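To see why the borrowed URI does not help, here is a minimal Python sketch, assuming the standard library's xml.etree.ElementTree as the parser. A namespace-aware parser identifies each element by its namespace URI plus its local name, so the un-namespaced author of RSS 2.0 and an rss2:author bound to the specification URL come out as two different names. The two sample items and the helper name are illustrative only.

```python
import xml.etree.ElementTree as ET

# An RSS 2.0 style item: author lives in no namespace at all.
RSS2_ITEM = "<item><author>Cory Doctorow</author></item>"

# The workaround discussed above: bind a prefix to the specification URL.
BORROWED_ITEM = (
    '<item xmlns:rss2="http://backend.userland.com/rss">'
    "<rss2:author>Cory Doctorow</rss2:author>"
    "</item>"
)


def child_names(xml_text):
    """Return the fully qualified names of an element's children."""
    return [child.tag for child in ET.fromstring(xml_text)]


print(child_names(RSS2_ITEM))      # ['author']
print(child_names(BORROWED_ITEM))  # ['{http://backend.userland.com/rss}author']
```

An application looking for author will not find the second form, which is the practical argument for settling on dc:creator in both versions.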

The publisher

Publishers are lovely people and happily have their very own Dublin Core element, dc:publisher.

The ISBN number

ISBN numbers are fantastically useful here. Because the ISBN governing body ensures that each ISBN number is unique to its book, this can serve as a globally unique identifier. What's more, we can even turn an ISBN into a URI by using the format urn:isbn:0123456789. For RSS 1.0, this will prove remarkably useful, as we will discuss in a moment. Meanwhile, denoting the ISBN is a good idea. Let's invent a new element. Choosing book as the namespace prefix, let's call it book:isbn.
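As a rough illustration of turning an ISBN into a URI, the following Python sketch checks a 10-digit ISBN and then builds the urn:isbn: form. It assumes the standard ISBN-10 check-digit rule (a weighted sum divisible by 11, with X standing for 10 in the last position) and handles only the 10-digit ISBNs used in this chapter. The function names are hypothetical and not part of any module specification.

```python
def normalize_isbn10(isbn):
    """Strip the hyphens and spaces that often appear in printed ISBNs."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn):
    """Check length, characters, and the modulo-11 check digit."""
    chars = normalize_isbn10(isbn)
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # X is only allowed as the final check character
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += value * (10 - position)  # weights run from 10 down to 1
    return total % 11 == 0


def isbn_to_urn(isbn):
    """Return the urn:isbn: URI for a valid 10-digit ISBN."""
    if not is_valid_isbn10(isbn):
        raise ValueError("not a valid 10-digit ISBN: %r" % isbn)
    return "urn:isbn:" + normalize_isbn10(isbn)


print(isbn_to_urn("0765304368"))     # urn:isbn:0765304368
print(isbn_to_urn("0-7653-0436-8"))  # urn:isbn:0765304368
```

Validating before publishing matters here because a mistyped ISBN would otherwise become a globally unique identifier for the wrong book.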

The subject

A book's subject can be a matter of debate, especially with fiction, so it may not be entirely sane to make this element mandatory or to trust it. Nevertheless, we do have ways of writing it. RSS 2.0's core element category may help here, as would dc:subject, especially when used with RSS 1.0 mod_taxonomy.

All of these schemes, however, rely on being able to place the subject within a greater hierarchy. Fortunately, library scientists are hard at work on this, and there are many such hierarchies to choose from. For our purposes, we will use the Open Directory hierarchy, just to provide continuity throughout this book.

The date of publication

Again, here we have a clash between the extended core of RSS 2.0 and RSS 1.0's use of Dublin Core. Within RSS 2.0 we have pubDate available, and within RSS 1.0 we rely on dc:date. Given that Dublin Core is more widely recognized within the RDF world and perfectly valid within the RSS 2.0 world, it saves time and effort to standardize on it. This is a good example of rule 3: as you cannot tell people what they can't do with your data, you must make it easy for them to do what they want.
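For a producer that already holds dates in the RSS 2.0 pubDate style, standardizing on dc:date is a small conversion. The sketch below assumes Python's standard email.utils parser is good enough for the RFC 822 dates found in pubDate. The function name is hypothetical.

```python
from email.utils import parsedate_to_datetime


def pubdate_to_dcdate(pub_date):
    """Convert an RFC 822 date (pubDate style) to the ISO 8601 form used by dc:date.

    Seconds are dropped to match the minute precision of the examples in
    this chapter. A date without a time zone would come out without an
    offset, so this sketch expects the zone to be present.
    """
    moment = parsedate_to_datetime(pub_date)
    return moment.isoformat(timespec="minutes")


print(pubdate_to_dcdate("Sat, 01 Feb 2003 00:01:00 GMT"))  # 2003-02-01T00:01+00:00
```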

The content itself

We have the content itself. The core description does not work here: we're talking about the content, not a précis of it, and we certainly do not want to include all of the content, so content:encoded is out too. We really need an element to contain an excerpt of the book, the opening paragraph, for example.

Hurrah! We can invent a new element! Let's call it book:openingPara.

So, out of all the information we want to include, we need to invent only two new elements: book:isbn and book:openingPara. This is not a bad thing: modules do not just consist of masses of new elements slung out into the public. They should also include guidance as to the proper usage of existing modules in the new context. Reuse and recycle as much as possible.

To summarize, we now have:

<title/>
<dc:creator/>
<dc:publisher/>
<book:isbn/>
<dc:subject/>
<dc:date/>
<book:openingPara/>

11.2.3 Putting the New Elements to Work with RSS 2.0

Before creating the feed item, we need to decide on what the link will point to. Given that my book collection is not web-addressable in that way, I'm going to point people to the relevant page on http://isbn.nu, Glenn Fleishman's book-price comparison site.

For an RSS 2.0 item, we can therefore use Example 11-1.

Example 11-1. mod_Book for RSS 2.0
<item>
  <title>Down and Out in the Magic Kingdom</title>
  <link>http://isbn.nu/0765304368/</link>
  <dc:creator>Cory Doctorow</dc:creator>
  <dc:publisher>Tor Books</dc:publisher>
  <book:isbn>0765304368</book:isbn>
  <dc:subject>Fiction</dc:subject>
  <dc:date>2003-02-01T00:01+00:00</dc:date>
  <book:openingPara> I lived long enough to see the cure for death; to see the rise of the 
Bitchun Society, to learn ten languages; to compose three symphonies; to realize my 
boyhood dream of taking up residence in Disney World; to see the death of the workplace 
and of work.</book:openingPara>
</item>

As you can see in this simple strand of RSS 2.0, the inclusion of book metadata is easy. We know all about the book, and a mod_Book-compatible reader can allow us to read the first paragraph and, if it appeals, to click on the link and buy it. All is good.
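What a mod_Book-compatible reader has to do can be sketched in a few lines of Python. This is only an illustration, assuming the standard library's xml.etree.ElementTree, the dc:creator element chosen in the discussion above, and the placeholder book namespace URI http://www.exampleurl.com/namespaces. The channel title, link, and description are invented so that the item sits inside a complete feed.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for lookups; the book URI is a placeholder.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "book": "http://www.exampleurl.com/namespaces",
}

FEED = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:book="http://www.exampleurl.com/namespaces">
  <channel>
    <title>Our Library</title>
    <link>http://www.example.com/library/</link>
    <description>Books on our shelves</description>
    <item>
      <title>Down and Out in the Magic Kingdom</title>
      <link>http://isbn.nu/0765304368/</link>
      <dc:creator>Cory Doctorow</dc:creator>
      <dc:publisher>Tor Books</dc:publisher>
      <book:isbn>0765304368</book:isbn>
      <book:openingPara> I lived long enough to see the cure for death; to see the rise of the
Bitchun Society, to learn ten languages; to compose three symphonies; to realize my
boyhood dream of taking up residence in Disney World; to see the death of the workplace
and of work.</book:openingPara>
    </item>
  </channel>
</rss>"""


def tidy(text):
    """Collapse line breaks and runs of spaces; treat a missing element as empty."""
    return " ".join(text.split()) if text else ""


def read_books(feed_text):
    """Return one dictionary of book metadata per item in the feed."""
    root = ET.fromstring(feed_text)
    books = []
    for item in root.iter("item"):
        books.append({
            "title": tidy(item.findtext("title")),
            "link": tidy(item.findtext("link")),
            "creator": tidy(item.findtext("dc:creator", namespaces=NS)),
            "isbn": tidy(item.findtext("book:isbn", namespaces=NS)),
            "opening": tidy(item.findtext("book:openingPara", namespaces=NS)),
        })
    return books


for book in read_books(FEED):
    print(book["title"], "by", book["creator"], "ISBN", book["isbn"])
```

The core elements are looked up without a namespace, while the module elements need their namespace URIs, which is why the feed must declare them on the rss element.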

11.2.4 Putting the New Elements to Work with RSS 1.0

With RSS 1.0, we must make a few changes. First, we need to assign the book a URI for the rdf:about attribute of item. This is not as straightforward as you might think. We need to think about precisely what we are describing. In this case, the choice is between a specific book (the one that is sitting on my desk right now) and the concept of that book, of which my specific book is one example.

The URI determines this. If I make the URI http://www.benhammersley.com/myLibrary/catalogue/0765304368, then the item refers to my own copy: one discrete object.

If, however, I make the URI urn:isbn:0765304368, then the item refers to the general concept of Cory Doctorow's book. For our purposes here, this is the one to go for. If I were producing an RSS feed for a lending library, it might be different. Example 11-2 makes these changes to mod_Book in RSS 1.0.

Example 11-2. mod_Book in RSS 1.0
<item rdf:about="urn:isbn:0765304368">
  <title>Down and Out in the Magic Kingdom</title>
  <link>http://isbn.nu/0765304368/</link>
  <dc:creator>Cory Doctorow</dc:creator>
  <dc:publisher>Tor Books</dc:publisher>
  <book:isbn>0765304368</book:isbn>
  <dc:subject>Fiction</dc:subject>
  <dc:date>2003-02-01T00:01+00:00</dc:date>
  <book:openingPara> I lived long enough to see the cure for death; to see the rise of the       
Bitchun Society, to learn ten languages; to compose three symphonies; to realize my 
boyhood dream of taking up residence in Disney World; to see the death of the workplace 
and of work.</book:openingPara>
</item>
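The choice between the two kinds of URI can be made explicit in the code that generates the feed. The following Python sketch is one possible approach, assuming the standard library's xml.etree.ElementTree and the placeholder book namespace URI http://www.exampleurl.com/namespaces. The function names and the catalogue_base parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

RSS = "http://purl.org/rss/1.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BOOK = "http://www.exampleurl.com/namespaces"  # placeholder module namespace


def item_uri(isbn, catalogue_base=None):
    """Pick the URI that says what the item describes.

    With no catalogue base, the item is about the book in general.
    With one, it is about a single copy in that catalogue.
    """
    if catalogue_base is None:
        return "urn:isbn:" + isbn
    return catalogue_base.rstrip("/") + "/" + isbn


def build_item(isbn, title, catalogue_base=None):
    """Build a small RSS 1.0 item element for one book."""
    about = item_uri(isbn, catalogue_base)
    item = ET.Element("{%s}item" % RSS, {"{%s}about" % RDF: about})
    ET.SubElement(item, "{%s}title" % RSS).text = title
    ET.SubElement(item, "{%s}link" % RSS).text = "http://isbn.nu/%s/" % isbn
    ET.SubElement(item, "{%s}isbn" % BOOK).text = isbn
    return item


general = build_item("0765304368", "Down and Out in the Magic Kingdom")
my_copy = build_item(
    "0765304368",
    "Down and Out in the Magic Kingdom",
    catalogue_base="http://www.benhammersley.com/myLibrary/catalogue/",
)
print(general.get("{%s}about" % RDF))
print(my_copy.get("{%s}about" % RDF))
```

A lending library would pass its own catalogue address, because there the individual copy is what gets borrowed.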

The second thing to think about is the preference for all the element values within RSS 1.0 to be rdf:resources and not literal strings. To this end, we need to assign URIs to each of the values we can. It is possible within RSS 1.0 to keep extending all the information you have to greater and greater detail. At this point, you must think about your audience. If you foresee people using the feed for only the simplest of tasks, such as displaying the list in a reader or on a site, then you can stop now. If you foresee people using the data in deeper, more interesting applications, then you need to give guidance as to how far each element should be extended.

For the purposes of this chapter, we need to go no further, but for an example we will anyway. Example 11-3 expands the dc:creator element via RDF and the use of a new RDF vocabulary: FOAF, or Friend of a Friend (see http://www.rdfweb.org).

Example 11-3. Expanding the module even further
<?xml version="1.0"?>
<rdf:RDF 
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:book="http://www.exampleurl.com/namespaces"
xmlns="http://purl.org/rss/1.0/"
>
   
<item rdf:about="urn:isbn:0765304368">
  <title>Down and Out in the Magic Kingdom</title>
  <link>http://isbn.nu/0765304368/</link>
  <dc:creator rdf:resource="mailto:doctorow@craphound.com" />
  <dc:publisher>Tor Books</dc:publisher>
  <book:isbn>0765304368</book:isbn>
  <dc:subject>Fiction</dc:subject>
  <dc:date>2003-02-01T00:01+00:00</dc:date>
  <book:openingPara> I lived long enough to see the cure for death; to see the rise of the       
Bitchun Society, to learn ten languages; to compose three symphonies; to realize my 
boyhood dream of taking up residence in Disney World; to see the death of the workplace 
and of work.</book:openingPara>
</item>
   
<!-- The person identified by the mailto: URI above, described with FOAF -->
<foaf:Person rdf:about="mailto:doctorow@craphound.com">
  <foaf:name>Cory Doctorow</foaf:name>
  <foaf:title>Mr</foaf:title>
  <foaf:firstName>Cory</foaf:firstName>
  <foaf:surname>Doctorow</foaf:surname>
  <foaf:homepage rdf:resource="http://www.craphound.com"/>
  <foaf:workplaceHomepage rdf:resource="http://www.eff.org/" />
</foaf:Person>
   
</rdf:RDF>
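An application that wants the author's name then has to follow the rdf:resource pointer to the description of the person. The Python sketch below shows the idea under some loud assumptions: it uses the standard library's xml.etree.ElementTree rather than a real RDF parser, it expects the author to be referenced from dc:creator, and it expects the person to be written as a foaf:Person node whose rdf:about matches that reference. The trimmed-down document and the function name are illustrative only.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

DOC = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="urn:isbn:0765304368">
    <title>Down and Out in the Magic Kingdom</title>
    <dc:creator rdf:resource="mailto:doctorow@craphound.com"/>
  </item>
  <foaf:Person rdf:about="mailto:doctorow@craphound.com">
    <foaf:name>Cory Doctorow</foaf:name>
  </foaf:Person>
</rdf:RDF>"""


def creator_names(doc_text):
    """Map each item's URI to a readable creator name."""
    root = ET.fromstring(doc_text)
    # Index every described person by the URI in rdf:about.
    people = {
        person.get("{%s}about" % RDF): person.findtext("foaf:name", namespaces=NS)
        for person in root.findall("foaf:Person", NS)
    }
    names = {}
    for item in root.findall("rss:item", NS):
        creator = item.find("dc:creator", NS)
        if creator is None:
            continue
        target = creator.get("{%s}resource" % RDF)
        # Prefer the FOAF name, then a literal value, then the bare URI.
        names[item.get("{%s}about" % RDF)] = (
            people.get(target) or creator.text or target
        )
    return names


print(creator_names(DOC))  # {'urn:isbn:0765304368': 'Cory Doctorow'}
```

A simple reader that ignores FOAF still sees a usable URI on dc:creator, which is the point of deciding in advance how far each element should be extended.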

Because only you, as the module designer, know the scope of the data you want to put across, you must document your module accordingly. Speaking of which . . .

11.2.5 Documentation

You must document your module. It is obligatory. The place to do this is at the address you are using as the namespace URI. Without documentation, no one will know precisely what you mean, and no one will be able to support your module. Without support, the module is worthless on the wider stage.