6.5 Integrating the Dublin Core

According to the mission statement, located at http://www.dublincore.org/:

The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI's activities include consensus-driven working groups, global workshops, conferences, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.

The Dublin Core's primary purpose is to discover a metadata model that can be used to describe resources intelligently so that this information can be used in more efficient and intelligent resource searches, knowledge systems, and so on.

At first glance, this description may seem to position Dublin Core as a competitor to RDF, but in reality, they're highly compatible. Dublin Core is an effort to define the business data of the Web, so to speak. RDF, on the other hand, is a way of recording this metadata so that it can be merged with other metadata defined for other businesses, not just the business of the Web. In other words, RDF is the methodology, and Dublin Core is one business employing the RDF methodology.

Since Dublin Core is an effort to define business data, serializing that data need not be done with RDF. The Dublin Core project provides an RDF/XML version of the data that it has defined, true. But it also provides one in simple, basic XML and one in HTML. However, it is the RDF/XML version we're interested in and will focus on at this time.

6.5.1 An Overview of the Dublin Core MetaData Element Set

The Dublin Core MetaData Element set (Version 1.1, found at http://www.dublincore.org/documents/1999/07/02/dces/) consists of a core set of elements that make up what is known as simple Dublin Core. These elements are:

title

A name given to the resource

creator

An entity responsible for making the content of the resource

subject

The topic of the content of the resource

description

An account of the content of the resource

publisher

An entity responsible for making the content available

contributor

An entity responsible for making contributions to the content of the resource

date

A date associated with an event in the life cycle of the resource

type

The nature or genre of the content of the resource

format

The physical or digital manifestation of the resource

identifier

An unambiguous reference to the resource within a given context

source

A reference to the resource from which the present resource is derived

language

A language of the intellectual content of the resource

relation

A reference to a related resource

coverage

The extent or scope of the content of the resource

rights

Information about rights held in and over the resource

Additional conventions are associated with some of the elements. For instance, the value of language is typically a two-character language code taken from ISO 639 (such as "EN" for English), and the value of date follows a set date format (YYYY-MM-DD).
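As a minimal sketch of these value conventions, the following Python checks that a date value is a real calendar date written as YYYY-MM-DD and that a language value has the two-letter shape. The function names are hypothetical, and the language check tests only the shape of the value; it does not consult the ISO 639 code list.

```python
import re
from datetime import date

# Hypothetical helpers for illustration; not part of any Dublin Core tooling.
DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")
TWO_LETTERS = re.compile(r"[A-Za-z]{2}")


def looks_like_dc_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    if not DATE_SHAPE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2002-02-30
    except ValueError:
        return False
    return True


def looks_like_language_code(value: str) -> bool:
    """Shape check only: exactly two letters, as in "EN" or "en"."""
    return TWO_LETTERS.fullmatch(value) is not None
```

A check like this is useful before data is merged from several sources, since a date such as 11/01/2002 would otherwise slip through as an ordinary string.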

As you can see immediately, several DC elements could be used in place of PostCon elements. First, though, let's take a look at Dublin Core implemented as RDF/XML.

6.5.2 Dublin Core in RDF/XML

The Dublin Core vocabulary is one of the simplest, which is probably one reason it's so heavily used. The namespace for the elements is at:

http://purl.org/dc/elements/1.1/

If you go to this URL with your browser, you'll see an actual document, with a schema description for each element. The prefix usually given for the Dublin Core namespace within an RDF document is dc, which we'll use in this chapter.

I won't include the document here, nor will I discuss each element. However, some elements are of particular interest because they seem to map to a PostCon element. And if there's a way of reducing PostCon, we'll want to pursue it.

For instance, one element from PostCon that definitely looks to be in DC is title. The Dublin Core title is defined to be "a name given to the resource." Since our definition of title in PostCon is "resource's title," we have a match. Looking at the schema definition for the property we find:

<rdf:Property rdf:about="http://purl.org/dc/elements/1.1/title">
  <rdfs:label xml:lang="en-US">Title</rdfs:label>
  <rdfs:comment xml:lang="en-US">A name given to the resource.</rdfs:comment>
  <dc:description xml:lang="en-US">Typically, a Title will be a name by which the
resource is formally known.</dc:description>
  <rdfs:isDefinedBy rdf:resource="http://purl.org/dc/elements/1.1/" />
  <dcterms:issued>1999-07-02</dcterms:issued>
</rdf:Property>

There are some differences between this and the original PostCon title schema definition. For instance, the schema for the PostCon title listed the property's domains (that is, acceptable contexts for the property) to be the pstcn:Resource class (and indirectly to Movement, which is a subclass of pstcn:Resource). The DC doesn't list domains because it doesn't seek to limit what classes it can be used for, opening the door for us to use the property in PostCon.
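To make the point about domains concrete, here is a small sketch that reads a trimmed copy of the dc:title definition with Python's standard ElementTree module and reports the property's label and any rdfs:domain values it declares. The wrapper rdf:RDF element and the describe_property function are my own additions for illustration, and the code reads the XML surface only; it is not a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Trimmed copy of the dc:title definition, wrapped in rdf:RDF so it parses alone.
SCHEMA = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/title">
    <rdfs:label xml:lang="en-US">Title</rdfs:label>
    <rdfs:comment xml:lang="en-US">A name given to the resource.</rdfs:comment>
    <rdfs:isDefinedBy rdf:resource="http://purl.org/dc/elements/1.1/" />
  </rdf:Property>
</rdf:RDF>"""


def describe_property(xml_text):
    """Return (uri, label, domain URIs) for the first rdf:Property found."""
    root = ET.fromstring(xml_text)
    prop = root.find(f"{{{RDF}}}Property")
    uri = prop.get(f"{{{RDF}}}about")
    label = prop.findtext(f"{{{RDFS}}}label")
    domains = [d.get(f"{{{RDF}}}resource")
               for d in prop.findall(f"{{{RDFS}}}domain")]
    return uri, label, domains


uri, label, domains = describe_property(SCHEMA)
print(uri, label, domains)
```

The empty list of domains is the interesting part: nothing in the definition restricts which classes the property can describe.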

Another difference is that DC is used directly to describe the property. Again, this won't adversely impact the use of title in PostCon. In fact, the additional information is helpful. Finally, there is another property assigned to a different namespace: dcterms:issued. Before we can determine whether this property will limit our use of title in PostCon, we'll have to take a closer look at this new schema.

For more on Dublin Core in RDF/XML, see the pending recommendation "Expressing Simple Dublin Core in RDF/XML," authored by Dave Beckett, Eric Miller, and Dan Brickley, and found at http://www.dublincore.org/documents/2001/11/28/dcmes-xml/.

6.5.3 Qualified Dublin Core

All of the Dublin Core metadata elements are properties within the context of RDF. Within an RDF graph, that means that all of them radiate out from a single resource. Again, this makes the vocabulary attractive to use because it is so simple and uncomplicated. However, there are basic limitations to how broadly one can stretch any one element to meet a specific use. And by stretching meanings at all, we lose some refinement.

Sure, we can group all dates together, but do we want to?

So, the Dublin Core Working Group set out to define a set of qualifiers that limit or modify the meaning of the DC elements. Additionally, the group determined that the qualifiers belonged in one of two categories: qualifiers for element refinement and qualifiers for encoding schemes.

Element refinement qualifiers restrict the scope of the element. For instance, there is the general concept of date and then there is creation date (from PostCon), modified date, and so on. Those vocabularies that want such refinements can use things such as modified date and creation date. However, vocabularies (or applications) that don't care about the refinement can ignore it and just treat the qualified elements as date.

Element refinement qualifiers are based on the business of the schema rather than its implementation. Encoding scheme qualifiers, though, exist purely to help with parsing and interpretation of the data. Again, a date value can be written in many different ways. By using encoding scheme qualifiers, there's no confusion about what to expect for data within a specific date field.

When looking at Dublin Core, we can see uses for several of the elements, but when we look at the qualified Dublin Core implemented in RDF/XML, we find a strong match for several PostCon classes and properties.

First, the namespace for the qualified Dublin Core Schema is at http://purl.org/dc/terms/. The namespace prefix for the qualified Dublin Core is usually dcterms.

The first property that attracts attention is created, a qualifier on the date property. The created definition is:

<rdf:Property rdf:about="http://purl.org/dc/terms/created">
  <rdfs:label>Created</rdfs:label>
  <rdfs:comment>Date of creation of the resource.</rdfs:comment>
  <rdfs:subPropertyOf rdf:resource = "http://purl.org/dc/elements/1.1/date" />
  <rdfs:isDefinedBy rdf:resource="http://purl.org/dc/terms/" />
</rdf:Property>

The thing to focus on is the comment, "Date of creation of the resource." This exactly matches the description for the pstcn:creationDate property in PostCon. In the last section, we weren't sure how to handle the dcterms:issued, but now we know it's nothing more than an issued date, a further qualification of the specification for the title property.
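The rdfs:subPropertyOf statement in the created definition is what lets an application that understands only date still make use of a created value. The sketch below shows that idea with a small hand-written mapping; the REFINES table and the dumb_down function are hypothetical names of mine, and a real application would more likely build the table by reading the rdfs:subPropertyOf statements from the schema.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Refinement -> element it refines. A hand-written subset for illustration only.
REFINES = {
    DCTERMS + "created": DC + "date",
    DCTERMS + "modified": DC + "date",
    DCTERMS + "abstract": DC + "description",
}


def dumb_down(statements):
    """Replace each refined property with the broader element it refines.

    statements is a list of (property URI, value) pairs; properties without
    an entry in REFINES are passed through unchanged.
    """
    return [(REFINES.get(prop, prop), value) for prop, value in statements]


simple = dumb_down([
    (DCTERMS + "created", "1999-08-01"),
    (DC + "title", "Tale of Two Monsters: Legends"),
])
```

After the call, the created statement is carried by the plain date element, and the title statement is untouched.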

Another set of properties that seemed similar to PostCon elements is the DC Relation property and the qualified replacers: dcterms:isReplacedBy, dcterms:seeAlso, dcterms:references, and so on. They're not used to replace PostCon's related property (and associated Resource class) though because the DC properties have built-in semantics that don't encompass all of PostCon's related property semantics. However, PostCon's pstcn:dependencies and DC's qualifier dcterms:requires seem to be a good match.

After the first glance, both the original Dublin Core elements and the qualified element set seem to have good replacements, or additions, to the PostCon vocabulary. And since both are defined within RDF, it will be simple to use them together in RDF/XML documents.

6.5.4 Mixing Vocabularies

After the first glance at the Dublin Core simple elements, I decided to replace the PostCon attributes demonstrated in this chapter with matching DC elements. These include the following replacements:

pstcn:title

dc:title

pstcn:author

dc:creator

pstcn:owner

dc:publisher

pstcn:abstract

dcterms:abstract

pstcn:description

dc:description

pstcn:creationDate

dcterms:created

pstcn:date

dc:date

I also decided to add the format property, to provide the resource file type. Small changes, but they do reduce the size of the PostCon vocabulary, as well as allowing easier data sharing on these items.

To see how these two vocabularies work together, the RDF/XML for the sample monsters1.htm resource is provided in Example 6-6. The Dublin Core Schema namespaces are added to the top-level RDF element, and the dc and dcterms properties are used in place of the now-removed PostCon properties. In addition, both Relevancy and the Presentation resources have been added to complete the document.

Example 6-6. Mixing PostCon and DC vocabulary elements
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xml:base="http://burningbird.net/articles/">

  <pstcn:Resource rdf:about="monsters1.htm">

<!--Resource biographical information-->
     <pstcn:bio rdf:parseType="Resource">
        <dc:title>Tale of Two Monsters: Legends</dc:title>
        <dcterms:abstract>
            When I think of "monsters" I think of the creatures of 
            legends and tales, from the books and movies, and 
            I think of the creatures that have entertained me for years.
        </dcterms:abstract>
        <dc:description>
            Part 1 of four-part series on cryptozoology, legends, 
            Nessie the Loch Ness Monster and the giant squid.
        </dc:description>
       <dcterms:created>1999-08-01T00:00:00-06:00</dcterms:created>
       <dc:creator>Shelley Powers</dc:creator>
       <dc:publisher>Burningbird Network</dc:publisher>
      </pstcn:bio>

<!--Resource's relevancy at time RDF/XML document was built-->
      <pstcn:relevancy rdf:parseType="Resource">
        <pstcn:currentStatus>Active</pstcn:currentStatus>
        <dcterms:valid>2003-12-01T00:00:00-06:00</dcterms:valid>
        <dc:subject>legends</dc:subject>
        <dc:subject>giant squid</dc:subject>
        <dc:subject>Loch Ness Monster</dc:subject>
        <dc:subject>Architeuthis Dux</dc:subject>
        <dc:subject>Nessie</dc:subject>
        <dcterms:isReferencedBy rdf:resource="http://www.pibburns.com/cryptozo.htm" />
        <dcterms:references rdf:resource="http://www.nrcc.utmb.edu/" />
      </pstcn:relevancy>

<!--Presentation/consumption information about resource-->
      <pstcn:presentation rdf:parseType="Resource">
         <dc:format>text/html</dc:format>
         <dcterms:conformsTo>XHTML 1.0 Strict</dcterms:conformsTo>
         <dcterms:conformsTo>CSS Validation</dcterms:conformsTo>
         <dcterms:requires>HTML User agent</dcterms:requires>
         <pstcn:requires rdf:parseType="Resource">
            <pstcn:type>stylesheet</pstcn:type>
            <rdf:value>http://burningbird.net/de.css</rdf:value>
         </pstcn:requires>
         <pstcn:requires rdf:parseType="Resource">
            <pstcn:type>logo</pstcn:type>
            <rdf:value>http://burningbird.net/mm/dynamicearth.jpg</rdf:value>
         </pstcn:requires>
      </pstcn:presentation>

<!--History of events of resource-->
     <pstcn:history>
       <rdf:Seq>
        <rdf:_1 rdf:resource="http://www.yasd.com/dynaearth/monsters1.htm" />
        <rdf:_2 rdf:resource="http://www.dynamicearth.com/articles/monsters1.htm" />
        <rdf:_3 rdf:resource="http://burningbird.net/articles/monsters1.htm" />
      </rdf:Seq>    
     </pstcn:history>

<!--Resources internal to PostCon that are related to resource-->
     <pstcn:related rdf:resource="monsters2.htm" />
     <pstcn:related rdf:resource="monsters3.htm" />
     <pstcn:related rdf:resource="monsters4.htm" />
  </pstcn:Resource>

<!--Related resources-->
  <pstcn:Resource rdf:about="monsters2.htm">
     <dc:title>Cryptozoology</dc:title>
     <pstcn:reason>First in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>
  <pstcn:Resource rdf:about="monsters3.htm">
     <dc:title>A Tale of Two Monsters: Architeuthis Dux (Giant Squid)</dc:title>
     <pstcn:reason>Second in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>
  <pstcn:Resource rdf:about="monsters4.htm">
     <dc:title>Nessie, the Loch Ness Monster </dc:title>
     <pstcn:reason>Fourth in the Tale of Two Monsters series.</pstcn:reason>
  </pstcn:Resource>

<!--Resource events-->
  <pstcn:Movement rdf:about="http://www.yasd.com/dynaearth/monsters1.htm">
      <pstcn:movementType>Add</pstcn:movementType>
      <pstcn:reason>New Article</pstcn:reason>
      <dc:date>1998-01-01T00:00:00-05:00</dc:date>
  </pstcn:Movement>
  <pstcn:Movement rdf:about="http://www.dynamicearth.com/articles/monsters1.htm">
      <pstcn:movementType>Move</pstcn:movementType>
      <pstcn:reason>Moved to separate dynamicearth.com domain</pstcn:reason>
      <dc:date>1999-10-31T00:00:00-05:00</dc:date>
  </pstcn:Movement>
  <pstcn:Movement rdf:about="http://burningbird.net/articles/monsters1.htm">
     <pstcn:movementType>Move</pstcn:movementType>
     <pstcn:reason>Collapsed into Burningbird</pstcn:reason>
     <dc:date>2002-11-01</dc:date> 
  </pstcn:Movement>

</rdf:RDF>

Running this document through the RDF Validator generates the expected RDF graph and no errors.
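Because the two vocabularies live in separate namespaces, a consumer that cares only about the Dublin Core data can pick it out of the mixed document without knowing anything about PostCon. The following is a minimal sketch of that, using a trimmed fragment in the style of Example 6-6 and Python's standard ElementTree module. The function name is hypothetical, and the code works on the XML elements rather than on RDF triples, so it's no substitute for a real RDF parser.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Trimmed fragment in the style of Example 6-6.
DOC = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <pstcn:Resource rdf:about="monsters1.htm">
    <pstcn:bio rdf:parseType="Resource">
      <dc:title>Tale of Two Monsters: Legends</dc:title>
      <dc:creator>Shelley Powers</dc:creator>
    </pstcn:bio>
    <pstcn:presentation rdf:parseType="Resource">
      <dc:format>text/html</dc:format>
      <dcterms:requires>HTML User agent</dcterms:requires>
    </pstcn:presentation>
  </pstcn:Resource>
</rdf:RDF>"""


def dublin_core_values(xml_text):
    """Collect (local name, text) for every dc: or dcterms: element, in document order."""
    found = []
    for elem in ET.fromstring(xml_text).iter():
        for ns in (DC, DCTERMS):
            prefix = "{" + ns + "}"
            if elem.tag.startswith(prefix):
                text = (elem.text or "").strip()
                found.append((elem.tag[len(prefix):], text))
    return found


values = dublin_core_values(DOC)
```

The PostCon elements are simply skipped, which is the practical payoff of borrowing well-known properties: tools that know Dublin Core get something useful from a PostCon document for free.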

One thing that this exercise demonstrates is the need to keep a vocabulary small and then add to it. As you saw with Dublin Core, the group started with a small set of important elements and then extended this with a new set of qualifier elements. This is a good approach for you to follow with your vocabularies and is the approach that other groups such as the RSS Working Group (discussed in Chapter 13) used. If you do so, others are more likely to make use of your vocabulary, and you also decrease the chances that it will need modification in the future. The complete RDF Schema for PostCon, after the Dublin Core elements have been identified, is actually quite small. It's shown in its entirety in Example 6-7.

Example 6-7. PostCon RDF Schema
<?xml version="1.0"?>
<rdf:RDF xml:lang="en"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

<rdfs:Class rdf:about="http://burningbird.net/postcon/elements/1.0/Resource">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
<rdfs:label xml:lang="en"> Web Resource</rdfs:label>
 <rdfs:comment xml:lang="en">
    Web resource managed with PostCon system
 </rdfs:comment>
 <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource" />
</rdfs:Class>

<rdfs:Class rdf:about="http://burningbird.net/postcon/elements/1.0/Movement">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
<rdfs:label xml:lang="en"> Web Resource Movement</rdfs:label>
 <rdfs:comment xml:lang="en">
    An event for the resource within the PostCon system
 </rdfs:comment>
</rdfs:Class>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/bio">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Resource biography</rdfs:label>
 <rdfs:comment xml:lang="en">
    Biographical information for resource
 </rdfs:comment>
 <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/relevancy">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Resource Relevancy</rdfs:label>
 <rdfs:comment xml:lang="en">
    Information related to relevancy of resource
 </rdfs:comment>
 <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/presentation">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Resource Presentation</rdfs:label>
 <rdfs:comment xml:lang="en">
    Information related to presentation of resource
 </rdfs:comment>
 <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/history">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en"> Web Content History</rdfs:label>
 <rdfs:comment xml:lang="en">
    History of movement of content within system
 </rdfs:comment>
 <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/currentStatus">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Current Status</rdfs:label>
 <rdfs:comment>Current status of document (allowable values of Active and Inactive)</rdfs:comment>
 <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Relevancy"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/reason">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Reason</rdfs:label>
 <rdfs:comment>Reason</rdfs:comment>
 <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/movementType">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Movement Type</rdfs:label>
 <rdfs:comment>Type of Movement (allowable values of Move, Add, Remove)</rdfs:comment>
 <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
 <rdfs:domain rdf:resource="http://burningbird.net/postcon/elements/1.0/Movement"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/related">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en"> Related Resource</rdfs:label>
 <rdfs:comment xml:lang="en">
    Resources within PostCon system related to current resource
 </rdfs:comment>
 <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/requires">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Resource Requirement</rdfs:label>
 <rdfs:comment xml:lang="en">
    External resource required by current resource
 </rdfs:comment>
 <rdfs:range rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
</rdf:Property>

<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/type">
 <rdfs:isDefinedBy rdf:resource="http://burningbird.net/postcon/elements/1.0/"/>
 <rdfs:label xml:lang="en">Resource Type</rdfs:label>
 <rdfs:comment>Type of Required Resource</rdfs:comment>
 <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</rdf:Property>

</rdf:RDF>

The schema is in RDF/XML and can be validated. Once validated, it can be embedded within an outer HTML or XHTML document at the location of the schema URI or left as a pure RDF/XML document in the same location. The main reason for doing this (it's not required) is to give people the opportunity to review the schema to better understand the vocabulary. Another reason is that some tools, such as BrownSauce (which we'll look at in detail in Chapter 7), use the schema to provide better information about the RDF graph.
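As a rough illustration of what a tool can do with a published schema, the sketch below builds a lookup table from each term's URI to its human-readable rdfs:label, which could then be used to display friendlier names when showing a graph. This is my own simplified example, using a trimmed fragment of the PostCon schema; it is not a description of how BrownSauce or any other tool actually works, and label_index is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Trimmed fragment of the PostCon schema from Example 6-7.
SCHEMA = """<?xml version="1.0"?>
<rdf:RDF xml:lang="en"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdfs:Class rdf:about="http://burningbird.net/postcon/elements/1.0/Movement">
 <rdfs:label xml:lang="en"> Web Resource Movement</rdfs:label>
</rdfs:Class>
<rdf:Property rdf:about="http://burningbird.net/postcon/elements/1.0/movementType">
 <rdfs:label xml:lang="en">Movement Type</rdfs:label>
</rdf:Property>
</rdf:RDF>"""


def label_index(xml_text):
    """Map the URI of each class or property in the schema to its rdfs:label."""
    index = {}
    for term in ET.fromstring(xml_text):
        uri = term.get(f"{{{RDF}}}about")
        label = term.findtext(f"{{{RDFS}}}label")
        if uri and label:
            index[uri] = label.strip()
    return index


labels = label_index(SCHEMA)
```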

6.5.5 Using DC-dot to Generate DC RDF

Much about a document can be derived directly from the document itself. The format, location, subject, author, copyright, and so on can all be derived by scraping the HTML of a particular web resource, including its meta tags.

Based on this, an organization going by the abbreviation UKOLN, at the University of Bath in the UK, created the DC-dot generator. This online application will scrape a web resource, pull whatever information it can from it, and then return the result formatted in multiple ways, including RDF, XHTML meta tags, and straight XML.
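To give a feel for the kind of scraping involved, here is a minimal sketch that pulls a page's title and a few meta tags and files them under Dublin Core element names, using Python's standard html.parser module. The mapping from meta names to DC elements is an assumption of mine for illustration; DC-dot's actual rules aren't described here, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Assumed mapping from HTML meta names to DC elements; not DC-dot's actual rules.
META_TO_DC = {
    "author": "creator",
    "keywords": "subject",
    "description": "description",
}


class MetaScraper(HTMLParser):
    """Collect the page title and selected meta tags as DC element values."""

    def __init__(self):
        super().__init__()
        self.dc = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in META_TO_DC and attrs.get("content"):
                self.dc[META_TO_DC[name]] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.dc["title"] = data.strip()


def scrape(html_text):
    """Return a dict of DC element name -> value found in the HTML."""
    parser = MetaScraper()
    parser.feed(html_text)
    parser.close()
    return parser.dc


PAGE = """<html><head>
<title>Tale of Two Monsters: Architeuthis Dux</title>
<meta name="author" content="Shelley Powers">
<meta name="description" content="The Giant Squid and its relationship to mythology.">
</head><body></body></html>"""

scraped = scrape(PAGE)
```

A scraper like this can only be as good as the page's own markup, which is why a tool of this kind benefits from letting you correct its first guess.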

Access DC-dot at http://www.ukoln.ac.uk/metadata/dcdot/.

I decided to try this with the sample "Tale of Two Monsters" article. In the first page of the application, I entered the URL for the document, and checked both boxes to have the tool attempt to determine publisher and return RDF. The page returned has a first guess at the RDF/XML and provides a form that you can then use to modify the DC elements generated. Figure 6-4 displays the form you can use to modify the results.

Figure 6-4. DC-dot form to modify results

With some modifications, the DC RDF/XML document generated is shown in Example 6-8.

Example 6-8. DC-dot-generated RDF/XML
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://purl.org/dc/schemas/dcmes-xml-20000714.dtd">

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://burningbird.net/articles/monsters3.htm">
    <dc:title>
      Tale of Two Monsters: Architeuthis Dux
    </dc:title>
    <dc:creator>
      Shelley Powers
    </dc:creator>
    <dc:subject>
      Internet; Web; Computers; Software; Technology;
      Meteorology; Geology; Oceanography; Astronomy; Math;
      Science; Physics; P2P
    </dc:subject>
    <dc:description>
      The Giant Squid and its relationship to mythology.
    </dc:description>
    <dc:publisher>
      Burningbird
    </dc:publisher>
    <dc:date>
      2002-01-20
    </dc:date>
    <dc:type>
      Text
    </dc:type>
    <dc:format>
      text/html
    </dc:format>
    <dc:format>
      8287 bytes
    </dc:format>
  </rdf:Description>
</rdf:RDF>

The generated RDF/XML validates with the RDF Validator, except for one element, boldfaced in the example code: the generator uses an unqualified about attribute (on the rdf:Description element), which, though allowed for existing vocabularies, is discouraged with new vocabularies and RDF/XML instances. However, this is a quick change to make.
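The change can be made by hand, but as a sketch of how it might be automated, the following uses Python's standard ElementTree module to move an unqualified about attribute on rdf:Description into the RDF namespace. The qualify_about function is a hypothetical helper of mine, the sample is a trimmed version of the generated document, and note that a DOCTYPE declaration would not survive this kind of parse-and-reserialize round trip.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Keep the familiar prefixes when the document is written back out.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")

GENERATED = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://burningbird.net/articles/monsters3.htm">
    <dc:creator>Shelley Powers</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def qualify_about(xml_text):
    """Rewrite about="..." on rdf:Description elements as rdf:about="..."."""
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        if "about" in desc.attrib:
            desc.set(f"{{{RDF}}}about", desc.attrib.pop("about"))
    return ET.tostring(root, encoding="unicode")


fixed = qualify_about(GENERATED)
```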

Now that you've had a chance to try out RDF/XML, it's time to try out a few of the many, many tools and utilities and APIs that have been created specifically for processing RDF/XML.