1.3 When to Use and Not Use RDF

RDF is a wonderful technology, and I'll be at the front in its parade of fans. However, I don't consider it a replacement for other technologies, and I don't consider its use appropriate in all circumstances. Just because data is on the Web, or accessed via the Web, doesn't mean it has to be organized with RDF. Forcing RDF into uses that don't realize its potential will only result in a general pushback against RDF in its entirety, including pushback in uses in which RDF positively shines.

This, then, raises the question: when should we, and when should we not, use RDF? More specifically, since much of RDF focuses on its serialization to RDF/XML, when should we use RDF/XML and when should we use non-RDF XML?

As the final edits for this book were in progress, a company called Semaview published a graphic depicting the differences between XML and RDF/XML (found at http://www.semaview.com/c/RDFvsXML.html). Among those listed was one about the tree-structured nature of XML, as compared to RDF's much flatter triple-based pattern. XML is hierarchical, which means that all related elements must be nested within the elements they're related to. RDF does not require this nested structure.

To demonstrate this difference, consider a web resource, which has a history of movement on the Web. Each element in that history has an associated URL, representing the location of the web resource after the movement has occurred. In addition, there's an associated reason why the resource was moved, resulting in this particular event. Recording these relationships in non-RDF XML results in an XML hierarchy four layers deep:

<?xml version="1.0"?>
<resource>
  <uri>http://burningbird.net/articles/monsters3.htm</uri>
  <history>
    <movement>
      <link>http://www.yasd.com/dynaearth/monsters3.htm</link>
      <reason>New Article</reason>
    </movement>
  </history>
</resource>

In RDF/XML, you can associate two separate XML structures with each other through a Uniform Resource Identifier (URI, discussed in Chapter 2). With the URI, you can link one XML structure to another without having to embed the second structure directly within the first:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">

  <pstcn:Resource rdf:about="monsters3.htm">

    <!--resource movements-->
    <pstcn:history>
      <rdf:Seq>
        <rdf:_3 rdf:resource="http://www.yasd.com/dynaearth/monsters3.htm" />
      </rdf:Seq>
    </pstcn:history>

  </pstcn:Resource>

  <pstcn:Movement rdf:about="http://www.yasd.com/dynaearth/monsters3.htm">
    <pstcn:movementType>Add</pstcn:movementType>
    <pstcn:reason>New Article</pstcn:reason>
  </pstcn:Movement>

</rdf:RDF>

Ignore for the moment some of the other characteristics of RDF/XML, such as the use of namespaces, which we'll get into later in the book, and focus instead on the structure. The RDF/XML is still well-formed XML (a requirement of RDF/XML), but the use of the URI (in this case, the URL "http://www.yasd.com/dynaearth/monsters3.htm") breaks us out of the forced hierarchy of standard XML while still allowing us to record the relationship between the resource's history and the particular movement.
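To make the URI linkage concrete, here is a minimal Python sketch that follows the link from the resource's history to the movement record. It is not an RDF parser: it assumes the exact layout of the RDF/XML shown above, uses only the standard library's ElementTree, and the helper names (q, follow_history) are my own inventions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PSTCN_NS = "http://burningbird.net/postcon/elements/1.0/"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# The RDF/XML document from this section.
RDF_XML = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">
  <pstcn:Resource rdf:about="monsters3.htm">
    <pstcn:history>
      <rdf:Seq>
        <rdf:_3 rdf:resource="http://www.yasd.com/dynaearth/monsters3.htm" />
      </rdf:Seq>
    </pstcn:history>
  </pstcn:Resource>
  <pstcn:Movement rdf:about="http://www.yasd.com/dynaearth/monsters3.htm">
    <pstcn:movementType>Add</pstcn:movementType>
    <pstcn:reason>New Article</pstcn:reason>
  </pstcn:Movement>
</rdf:RDF>"""


def q(namespace, name):
    """Build the '{namespace}name' form that ElementTree uses for qualified names."""
    return "{%s}%s" % (namespace, name)


def follow_history(rdf_xml):
    """Return (resource URI, movement URI, reason) for each history entry."""
    root = ET.fromstring(rdf_xml)
    base = root.get(q(XML_NS, "base"), "")
    about = q(RDF_NS, "about")

    # Index every top-level element by its absolute URI, wherever it sits in the file.
    by_uri = {
        urljoin(base, node.get(about)): node
        for node in root
        if node.get(about) is not None
    }

    results = []
    for resource in root.findall(q(PSTCN_NS, "Resource")):
        subject = urljoin(base, resource.get(about))
        members = resource.findall(q(PSTCN_NS, "history") + "/" + q(RDF_NS, "Seq") + "/*")
        for member in members:
            # The link is the URI itself, not the position of the element.
            target = urljoin(base, member.get(q(RDF_NS, "resource")))
            reason = by_uri[target].findtext(q(PSTCN_NS, "reason"))
            results.append((subject, target, reason))
    return results


print(follow_history(RDF_XML))
```

The movement record is found by looking up its URI in an index, so it could appear before the resource, after it, or (with a little more work) in another document entirely.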

However, this difference in structure can make it more difficult for people to read the RDF/XML document and actually see the relationships between the data, which is one of the more common complaints about RDF/XML. With non-RDF XML, you can see at a glance that the history element is directly related to this specific resource element, and so on. In addition, even this small example demonstrates that RDF adds a layer of complexity on top of the XML that can be off-putting when working with it manually. Within an automated process, though, the RDF/XML structure is actually an advantage.

When processing XML, an element isn't actually complete until you reach its end tag. If an application is parsing an XML document into elements in memory before transferring them into another persisted form of data, this means that the elements that contain other elements must be retained in memory until their internal data members are processed. This can result in some fairly significant strain on memory use, particularly with larger XML documents.
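A small Python sketch can show the order in which elements become complete. It feeds the non-RDF XML from earlier in this section to the standard library's streaming parser and records each element name as its end tag is reached. The function name completion_order is hypothetical, chosen only for this illustration.

```python
import io
import xml.etree.ElementTree as ET

# The non-RDF XML document from earlier in this section.
DOC = b"""<?xml version="1.0"?>
<resource>
  <uri>http://burningbird.net/articles/monsters3.htm</uri>
  <history>
    <movement>
      <link>http://www.yasd.com/dynaearth/monsters3.htm</link>
      <reason>New Article</reason>
    </movement>
  </history>
</resource>"""


def completion_order(xml_bytes):
    """Return element names in the order their end tags are reached."""
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("end",))
    return [elem.tag for _event, elem in events]


order = completion_order(DOC)
print(order)
# ['uri', 'link', 'reason', 'movement', 'history', 'resource']
```

The outermost resource element is the last to complete, which is why a parser that builds whole elements has to hold it, and everything nested inside it, until the end of the document.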

RDF/XML, on the other hand, would allow you to process the first element quickly because its "contained" data is actually stored in another element somewhere else in the document. As long as the relationship between the two elements can be established through the URI, we'll always be able to reconstruct the original data regardless of how it's been transformed.

Another advantage of the RDF/XML approach appears when querying the data. Again, in XML, if you're looking for a specific piece of data, you basically have to provide the entire structure of all the elements preceding that piece of data in order to ensure you have the proper value. With RDF/XML, as you'll see, all you have to do is remember the triple nature of the specification and look for a triple whose pattern matches a specific resource URI and property URI, and you'll find the specific value. Returning to the RDF/XML shown earlier, you can find the reason for the specific movement just by looking for the following pattern:

<http://www.yasd.com/dynaearth/monsters3.htm> pstcn:reason ?

The entire document does not have to be traversed to answer this query, nor do you have to specify the entire element path to find the value.
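As a rough illustration of this kind of lookup, the following Python sketch stores the two statements about the movement as plain (subject, predicate, object) tuples and indexes them by subject and predicate. The in-memory index and the objects function are assumptions made for this example; they stand in for whatever an actual RDF store or query language would provide.

```python
from collections import defaultdict

PSTCN = "http://burningbird.net/postcon/elements/1.0/"
MOVEMENT = "http://www.yasd.com/dynaearth/monsters3.htm"

# The statements about the movement, written as (subject, predicate, object).
TRIPLES = [
    (MOVEMENT, PSTCN + "movementType", "Add"),
    (MOVEMENT, PSTCN + "reason", "New Article"),
]

# Index by (subject, predicate) so a lookup never walks the whole data set.
INDEX = defaultdict(list)
for subject, predicate, obj in TRIPLES:
    INDEX[(subject, predicate)].append(obj)


def objects(subject, predicate):
    """Answer the pattern '<subject> predicate ?' with every matching object."""
    return list(INDEX.get((subject, predicate), []))


print(objects(MOVEMENT, PSTCN + "reason"))
# ['New Article']
```

Nothing in the lookup refers to where the statement sat in the original document, only to the resource URI and the property URI.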

If you've worked with database systems before, you'll recognize that many of the differences between RDF/XML and XML are similar to the differences between relational and hierarchical databases. Hierarchical databases also have a physical location dependency that requires related data to be colocated, while relational databases depend on the use of identifiers to relate data.

Another reason you would use RDF/XML over non-RDF XML is the ability to join data from two disparate vocabularies easily, without having to negotiate structural differences between the two. Since the XML from both data sets is based on the same model (RDF) and since both make use of namespaces (which prevent element name collision, that is, the same element name appearing in both vocabularies), combining data from both vocabularies can occur immediately, and with no preliminary work. This is essential for the Semantic Web, the basis for the work on RDF and RDF/XML. However, this is also essential in any business that may need to combine data from two different companies, such as a supplier of raw goods and a manufacturer that uses these raw goods. (Read more on this in the sidebar Data Handshaking Through the Ages.)
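Here is a minimal sketch of that kind of join, again using plain Python tuples rather than a real RDF toolkit. Everything in it is hypothetical: the supplier and manufacturer namespaces, the part URI, and the property names are invented for the illustration and come from no real vocabulary.

```python
# Hypothetical namespaces for two independently designed vocabularies.
SUPPLIER = "http://supplier.example/terms#"
MAKER = "http://manufacturer.example/terms#"

# Hypothetical URI that both companies use to identify the same part.
PART = "http://supplier.example/parts/steel-rod-12"

supplier_triples = {
    (PART, SUPPLIER + "name", "Steel rod, 12 mm"),
    (PART, SUPPLIER + "unitPrice", "4.20"),
}

manufacturer_triples = {
    (PART, MAKER + "name", "ROD-12"),
    (PART, MAKER + "usedIn", "http://manufacturer.example/products/frame-7"),
}

# Both data sets share one model (triples), so merging is a set union.
merged = supplier_triples | manufacturer_triples


def describe(triples, subject):
    """Return the sorted (predicate, object) pairs recorded for a subject."""
    return sorted((p, o) for s, p, o in triples if s == subject)


for predicate, value in describe(merged, PART):
    print(predicate, "=", value)
```

Both vocabularies define a "name" property, but because each is qualified by its own namespace the two statements sit side by side in the merged data without colliding.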

As excellent as these reasons (less strain on memory, simpler querying, and joining vocabularies) are for utilizing RDF as a model for data and RDF/XML as a format, for certain instances of data stored on the Web, RDF is clearly not a replacement. As an example, RDF is not a replacement for XHTML for defining web pages that are displayed in a browser. RDF is also not a replacement for CSS, which is used to control how that data is displayed. Both CSS and XHTML are optimized for their particular uses, organizing and displaying data in a web browser. RDF's purpose differs: it's used to capture specific statements about a resource, statements that help form a more complete picture of the resource. RDF isn't concerned about either page organization or display.

Now, there might be pieces of information in the XHTML and the CSS that could be reconstructed into statements about a resource, but there's nothing in either technology that specifically says "this is a statement, an assertion if you will, about this resource" in such a way that a machine can easily pick this information out. That's where RDF enters the picture. It lays all assertions out (bang, bang, bang) so that even the most amoeba-like RDF parser can find each individual statement without having to pick around among the presentational and organizational constructs of specifications such as XHTML and CSS.

Additionally, RDF/XML isn't necessarily well suited as a replacement for other uses of XML, such as within SOAP or XML-RPC. The main reason is, again, the level of complexity that RDF/XML adds to the process. A SOAP processor is basically sending a request for a service across the Internet and then processing the results of that request when it's answered. There's a mechanism that supports this process, but the basic structure of SOAP is request service, get answer, process answer. In the case of SOAP, the request and the answer are formatted in XML.

Though a SOAP service call and its results are typically formatted in XML, there really isn't a need to persist them beyond that particular invocation, so there is little drive to format the XML in such a way that it can be combined with other vocabularies at a later time, something that RDF/XML facilitates. Additionally, one hopes that we keep the SOAP request and response as small, lightweight, and uncomplicated as possible, and RDF/XML does add to the overhead of the XML. Though bandwidth is not the issue it was years ago, it is still enough of an issue that we shouldn't waste it unnecessarily.

Ultimately, the decision about using RDF/XML in place of XML is based on whether there's a good reason to do so: a business rather than a technical need to use the model and related XML structure. If the data isn't processed automatically, if it isn't persisted and combined with data from other vocabularies, and if you don't need RDF's optimized querying capability, then you should use non-RDF XML. However, if you do need these things, consider the use of RDF/XML.

Data Handshaking Through the Ages

I started working with data and data interchange at Boeing in the late 1980s. At that time, there was a data definition effort named Product Data Exchange Specification (PDES) underway between several manufacturing companies to define one consistent data model that could be used by all of them. With this model, the companies hoped to establish the ability to interchange data among themselves without having to renegotiate data structures every time a new connection was made between the companies, such as adding a new supplier or customer. (This effort is still underway and you can read more about it at http://pdesinc.com.)

PDES was just one effort on the part of specific industries to define common business models that would allow them to interoperate. From Boeing, I went to Sierra Geophysics, a company in Seattle that created software for the oil industry. Sierra Geophysics and its parent company, Halliburton, Inc., were hard at work on POSC, an effort similar to PDES but geared to the oil and gas industries. (You can read more about POSC at http://posc.org; be sure to check out POSC's use of XML, specifically, at http://posc.org/ebiz/xmlLive.shtml.)

One would think this wouldn't be that complex, but it is virtually impossible to get two companies to agree on what "data" means. Because of this difficulty, to this day, there's never been complete agreement as to data interchange formats, though with the advent of XML, there was hope that this specification would provide a syntax that most of the companies could agree to use. One reason XML was hailed as a potential savior is that it represented a neutral element in the discussions: no one could claim either the syntax or the syntactic rules.

Would something like RDF/XML work for both of these organizations and their efforts? Yes and no. If the interest in XML is primarily for network protocol uses, I wouldn't necessarily recommend the use of RDF/XML, for the same reasons I wouldn't recommend its use with SOAP and XML-RPC: RDF/XML adds a layer of complexity and overhead that can be counterproductive when you're primarily doing nothing more than sending messages to and from services. However, RDF/XML would fit the needs of POSC and PDES if the interest were on merging data between organizations for more effective supply chain management, in effect establishing a closer relationship between the supplier of raw goods on one hand and a manufacturer of finished goods on the other. In particular, with an established ontology built on RDF/XML (ontologies are discussed in Chapter 12) defining the business data, it should be a simple matter to add new companies into an existing supply chain.

When one considers that much of the cost of a manufactured item resides in the management of the supply chain and within the manufacturing process, not in the raw material used to manufacture the item, I would expect to see considerable progress from industry efforts such as POSC and PDES in RDF/XML.