15.5 RDF and Adobe: XMP

Rather than integrating RDF into a tool's architecture from the ground up, as with the applications discussed earlier in this chapter, other companies are incorporating RDF and RDF/XML into their existing applications. Adobe, a major player in the publications and graphics business, is one such company. Its RDF/XML strategy is known as XMP, the eXtensible Metadata Platform. According to the Adobe XMP web site, other major players, including Microsoft, have agreed to support the XMP framework.

XMP provides a metadata label that can be embedded directly into applications, files, and databases, including binary data, using what Adobe calls XMP packets: XML fragments that can be embedded regardless of the host file's format. Wherever the file is moved, the embedded metadata moves with it and can be read by external tools using the XMP Toolkit. Adobe has added XMP support to Photoshop 7.0, Acrobat 5.0, FrameMaker 7.0, GoLive 6.0, InCopy 2.0, InDesign 2.0, Illustrator 10, and LiveMotion 2.0.
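As a rough illustration of what a packet looks like, here is a minimal Python sketch (not the C++ XMP Toolkit) that wraps an RDF/XML fragment in the packet wrapper described in the XMP specification: an `<?xpacket begin=... id="W5M0MpCehiHzreSzNTczkc9d"?>` header, an `x:xmpmeta` element, whitespace padding, and an `<?xpacket end="w"?>` trailer. The function name `wrap_xmp_packet` and the default padding size are assumptions made for illustration.

```python
# Fixed id string that marks an XMP packet header (per the XMP specification).
XMP_PACKET_ID = "W5M0MpCehiHzreSzNTczkc9d"


def wrap_xmp_packet(rdf_xml: str, padding: int = 2048, writable: bool = True) -> bytes:
    """Wrap an rdf:RDF fragment in an XMP packet wrapper, encoded as UTF-8.

    The begin attribute carries a byte-order mark (U+FEFF) so a scanner can
    tell which encoding the packet uses. The trailing whitespace padding gives
    tools room to edit the metadata in place without rewriting the host file.
    """
    header = f'<?xpacket begin="\ufeff" id="{XMP_PACKET_ID}"?>\n'
    body = f'<x:xmpmeta xmlns:x="adobe:ns:meta/">\n{rdf_xml}\n</x:xmpmeta>\n'
    end_flag = "w" if writable else "r"  # "w": may be modified in place
    trailer = f'<?xpacket end="{end_flag}"?>'
    return (header + body + " " * padding + "\n" + trailer).encode("utf-8")


if __name__ == "__main__":
    rdf = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
    <pdf:Producer>Acrobat Distiller 5.0.5 for Macintosh</pdf:Producer>
  </rdf:Description>
</rdf:RDF>"""
    print(wrap_xmp_packet(rdf, padding=40).decode("utf-8"))
```

Because the wrapper is just text between two processing instructions, a tool can locate it by scanning raw bytes without understanding the host format, which is what makes packets format-independent.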

The information included within the embedded labels can be from any schema as long as it's recorded in valid RDF/XML. The XMP source code is freely available for download, use, and modification under an open source license.

Read more about Adobe XMP at http://www.adobe.com/products/xmp/. Download the SDK at http://partners.adobe.com/asn/developer/xmp/main.html.

Unlike so much of the RDF/XML technology, which emphasizes Java or Python, the XMP Toolkit supports only C++. Specifically, the toolkit works with Microsoft Visual C++ (or a compatible compiler) on Windows and Metrowerks CodeWarrior C++ on the Mac.

The SDK contains a subdirectory of C++ code for reading and writing XMP metadata, along with a good set of documentation that provides samples and instructions for embedding XMP metadata into TIFF, HTML, JPEG, PNG, PDF, SVG/XML, Illustrator (.ai), Photoshop (.psd), PostScript, and EPS formats.
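To give a feel for what embedding means in one of these formats, the following Python sketch builds the JPEG APP1 segment that carries XMP, following the layout in the XMP specification: the 0xFFE1 marker, a two-byte big-endian length that counts itself, the namespace string `http://ns.adobe.com/xap/1.0/` terminated by a null byte, and then the packet. The function names are hypothetical, and the sketch assumes the packet fits in a single segment (under roughly 64 KB); it is not the SDK's own code.

```python
import struct

SOI = b"\xff\xd8"   # JPEG start-of-image marker
APP0 = b"\xff\xe0"  # APP0 marker (JFIF header)
APP1 = b"\xff\xe1"  # APP1 marker, used for XMP (and Exif)
XMP_JPEG_NAMESPACE = b"http://ns.adobe.com/xap/1.0/\x00"


def make_xmp_app1_segment(packet: bytes) -> bytes:
    """Return an APP1 segment holding the given XMP packet."""
    payload = XMP_JPEG_NAMESPACE + packet
    length = len(payload) + 2  # the length field counts its own two bytes
    if length > 0xFFFF:
        raise ValueError("XMP packet too large for a single APP1 segment")
    return APP1 + struct.pack(">H", length) + payload


def embed_xmp_in_jpeg(jpeg: bytes, packet: bytes) -> bytes:
    """Insert an XMP APP1 segment near the start of a JPEG byte stream."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream (missing SOI marker)")
    insert_at = 2
    # Keep a JFIF APP0 segment first, since JFIF readers expect it right after SOI.
    if jpeg[2:4] == APP0:
        (app0_len,) = struct.unpack(">H", jpeg[4:6])
        insert_at = 4 + app0_len
    return jpeg[:insert_at] + make_xmp_app1_segment(packet) + jpeg[insert_at:]
```

Image viewers skip APP1 segments they don't recognize, so the picture itself is unaffected by the added metadata.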

The SDK is a bit out of date with respect to recent RDF and RDF/XML activity. For instance, when discussing embedding RDF/XML in HTML documents, it references a W3C note that favored the practice. However, as you read in Chapter 3, recent decisions discourage embedding metadata in (X)HTML documents, though it isn't expressly forbidden.

Be forewarned that the SDK documentation assumes significant experience with the different data types, as well as with C++. The document of most interest is the Metadata Framework PDF file, specifically the section discussing how XMP works with RDF and the section on extending XMP with external RDF/XML schemas. Extending XMP involves nothing more than defining data in valid RDF and using a separate namespace for data that isn't from the core schemas used by XMP. The section titled "XMP Schemas" lists all elements of XMP's built-in schemas.
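Here is a small Python sketch of that kind of extension using the standard `xml.etree.ElementTree` module: one `rdf:Description` carries a property from XMP's built-in PDF schema alongside a property from a custom namespace. The namespace `http://example.com/ns/myapp/1.0/`, the `my:` prefix, the `reviewStatus` property, and the function name `build_description` are all hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDF_NS = "http://ns.adobe.com/pdf/1.3/"     # one of XMP's built-in schemas
MY_NS = "http://example.com/ns/myapp/1.0/"  # hypothetical custom schema

# Register prefixes so the serialized RDF/XML uses rdf:, pdf:, and my:.
for prefix, uri in (("rdf", RDF_NS), ("pdf", PDF_NS), ("my", MY_NS)):
    ET.register_namespace(prefix, uri)


def build_description(pdf_props: dict, custom_props: dict) -> str:
    """Return rdf:RDF whose single rdf:Description mixes core and custom properties."""
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    # An empty rdf:about, as XMP packets commonly use for the enclosing document.
    desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""})
    for name, value in pdf_props.items():
        ET.SubElement(desc, f"{{{PDF_NS}}}{name}").text = value
    for name, value in custom_props.items():
        ET.SubElement(desc, f"{{{MY_NS}}}{name}").text = value
    return ET.tostring(rdf, encoding="unicode")


if __name__ == "__main__":
    print(build_description({"Producer": "Acrobat Distiller 5.0.5"},
                            {"reviewStatus": "draft"}))
```

Nothing about the custom property is special to XMP; it is ordinary RDF/XML that happens to live in a namespace XMP doesn't define.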

The SDK also includes the C++ source and support files for the Metadata Library, as well as some other utilities and samples. I dusted off my rarely used Visual C++ 6.0, opened the Windows project for the Metadata Toolkit (the project file XAPToolkit.dsw), and built the library without any problems. The other C++ applications also compiled cleanly as long as I remembered to add the paths for the included header files and libraries.

One of the samples included with the SDK is XAPDumper, an application that scans a file for embedded RDF/XML and prints it. I compiled it and ran it against the SDKOverview.pdf document. An excerpt of the embedded data found in this file is:

<rdf:Description rdf:about=''
 xmlns:pdf='http://ns.adobe.com/pdf/1.3/'>
 <pdf:Producer>Acrobat Distiller 5.0.5 for Macintosh</pdf:Producer>
 <!--pdf:CreationDate is aliased-->
 <!--pdf:ModDate is aliased-->
 <!--pdf:Creator is aliased-->
 <!--pdf:Author is aliased-->
 <!--pdf:Title is aliased-->
</rdf:Description>
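XAPDumper's actual logic lives in the SDK's C++ code. The Python sketch below only approximates the idea by searching raw bytes for the packet header and trailer and returning what lies between them. It assumes UTF-8 packets stored uncompressed in the file, so UTF-16 packets or metadata inside compressed streams would be missed; the names `find_xmp_packets` and `dump_file` are hypothetical.

```python
import re

# A UTF-8 XMP packet runs from its <?xpacket begin=...?> header to its
# <?xpacket end="w"?> or <?xpacket end="r"?> trailer; DOTALL spans newlines.
_PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=['\"][rw]['\"]\?>",
    re.DOTALL,
)


def find_xmp_packets(data: bytes) -> list:
    """Return the text between each packet header and trailer, in file order."""
    return [
        m.group(1).decode("utf-8", errors="replace").strip()
        for m in _PACKET_RE.finditer(data)
    ]


def dump_file(path: str) -> None:
    """Print every XMP packet found in the file at `path`."""
    with open(path, "rb") as f:
        for i, packet in enumerate(find_xmp_packets(f.read()), start=1):
            print(f"--- packet {i} ---")
            print(packet)
```

Calling `dump_file("SDKOverview.pdf")` would print the RDF/XML of any uncompressed packets the PDF contains.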

Embedding RDF/XML isn't much different from attaching a bar code to a physical object. Both carry identifying information that stays with the object even if it becomes separated from its original package. In addition, in a publications environment where every file is marked with embedded RDF/XML, automated processes could read this information and use it to determine how to connect the files together, such as embedding a JPEG file into an HTML page.

I can see the advantage of embedded RDF/XML for any resource that's uploaded to the Web. Eventually, web bots could access and use this information to provide more intelligent information about the resources they touch. Instead of a few keywords, a title, and a document type, these bots could provide the entire history of a document or picture, as well as every detail about it.

Other applications can also build in support for XMP. For instance, RDF Gateway, mentioned earlier, can read Adobe XMP. The following shows how this application would access the XMP data in an Adobe PDF file:

var monsters = new DataSource(
  "inet?url=http://burningbird.net/articles/monsters3.pdf&parsetype=xmp");
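RDF Gateway handles the parsing internally. As a rough illustration of what reading XMP into an RDF store amounts to, this Python sketch (not RDF Gateway's code) turns the `rdf:Description` elements of an extracted packet into (subject, predicate, object) triples. It deliberately handles only simple text-valued property elements; attribute-form properties and structured values such as `rdf:Seq` or `rdf:Alt` are skipped. The function name `xmp_triples` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def xmp_triples(rdf_xml: str):
    """Yield (subject, predicate URI, literal) for simple text properties."""
    root = ET.fromstring(rdf_xml)  # the default parser discards XML comments
    for desc in root.iter(RDF + "Description"):
        subject = desc.get(RDF + "about", "")
        for prop in desc:
            if not prop.tag.startswith("{"):
                continue  # RDF properties are always namespace-qualified
            # ElementTree tags look like "{namespace}localname"; the RDF
            # predicate URI is the namespace followed by the local name.
            ns, _, local = prop.tag[1:].partition("}")
            if len(prop) == 0 and prop.text is not None:
                yield (subject, ns + local, prop.text.strip())
```

Applied to the XAPDumper excerpt (wrapped in an `rdf:RDF` element that declares the `rdf` prefix), this would yield a single triple for `pdf:Producer`, since the aliased properties appear only as comments.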

An important consideration with these embedding techniques is that they have no adverse impact on the file: nothing affects how a JPEG or PNG graphic displays or prevents an HTML file from loading into a browser. In fact, if you've read PDF files from Adobe or from other sites that use the newer Adobe products, you've probably been working with XMP documents containing embedded RDF/XML without even knowing it.