10.10 Other Options

As XML has spread, more and more people have had creative (and often useful) ideas about how to process it.

10.10.1 XPath as API

The XPath language provides a convenient way to specify which nodes to return from a tree. A parser written as a hybrid needs to return only the list of nodes that match an XPath expression. A stream parser searches through the document to find the nodes, then passes their locations to a tree builder that assembles them into object trees. XPath's advantage is that it is a very rich language for specifying nodes, giving the developer a lot of control and flexibility. The libxml2 and MSXML parsers are two that come with XPath interfaces.
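As a rough illustration of selecting nodes by path expression, the sketch below uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. It is not the libxml2 or MSXML interface, and the sample document and its element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
DOC = """<book>
  <chapter id="c1"><title>Streams</title>
    <section><title>SAX</title></section>
  </chapter>
  <chapter id="c2"><title>Trees</title>
    <section><title>DOM</title></section>
    <section><title>JDOM</title></section>
  </chapter>
</book>"""

root = ET.fromstring(DOC)

# Titles of every section, at any depth below the root.
section_titles = [t.text for t in root.findall(".//section/title")]

# Titles of the sections inside the chapter whose id attribute is "c2".
c2_sections = [t.text for t in root.findall("./chapter[@id='c2']/section/title")]

print(section_titles)
print(c2_sections)
```

The caller states which nodes it wants in a single expression and gets back only those nodes, rather than walking the tree by hand.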

10.10.2 JDOM

Despite the name, JDOM is not merely a Java implementation of DOM. Rather, it is an alternative to SAX and DOM that is described by its developers as "lightweight and fast . . . optimized for the Java programmer." It doesn't actually replace other parsers, but uses them to build object representations of documents with an interface that is easy to manipulate. It is designed to integrate with SAX and DOM, supplying a simple and useful interface layer on top.

The proponents of JDOM say it is needed to reduce the complexity of the factory-based specifications for SAX and DOM. For that reason, the JDOM specification itself is defined with classes and not interfaces. In addition to supplying its own API, JDOM includes an XPath API.

10.10.3 Hybrids

If streams and trees are the two extremes on a spectrum of XML processing techniques, then the middle ground is home to solutions we might call hybrids. They combine the best of both worlds (the low resource overhead of streams and the convenience of a tree structure) by switching between the two modes as necessary. The idea is that if you are interested in only a small slice of a document and can safely ignore the rest, then you need to work with only a subtree. The parser scans through the stream until it sees the part that you want, then switches to tree-building mode.

One example is the Perl module XML::Twig by Michel Rodriguez. Before parsing, you tell the parser which twigs you want it to find: for example, every section element in a DocBook book. It then returns the matching subtrees one at a time for processing, nimbly sidestepping the problem of holding the whole book in memory at once.
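The same stream-then-subtree idea can be sketched in Python with the standard library's ElementTree.iterparse. This is an analogy to the twig approach, not XML::Twig's own API. The helper name iter_twigs and the sample document are hypothetical, and the sketch assumes the wanted elements are not nested inside one another.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
DOC = b"""<book>
  <title>Example</title>
  <section id="s1"><title>First</title><para>One</para></section>
  <section id="s2"><title>Second</title><para>Two</para><para>Three</para></section>
</book>"""

def iter_twigs(source, tag):
    """Yield each complete <tag> subtree as the parser finishes reading it."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            # Once the caller is done, empty the subtree so its children
            # and text can be freed before the next one is built.
            elem.clear()

summaries = []
for section in iter_twigs(io.BytesIO(DOC), "section"):
    # Inside the loop, each section is an ordinary small tree.
    summaries.append(
        (section.get("id"), section.findtext("title"), len(section.findall("para")))
    )

print(summaries)
```

Each section is handled as a tree while it is in hand and emptied afterward, so the full content of the book is never held in memory at once. The emptied section elements stay attached to the root, which is acceptable for a sketch but worth pruning for very large inputs.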

10.10.4 Data Binding

Some developers don't need direct access to XML document structures; they just want to work with objects or other data structures. Data binding approaches minimize the amount of interaction between the developer and the XML itself. Instead of creating XML directly, an API takes an object and serializes it. Instead of reading an XML document and interpreting its parts, an API takes an XML document and presents it as an object.

Data binding processing tends to focus on schemas, which are used as the foundation for describing the XML representing a particular object. The type and structure information used in the schema provides the data binding processor with information about both the XML documents and the objects, and a simple mapping between them suffices for a large number of cases. Data binding is also at the heart of web services, a set of technologies for using XML to send information over a network between programs.
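A minimal sketch of such a mapping follows, in Python. It assumes a dataclass can stand in for the schema: the field names and types supply the structure and type information. The Person class and the to_xml and from_xml helpers are hypothetical names for this illustration, not part of any real data binding product, and the sketch handles only flat records with simple field types.

```python
from dataclasses import dataclass, fields
import xml.etree.ElementTree as ET

@dataclass
class Person:
    # The field declarations play the role of the schema in this sketch.
    name: str
    age: int

def to_xml(obj):
    """Serialize a flat dataclass instance: one child element per field."""
    root = ET.Element(type(obj).__name__.lower())
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")

def from_xml(cls, text):
    """Rebuild an instance, converting each element's text to the field's type."""
    root = ET.fromstring(text)
    return cls(**{f.name: f.type(root.findtext(f.name)) for f in fields(cls)})

xml_text = to_xml(Person("Ada", 36))
person = from_xml(Person, xml_text)

print(xml_text)
print(person)
```

Code that uses Person never touches elements or text nodes directly. The conversion in both directions is driven entirely by the declared fields.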

There are a variety of data binding implementations available, largely for the Java and .NET platforms.