19.2 Data Models for XML Documents

A small example illustrates the application of the different approaches. Listing 19.1 introduces the Document Type Definition (DTD) (W3C 1998b) for XML documents that describe personnel: a set of professors, each with a name and one or more courses. A name consists of a firstname and a lastname, and each course has a title and a description. Both professor and course carry an identifying attribute: employeeNo and courseNo, respectively.

Listing 19.1 DTD for University Personnel
<!ELEMENT personnel (professor+)>
<!ELEMENT professor (name, course+)>
<!ATTLIST professor employeeNo ID #REQUIRED>
<!ELEMENT name (firstname, lastname)>
<!ELEMENT course (title, description)>
<!ATTLIST course courseNo ID #REQUIRED>
<!ENTITY % textdata "(firstname, lastname, title, description)">
<!ELEMENT %textdata; (#PCDATA)>

The XML document in Listing 19.2 conforms to this DTD and is used throughout this chapter. It contains one professor with an employeeNo, a firstname, and a lastname. The professor gives one course with a title and a description.

Listing 19.2 Sample XML Document
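<?xml version="1.0"?>
<personnel>
 <professor employeeNo="0802">
  <name><firstname> . . . </firstname><lastname> . . . </lastname></name>
  <course courseNo="TR1234">
    <title>Document Structuring with SGML</title>
    <description>In this course  . . .</description>
  </course>
 </professor>
</personnel>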

The Document Object Model (DOM) of the WWW Consortium (W3C 1998a) is based on an object-oriented viewpoint of documents and their parts. It organizes them hierarchically in a document tree. The elements of a document become the inner nodes of the DOM tree. Attributes, comments, processing instructions, texts, entities, and notations form the leaves of the tree.
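As a rough illustration of this tree view, the following Python sketch parses a document shaped like Listing 19.2 with the standard library module xml.dom.minidom and collects one line per node. The helper outline and the values "Jane" and "Doe" are placeholders introduced here, not part of the original listing. Note that minidom, like the DOM specification, exposes attributes through a separate attribute map rather than through childNodes, so the sketch lists them explicitly.

```python
from xml.dom import Node, minidom

# Document shaped like Listing 19.2; "Jane" and "Doe" are placeholder values.
SAMPLE = (
    '<personnel>'
    '<professor employeeNo="0802">'
    '<name><firstname>Jane</firstname><lastname>Doe</lastname></name>'
    '<course courseNo="TR1234">'
    '<title>Document Structuring with SGML</title>'
    '<description>In this course . . .</description>'
    '</course>'
    '</professor>'
    '</personnel>'
)


def outline(node, depth=0, lines=None):
    """Collect one indented line per node of the DOM tree."""
    if lines is None:
        lines = []
    indent = "  " * depth
    if node.nodeType == Node.ELEMENT_NODE:
        # Elements are the inner nodes of the tree.
        lines.append(f"{indent}element {node.nodeName}")
        # Attributes live in a separate map, not in childNodes.
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            lines.append(f"{indent}  attribute {attr.name}={attr.value}")
    elif node.nodeType == Node.TEXT_NODE:
        # Text nodes are leaves.
        lines.append(f"{indent}text {node.nodeValue!r}")
    for child in node.childNodes:
        outline(child, depth + 1, lines)
    return lines


doc = minidom.parseString(SAMPLE)
tree_lines = outline(doc.documentElement)
print("\n".join(tree_lines))
```

The printed outline starts at the personnel element and ends at the text leaf below description.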

19.2.1 The Nontyped DOM Implementation

In a nontyped DOM implementation, one class is defined for every interface of the DOM. Figure 19.1 shows an excerpt of the DOM in the notation of the Unified Modeling Language (UML) (OMG 1999). The class NodeImpl, which implements the interface Node, contains the attributes nodeName, nodeValue, and nodeType to store the content of a node, and the attributes parentNode and childNodes to implement the tree and to allow navigation from a node to its children and to its parent. It also implements the predefined methods, such as firstChild, lastChild, and nextSibling, by means of which the tree can be built and traversed. Subclasses such as ElementImpl and AttrImpl implement the subinterfaces such as Element and Attr and provide, where necessary, additional attributes and the required method definitions.
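The class structure described above might be sketched as follows. This is a deliberately simplified, hypothetical Python rendering and not the code of any particular DOM implementation: the string values used for nodeType, the dictionary used for attributes, and the methods appendChild and setAttribute are assumptions made for brevity.

```python
class NodeImpl:
    """Simplified stand-in for a class implementing the DOM interface Node."""

    def __init__(self, node_name, node_value=None, node_type="node"):
        self.nodeName = node_name
        self.nodeValue = node_value
        self.nodeType = node_type      # assumption: a plain string tag
        self.parentNode = None
        self.childNodes = []

    def appendChild(self, child):
        # Maintain both directions of the tree: parent -> child and child -> parent.
        child.parentNode = self
        self.childNodes.append(child)
        return child

    @property
    def firstChild(self):
        return self.childNodes[0] if self.childNodes else None

    @property
    def lastChild(self):
        return self.childNodes[-1] if self.childNodes else None

    @property
    def nextSibling(self):
        if self.parentNode is None:
            return None
        siblings = self.parentNode.childNodes
        index = siblings.index(self)
        return siblings[index + 1] if index + 1 < len(siblings) else None


class AttrImpl(NodeImpl):
    """Stand-in for the class implementing the interface Attr."""

    def __init__(self, name, value):
        super().__init__(name, node_value=value, node_type="attribute")


class ElementImpl(NodeImpl):
    """One class for all element types; the element name is instance state."""

    def __init__(self, tag_name):
        super().__init__(tag_name, node_type="element")
        self.attributes = {}           # assumption: attribute name -> AttrImpl

    def setAttribute(self, name, value):
        self.attributes[name] = AttrImpl(name, value)


# Build a fragment of the sample document and navigate it.
professor = ElementImpl("professor")
professor.setAttribute("employeeNo", "0802")
name = professor.appendChild(ElementImpl("name"))
course = professor.appendChild(ElementImpl("course"))
```

Nothing in these classes mentions professor, name, or course; those names appear only as values stored in the instances.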

Figure 19.1. Excerpt of the DOM


The nontyped implementation of the DOM is document neutral: it does not reflect the structure of the documents. M. Yoshikawa et al. call this the "model-mapping approach" (Yoshikawa et al. 2001). For the interface Element there exists a single class ElementImpl, even though many different element types can occur in an XML document. Nor do the classes explicitly reproduce the nesting: it cannot be seen from the classes that certain elements are subelements of others.

To summarize, with the nontyped DOM implementation the whole XML document is kept in a set of instances of the classes that implement the DOM interfaces. The instances, not the classes, contain the document-specific information.
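Python's xml.dom.minidom behaves like a nontyped implementation in this sense, which makes the point easy to check. The short sketch below, using an abbreviated fragment of the sample document, shows that elements of different types are all instances of one class and differ only in their state.

```python
from xml.dom import minidom

# Abbreviated fragment of the sample document (subelement content omitted).
doc = minidom.parseString(
    '<professor employeeNo="0802"><name/><course courseNo="TR1234"/></professor>'
)
professor = doc.documentElement
name, course = professor.childNodes

# Every element, whatever its element type in the DTD, has the same class.
classes = {type(professor), type(name), type(course)}

# The document-specific information is held by the instances.
names = [node.nodeName for node in (professor, name, course)]
print(classes, names)
```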

19.2.2 The Typed DOM Implementation

As an extension to the nontyped implementation of the DOM, a subclass of the class ElementImpl can now be defined for every element type of an XML document. These classes have relationships to their subelements, attributes, and text nodes represented by compositions. The association childNodes proposed by the DOM is then superfluous.

When the typed DOM implementation is applied to our example, the classes Personnel, Professor, Name, Firstname, and Lastname are defined as subclasses of the class ElementImpl, which implements the interface Element. The class EmployeeNo is defined as a subclass of the class AttrImpl. Figure 19.2 shows these classes, their attributes, and the relationships between them in UML. The class Personnel has a multivalued, ordered composition to the class Professor. This relationship is derived from the definition of the element personnel in the DTD, according to which a personnel element contains one or several professor elements:

<!ELEMENT personnel (professor+)>

Figure 19.2. A Typed DOM Implementation


Compositions are also created between the classes Professor, Name, and EmployeeNo; between the classes Name, Firstname, and Lastname; as well as between the latter and CDATASectionImpl.
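One possible rendering of these typed classes is sketched below in Python. It is an illustration under several assumptions rather than the implementation shown in Figure 19.2: ElementImpl and AttrImpl are reduced to empty placeholder base classes, text content is held as a plain string instead of a separate text node class, and course is left out, as in the class list above. The field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


class ElementImpl:
    """Placeholder for the class implementing the DOM interface Element."""


class AttrImpl:
    """Placeholder for the class implementing the DOM interface Attr."""


@dataclass
class Firstname(ElementImpl):
    text: str          # simplification: text kept as a string


@dataclass
class Lastname(ElementImpl):
    text: str


@dataclass
class Name(ElementImpl):
    # Compositions derived from <!ELEMENT name (firstname, lastname)>.
    firstname: Firstname
    lastname: Lastname


@dataclass
class EmployeeNo(AttrImpl):
    value: str


@dataclass
class Professor(ElementImpl):
    employee_no: EmployeeNo
    name: Name


@dataclass
class Personnel(ElementImpl):
    # Ordered, multivalued composition derived from (professor+).
    professors: List[Professor] = field(default_factory=list)

    def __post_init__(self):
        # The "+" in the content model requires at least one professor.
        if not self.professors:
            raise ValueError("personnel requires at least one professor")


# "Jane" and "Doe" are placeholder values.
personnel = Personnel(
    [Professor(EmployeeNo("0802"), Name(Firstname("Jane"), Lastname("Doe")))]
)
print(personnel.professors[0].name.lastname.text)
```

Here the nesting of the DTD can be read directly from the class definitions, without looking at any instance.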

The difference between the two approaches is that in the first approach the document structure is reproduced only in the states of the instances. By contrast, in the second approach, the structure of the document is shown in the composition hierarchy of the classes. M. Yoshikawa et al. call this the "structure-mapping approach" (Yoshikawa et al. 2001). In the nontyped implementation of the DOM, the subelement name of the element professor is one node among several child nodes of the professor node. In the typed implementation, it is by contrast an instance of a class Name that is referenced from an instance of the class Professor.

Assigning the nodes a type forces the application that processes the XML document to recognize the document structure. When running through the DOM tree, the application must follow the typed composition paths. A Name instance can be reached from a Professor instance only via a composition of the type Name. This is a very strong restriction, since the application that processes the document tree has to know the names and types of the compositions, that is, of the subelements. If it uses the interface Node instead of the typed classes, which is allowed because of the subtyping, it actually deals with the nontyped implementation of the DOM.
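The contrast between the two ways of navigating can be sketched as follows. The classes are hypothetical and heavily simplified: Node stands for the generic interface, child_nodes is an assumed method name, and the typed classes derive their generic child list from their compositions.

```python
class Node:
    """Generic interface: only a node name and ordered children are visible."""

    node_name = "#node"

    def child_nodes(self):
        return []


class Firstname(Node):
    node_name = "firstname"

    def __init__(self, text):
        self.text = text


class Lastname(Node):
    node_name = "lastname"

    def __init__(self, text):
        self.text = text


class Name(Node):
    node_name = "name"

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

    def child_nodes(self):
        return [self.firstname, self.lastname]


class Professor(Node):
    node_name = "professor"

    def __init__(self, name):
        self.name = name

    def child_nodes(self):
        return [self.name]


def typed_lastname(professor):
    # Typed access: the caller must know the compositions name and lastname.
    return professor.name.lastname.text


def generic_names(node):
    # Nontyped access: only the Node interface is used, no element names are hard-coded.
    found = [node.node_name]
    for child in node.child_nodes():
        found.extend(generic_names(child))
    return found


# "Jane" and "Doe" are placeholder values.
professor = Professor(Name(Firstname("Jane"), Lastname("Doe")))
print(typed_lastname(professor))
print(generic_names(professor))
```

The function typed_lastname breaks if the document structure changes, whereas generic_names works for any tree but learns the structure only at run time.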

