9.5 Conclusion

In this chapter we have investigated a set of techniques for building an efficient XML repository using an extensible ORDBMS. We took this approach, rather than building an XML-specific storage manager from scratch, for two reasons:

- Many applications using XML will be built around preexisting enterprise information systems that already use an (object-)relational DBMS.
- We wanted to avoid having to build an expensive transactional, reliable, and scalable storage manager of our own.

The key challenges in such an approach are mapping the hierarchical XML data model into the SQL data model, and doing so without sacrificing the performance or flexibility of XML. The application focus of our prototype was XML in support of e-business: a scenario involving a large number of small, fairly rigorously defined XML document messages.

Meeting these two challenges required three things: a mechanism for representing hierarchies in a database schema, a schema and storage model built on that mechanism, and algorithms for rewriting XPath expressions into efficient SQL-3 queries. This chapter has shown how to achieve all three.

The key functionality required of the DBMS core is an extensible indexing system: one in which the comparison operators that the index uses for built-in data types (such as SQL INTEGER or VARCHAR) can be overloaded, so that the index can also order values of a user-defined type. At least three systems (IBM's Informix IDS 9.x and Cloudscape DBMS, and the open-source PostgreSQL DBMS) provide the desired functionality. In the prototype, we demonstrated how a new user-defined type (which we called the SQL Node type) can supply the functionality needed to reason about node locations in a hierarchy. All of the XPath and XQuery axis operations used to navigate an XML document can be expressed as operations over SQL Node type values.
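To make the role of the overloaded comparison concrete, here is a minimal Python sketch. It assumes a Node value is a Dewey-style sequence of integers such as 1.2.5. This section does not say so; it is an assumption made here for illustration. The class `Node`, its methods, and `follows` are hypothetical names, not the prototype's API. Comparing the components as integers yields document order, which plain VARCHAR comparison does not: as strings, "1.10" sorts before "1.2".

```python
from functools import total_ordering


@total_ordering
class Node:
    """Hypothetical Node value (an assumption, not the prototype's API): a
    Dewey-style position such as "1.2.5", i.e. the 5th child of the 2nd
    child of the root element 1."""

    __slots__ = ("parts",)

    def __init__(self, text: str) -> None:
        self.parts = tuple(int(p) for p in text.split("."))

    def __repr__(self) -> str:
        return "Node('" + ".".join(map(str, self.parts)) + "')"

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Node) and self.parts == other.parts

    def __hash__(self) -> int:
        return hash(self.parts)

    def __lt__(self, other: object) -> bool:
        # Overloaded comparison = document order: components compare as
        # integers, and a prefix (an ancestor) sorts before its extensions.
        if not isinstance(other, Node):
            return NotImplemented
        return self.parts < other.parts

    def is_ancestor_of(self, other: "Node") -> bool:
        n = len(self.parts)
        return len(other.parts) > n and other.parts[:n] == self.parts

    def is_parent_of(self, other: "Node") -> bool:
        return len(other.parts) == len(self.parts) + 1 and self.is_ancestor_of(other)

    def is_preceding_sibling_of(self, other: "Node") -> bool:
        # Same parent, and self comes earlier among the siblings.
        return (len(self.parts) == len(other.parts)
                and self.parts[:-1] == other.parts[:-1]
                and self.parts[-1] < other.parts[-1])

    def descendant_upper_bound(self) -> "Node":
        # Every descendant d satisfies self < d < bound, so the descendant
        # axis maps to one key range in an index ordered by Node.
        bound = self.parts[:-1] + (self.parts[-1] + 1,)
        return Node(".".join(map(str, bound)))


def follows(x: Node, y: Node) -> bool:
    """XPath 'following' axis: y is after x in document order and is not
    one of x's descendants."""
    return x < y and not x.is_ancestor_of(y)


if __name__ == "__main__":
    print(sorted([Node("1.10"), Node("1.2.5"), Node("1.2")]))
    # [Node('1.2'), Node('1.2.5'), Node('1.10')]
    print(sorted(["1.10", "1.2.5", "1.2"]))
    # ['1.10', '1.2', '1.2.5']  (string order is not document order)
```

The `descendant_upper_bound` method shows why an ordered, extensible index matters under this assumed encoding. The descendants of a node form one contiguous key range, so a B-tree that uses the overloaded comparison could answer a descendant query with a single range scan.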

The prototype's second contribution is in the area of ORDBMS schema design for an XML repository. Although other prototypes have adopted similar "shred and store" approaches, the range of XML types they support is fairly limited. By exploiting the ORDBMS's extensible type system, it is feasible to add to the ORDBMS core types and operators equivalent to those standardized for XML. We also showed how small modifications to the schema and the XPath query processor allow the repository to support these XML types.

Third, the prototype set aside the traditional XML-as-document model in favor of an XML-as-data approach. We showed how this provides a ready mechanism for implementing UPDATE operations in our prototype, and argued that it is superior to XML-as-document in both computational efficiency and application flexibility.

Fourth, we showed how to map XPath expressions into SQL-3 queries over the schema. The algorithm and the resulting SQL-3 queries depend on the other parts of the prototype: the Node UDT and its operators, SQL-3 types corresponding to the XML data model's types, and a repository schema that tightly integrates XML Schema and XML document information.
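The general shape of such a rewrite can be sketched in a few lines of Python. This is a minimal sketch, not the chapter's algorithm. It assumes a hypothetical table `xml_nodes(doc_id, id, name)` whose `id` column holds a Node value, and hypothetical SQL functions `IsRoot`, `IsParent`, and `IsAncestor` that expose the Node operators. It handles only name tests joined by `/` and `//`, emitting one self-join per step and ignoring the XML Schema information that the chapter's repository also uses.

```python
import re

# Hypothetical repository table; every name here is illustrative only.
TABLE = "xml_nodes"


def xpath_to_sql(path: str) -> str:
    """Translate a path made only of name tests joined by '/' or '//'
    into a single SQL query with one self-join per location step."""
    steps = re.findall(r"(//|/)([A-Za-z_][\w.-]*)", path)
    # Reject anything the regex did not fully consume (predicates, '*', ...).
    if not steps or "".join(axis + name for axis, name in steps) != path:
        raise ValueError("unsupported XPath: " + path)

    froms, conds = [], []
    for i, (axis, name) in enumerate(steps):
        alias = f"n{i}"
        froms.append(f"{TABLE} {alias}")
        conds.append(f"{alias}.name = '{name}'")
        if i == 0:
            if axis == "/":
                # Absolute first step: the node must be the document element.
                conds.append(f"IsRoot({alias}.id)")
        else:
            prev = f"n{i - 1}"
            # Child step -> parent test; descendant step -> ancestor test.
            op = "IsParent" if axis == "/" else "IsAncestor"
            conds.append(f"{op}({prev}.id, {alias}.id)")
            conds.append(f"{prev}.doc_id = {alias}.doc_id")

    last = f"n{len(steps) - 1}"
    # ORDER BY on the Node column returns results in document order,
    # relying on the overloaded comparison operator.
    return (f"SELECT {last}.doc_id, {last}.id FROM " + ", ".join(froms)
            + " WHERE " + " AND ".join(conds)
            + f" ORDER BY {last}.doc_id, {last}.id")


if __name__ == "__main__":
    print(xpath_to_sql("/order//item"))
```

For `/order//item` the sketch produces a two-way self-join of `xml_nodes` whose WHERE clause includes `IsRoot(n0.id)` and `IsAncestor(n0.id, n1.id)`. A production rewriter would also use schema knowledge to prune or merge joins.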

Finally, we explained how the prototype's design makes inserts relatively inexpensive compared with other shredding approaches.

The techniques described in this chapter can be used from several languages (e.g., C and Java) and on several ORDBMS platforms (IBM's Cloudscape and Informix IDS, and the open-source PostgreSQL ORDBMS). Overall, the prototype's most useful contribution is that it shows developers a way to take advantage of XML without either abandoning their existing information management systems or waiting for vendors to catch up with the market.
