Recently there is huge interest in XML. XML has been recognized as an adequate format for data exchange for Internet applications, because XML is portable across computers and applications and independent of existing databases and programming languages. First-class support for XML is a requirement for modern information platforms for the Web. Since XML documents must be held persistently, the relevance of XML has passed over to the area of databases that have to care for the adequate storage and handling of documents.
Much research and development effort has been focused recently on so-called "native" XML databases like Tamino (Schöning and Wäsch 2000), which promise optimized storage and retrieval of XML documents. Although native XML databases and query technology are important capabilities, a common opinion is that they will be successful in the mainstream market only if they are efficiently and effectively combined with SQL and relational database technology in an overall system architecture.
Indeed, relational database technology is well established and widespread, supporting efficient data management for even huge databases and providing scalability even for thousands of parallel users. Relational database systems offer powerful SQL query mechanisms, conforming to standards and providing effective query optimization. Taking up emerging requirements, vendors of relational database systems are extending their products to support storage and retrieval of XML data. In this way, they benefit from important existing features such as concurrency control, security, backup/recovery, and query optimization. Handling documents is done either by storing native XML in one relational column, or by supporting the definition of mappings between a relational schema to an XML schema, or a DTD. The mapping can be used to automatically compose/decompose XML during retrieval/insertion (Cheng and Xu 2000a).
We focus in this chapter on Oracle as a representative for relational database technology and discuss how to use Oracle9i as an XML database?that is, how to store and retrieve XML documents in an Oracle database.
To achieve full integration of XML and relational data, two aspects of functionality need to be addressed. From a data storage perspective, both XML data and relational data should be supported in the same database. From a query language point of view, both SQL-based and XML-based query features should be available to access data, regardless of how the data are stored. This implies that SQL should also be supported over a relational view of XML data, and XML query capabilities should be available over native XML data as well as an XML view of relational data.
Oracle attempts to satisfy these needs. Oracle's solution makes sure that XML queries can be evaluated efficiently on "real" XML documents and on XML "views" of relational data. This goal was first postulated by the Xperanto project (Carey et al. 2000; Shanmugasundaram et al. 2001).
There are two principal approaches to relational databases so far:
Document-centric documents are better stored in a coarse-grained manner in unstructured LOBs (Large Objects).
A fine-grained storage in attributes of one or more tables is suited for data-centric documents.
Approach 1 provides a simple solution for a relational DBMS, as no new features are required, at least at first glance. Moreover, the first approach places no restrictions on the kinds of document since documents are just taken as they are. Unfortunately, there is currently no SQL for querying, but this may not be a drawback in view of XQuery for searching documents by contents. At least we can state that there is no query performance comparable to SQL. XML-specific query features are then demanded for LOB columns. Another disadvantage to the first approach is that XML, more precisely the underlying LOB, is the unit of work: No modification of document parts is possible?only a complete replacement of the document.
In the diverging principle of approach 2, the powerful query mechanisms of SQL can still be used (even if SQL violates the spirit of XML), thereby offering outstanding query performance. Furthermore, document parts can be easily modified as their values are stored in table columns. This approach also makes it possible to "reuse" relational data: We get an opportunity to compose XML documents of table data. These capabilities for viewing XML data as relational data are useful to maximize the applicability of relational technology and tools for data mining, analysis, report generation, and so on. As a general hindrance, XML documents must possess a certain structure, at least given by a DTD. Thus, this approach is less flexible concerning the kinds documents to handle.
Oracle supports both approaches and a combination of them as well. Oracle's CLOB (Character Large Object) functionality can be used to handle XML documents in the sense of approach 1. When storing XML documents as a whole, the retrieval by means of associative search criteria is an essential problem. Much attention has been recently focused on query languages for XML (Robie et al. 1998), and the need for standardization of XML queries has been recognized (Chamberlin et al. 2001). Despite achieving progress, an integration of full-text search in XML queries has not yet been addressed. Some proposals are discussed in Shinagawa et al. 2000; Fiebig and Moerkotte 2000; Shanmugasundaram et al. 1999; McHugh and Widom 1999; Cheng and Xu 2000b; Florescu and Kossmann 1999a; Deutsch et al. 1999. Oracle solves the problem with the OracleText cartridge, which allows for text retrieval including adequate XML support. Furthermore, Oracle9i introduces a new object type XMLType to encapsulate XML documents in a coarse-grained manner. The methods of XMLType support easy handling of documents. Later, we will discuss how to store XML documents simply as a CLOB, and how the new Oracle9i object type XMLType eases the handling in addition. The power of OracleText is examined from the perspective of associative search capabilities.
The general principles of storing XML documents in relational database columns are discussed in Shanmugasundaram et al. 1999, Kanne and Moerkotte 2000, and Florescu and Kossmann 1999a. Oracle's XML SQL Utility (XSU) pursues a model-based approach. A schematic mapping defines how to map tables and views, including object-relational features, to XML documents. XSU also helps to store documents in a fine-grained manner, mapping XML structures into tables. This is done by automatically extracting relational information from stored XML using an inverse XML-to-relational mapping information. Flexibility is obtained by means of user-defined object views, the structure of which directly determines the nesting of the resulting XML. XSU is available for Java and the proprietary PL/SQL database programming language of Oracle. We will discuss the XML SQL utility later in this chapter.
We will also discuss how to build XML documents?for example, for placing relational contents on the Web. We focus on SQL enhancements to work with XML. Oracle extends classical SQL with new SQL functions for publishing relational data as XML (Shanmugasundaram et al. 2000). A JSP-like complementary approach helps to produce XML documents via Web access.
Also, we briefly tackle further concepts, such as uniform resource identifiers (URIs) that point into the database referring to tables, tuples, and rows. An XML parser, according to SAX and DOM interfaces, includes a validation with respect to a DTD or XML Schema. The parsers are particularly executable and callable in the database server. The parser as well as the other tools support XSLT for post-processing documents, especially query results. A class generator takes DTDs and XML Schemas as input and produces Java or C++ classes that allow the construction of valid XML documents. Oracle also offers Java Beans for viewing and interactively designing documents.