7.7 Providing XML Views over Relational Data

The previous sections presented an SQL-centric approach to generating and consuming XML. This section introduces an XML-centric mechanism that allows the definition of virtual XML views over the relational database that can then can be queried and updated with XML-based tools.

7.7.1 Annotated Schemata

The core mechanism of providing XML views over the relational data is the concept of an annotated schema. Annotated schemata consist of a schema description of the exposed XML view and annotations that describe the mapping of the XML schema constructs onto the relational schema constructs. SQL Server 2000 supports both the older XML Data Reduced (XDR) schema language as well as the W3C XML Schema format. Since the schema documents are XML documents and their content model is open, the annotations can be placed inline.

In order to simplify the definition of the annotations, each schema provides a default mapping if no annotation is present. The default mapping maps an attribute or a noncomplex subelement (i.e., content type is text only) to a relational column with the same name. All other elements map into rows of a table or view with the same name. Since SQL Server 2000 does not support nested relations, hierarchy is not mappable without an annotation.

Listing 7.10 shows a simple XML view that defines a Customer-Order hierarchy over the Customers and Orders table of the relational database. Everything that is in the default namespace belongs to the XDR schema definition; the annotations are associated with the familiar namespace urn:schemas-microsoft-com:xml-sql.

Listing 7.10 Simple XML View

<Schema xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:sql="urn:schemas-microsoft-com:xml-sql">
<ElementType name="Customer" sql:relation="Customers">
    <AttributeType name="ID" />
    <attribute type="ID" sql:field="CustomerID" />
    <element type="Order">
      <sql:relationship key-relation="Customers"
                        key="CustomerID"
                        foreign-relation="Orders"
                        foreign-key="CustomerID"/>
    </element>
</ElementType>
<ElementType name="Order" sql:relation="Orders">
    <AttributeType name="OrderID" />
    <attribute type="OrderID" />
   </ElementType>
</Schema>

The Customer element is mapped to the Customers table using the sql:relation annotation; its attribute ID is mapped to the table's CustomerID column with the sql:field annotation. Finally, the similarly mapped Order element is parented under the Customer element. The sql:relationship annotation provides the hierarchy information as a conceptual left-outer join that describes how the child data relates to the parent data. Additional annotations exist to define XML-specific information such as defining ID-prefixes and CDATA sections, and annotations to define limit values to deal with value-driven horizontal partitions.

In order to define the annotations in a graphical way, a utility called the SQL XML View Mapper is available for download from the Microsoft Web site.

The annotated schema does not retrieve any data per se but only defines a virtual view by projecting an XML view on the relational tables. It actually defines two potential views, customers containing orders and just orders, since neither XML Schema language defines an explicit root node. Thus, we need additional information to actually determine which of the two views will be used. This information will be provided implicitly by the query, the Updategram, and Bulkload as explained in the following sections.

7.7.2 Querying Using XPath

XPath is a tree navigation language and is defined in a W3C recommendation. XPath is not a full-fledged query language (it does not provide constructive elements such as projection or subtree pruning) but serves as a basis for navigating the XML tree structure. Each XPath basically consists of a sequence of location steps that navigate the tree with optional predicates to constrain the navigation paths.

SQL Server 2000 uses a subset of XPath 1.0 to select data from the virtual XML views provided by annotated schemata. The first location step of the XPath determines the exact view used of the many potential views defined by the annotated schema by determining the first hierarchy. Instead of returning only a collection of nodes, the selected nodes and their complete subtree is serialized in the resulting XML fragment.

In principle, XPath constructs that are easily mapped to the relational constructs of SQL Server are supported. The currently supported constructs include the non-order, non-recursive navigation axes, all data types, all relational and Boolean operators, all but one arithmetic operation, and variables. Notably not supported in the current release are the id() function and the order and recursive axes such as the descendent axis.

Since the views are virtual views, the XPath query together with the annotated schema is translated into a FOR XML explicit query that returns only the XML data that is required by the query. The implementation goes to great length to provide the expected XPath semantics such as preserving the node list order imposed by the parent orders when navigating down the tree and the XPath data type coercion rules. The only two places where the implementation differs from the W3C XPath semantics is with respect to the coercion rules of strings with the < and > comparison operations and with respect to node-to-string conversions in predicates. In the first case, the implementation does not try to coerce to a number but does a string-based comparison on the default collation, which provides support for date-time comparisons. In the second case, XPath mandates a "first-match" evaluation semantics that cannot be mapped to the relational system. Instead, the implementation performs the more intuitive "any-match" evaluation.

For example, the XPath:

/Customer[ @ID='ALFKI']

against the preceding annotated schema may result in the XML in Listing 7.11.

Listing 7.11 XPath Query Result

<Customer ID="ALFKI">
  <Order OrderID="10643" />
  <Order OrderID="10692" />
</Customer>

The XPath query can be passed via a URL or a template, or via the XPath dialect of the client provider. The following shows each access method for the query example (/Customer[ @ID='ALFKI']), assuming that the schema file is called CustOrd.xdr. First the URL:

http://domainserver/dbvroot/schema/CustOrd.xdr/Customer[ @ID= 'ALFKI']

The template wraps the result with a root element and parameterizes the ID. The mapping-schema attribute on the sql:xpath-query element indicates the location of the annotated schema relative to the template file.

Listing 7.12 XPath Query Template

<root>
  <sql:header xmlns:sql="urn:schemas-microsoft-com:xml-sql" >
    <sql:param name="cid">ALFKI</sql:param>
  </sql:header>

  <sql:xpath-query mapping-schema="CustOrd.xdr"
                   xmlns:sql="urn:schemas-microsoft-com:xml-sql" >
    /Customer[ @ID=$cid]
  </sql:xpath-query>
</root>

Finally, the Visual Basic fragment in Listing 7.13 shows how to use ADO to post an XPath query.

Listing 7.13 XPath Query Using ADO

conn.Open strConn
Set cmd.ActiveConnection = conn
cmd.Dialect = "{ec2a4293-e898-11d2-b1b7-00c04f680c56}"
cmd.CommandText = "/Customer[ @ID='ALFKI']"
cmd.Properties("Output Stream").Value = Response
cmd.Properties("Base Path") = "c:\schemas"
cmd.Properties("Mapping schema") = "CustOrd.xdr"

cmd.Execute , , adExecuteStream

7.7.3 Updating Using Updategrams

Updategrams provide an intuitive way to perform an instance-based transformation from a before state to an after state. Updategrams operate over either a default XML view implied by its instance data (if no annotated schema is referenced) or over the view defined by the annotated schema and the top-level element of the Updategram. Listing 7.14 is a simple example.

Listing 7.14 Updategram Example

<root xmlns:updg="urn:schemas-microsoft-com:xml-updategram">
  <updg:sync mapping-schema="nwind.xml" nullvalue="ISNULL">
    <updg:before>
      <Customer CustomerID="LAZYK" CompanyName="ISNULL"
                Address="12 Orchestra Terrace">
        <Order oid="10482"/>
      </Customer>
    </updg:before>
    <updg:after>
      <Customer CustomerID="LAZYK"
                CompanyName="Lazy K Country Store"
                Address="12 Opera Court">
        <Order oid="10354"/>
      </Customer>
    </updg:after>
  </updg:sync>
</root>

Updategrams use their own namespace urn:schemas-microsoft-com:xml-updategram. Each updg:sync block defines the boundaries of an update batch that uses optimistic concurrency control to perform the updates transactionally. The before image in updg:before is used both for determining the data to be updated as well as to perform the conflict test. The after image in updg:after gives what has to be changed. If the before state is empty or missing, the after state defines an insert. If the after state is empty or missing, the before state defines what has to be deleted. Otherwise the necessary insertions, updates, and deletions are inferred from the difference between the before and after image. Several optional features allow the user to deal with identity and aligning elements between the before and after state. The nullvalue attribute indicates that the fields with the specified value need to be compared or set to NULL, respectively.

In Listing 7.14, the customer with the given data (including a company name set to NULL) gets a new company name and address. In addition, the relation to the order 10482 is removed and replaced by a new relation to order 10354.

7.7.4 Bulkloading

Neither Updategrams nor the OpenXML mechanism previously described is well suited to load large amounts of XML data into the database since they both require the whole XML document to be loaded into memory before they perform the insertion of the data. Bulkload is provided as a COM object that allows loading large amounts of XML data via an annotated schema into either an existing database or after creating the relational schema implied by the annotated schema. Transacted and nontransacted load mechanisms are available, and Bulkload can be integrated into a data transformation system (DTS) workflow.

For efficiency, the XML data are streamed in only once via the SAX parser to perform the load. Therefore, the loaded XML data needs to satisfy certain conditions in order to provide the correct loading of the data. For example, all the data that generates a new row needs to appear in the same element context. In order to avoid buffering a potentially large amount of XML data, the information that contains the key of the parent has to appear before any of its children.

Listing 7.15 is a sample Visual Basic script that uses the Bulkload object to load a file into the database.

Listing 7.15 Visual Basic Script for Bulkload

[View full width]

set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")

'open SQL Server connection
objBL.ConnectionString = "provider=SQLOLEDB.1;data source=mydatabase;database=Northwind;
uid=user;pwd=password"

'set Bulkload properties
objBL.ErrorLogFile = "c:\blklderror.xml"
objBL.CheckConstraints = True
objBL.Transaction = True
objBL.KeepIdentity = True
objBL.SchemaGen = False

objBL.Execute "c:\annotatedschema.xdr", "c:\data.xml"

Set objBL = Nothing