XML

XML

One can hardly open a trade magazine without seeing yet another article describing how eXtensible Markup Language (XML) will make our life easier, and touting it as the solution for all our problems. While the jury is still out, there is no question that XML has made its way into virtually every area of data exchange. Every major RDBMS vendor has made a pledge to XML, and some have already incorporated it into their products.

XML was initially designed as a more rigorous approach to presenting information on the Web, as inadequacies of traditional HyperText Markup Language (HTML) became apparent. Essentially, XML is a way to describe data — in addition to data itself. Unlike HTML, it does not say how the data is to be presented (i.e., color, font, font size etc). To be viewed in, say, a Web browser, the XML document must be first translated into some other form (e.g., HTML), using style sheets to specify how the document should be rendered with the help of Extensible Style Language (XSL). Because of the separation between the data and the presentation level, it is possible to transform the same set of data into different representations.

The document type definitions (DTD) define the XML elements that make up the document; because of the DTD — to which an XML document contains a link — the XML documents are considered self-describing.

Note 

Well-formed XML refers to an XML document that strictly adheres to the standards; being well formed is crucial for seamless integration between applications; there are numerous programs available on the Internet to check whether an XML document is well-formed.

The core specification of XML has achieved universal acceptance, and is ubiquitous over the Internet and is used in many custom-designed applications that previously were the domain of the Electronic Data Interchange (EDI) standard. The term itself sometimes refer to related specifications of XSL, XML Linking Language (XLink), Simple API for XML (SAX), XML Pointer Language (XPointer), and document object model (DOM).

The document object model (DOM) is an application programming interface (API) representing an XML document in a hierarchical treelike structure. Once parsed into the DOM, an XML document is referred to as infoset. An infoset can is manipulated through the DOM objects, and using XPath API — a set of functions for querying/manipulating XML document's values — one could extract and/or modify information contained in the document, which could be later written into a file, rendered into HTML for posting on the Web, and so on.

Extensible Style Language for Transformation (XSLT, XSL) steps in to compensate for XML's inability to communicate just how the contained information is to be presented. It defines a process of transforming one XML document into another, even reformatting it into non-XML format.

Oracle 9i

Oracle supports XML through its built-in XML Developer Kits (XDK) (which allow developing for XML in Java or C/C++, as well as in PL/SQL), which in the last version — Oracle 9i Release 2 — was replaced by XML DB Repository. Both names refer to a collection of the XML related features and technologies built into RDBMS itself.

Procedures written in PL/SQL access the XDK from inside the RDBMS, whereas Java and/or C/C++ procedures have the ability to access its functionality from outside.

These kits allow you to create and manipulate XML documents via a number of objects including XML/XSLT parsers, XML class generators, XSQL servlets, and so on. To use these packages you must know these languages and the underlying XML/XSLT principles.

XDK is written in Java, and is integrated with Oracle 9i. Owners of earlier versions of Oracle (prior to Oracle 8i Release 8.1.7) may download it from the Oracle Web site. The kit contains an XSLT processor, an XSQL page processor and servlet engine, as well as an XML/SQL utility (a separate, Oracle-specific tool for parsing XML documents into relational tables structure).

Using XSQL, you can set up publishing information in XML format. XSQL includes a page processor, XSQL Java servlet, and a command line interface. The page processor could be called through the servlet (running on Apache Web Server, for instance) or through the command line interface. Essentially, it enables querying relational data and getting results back formatted as XML documents

The XDK will not work with Oracle 8.0 or earlier; to add XML capabilities (somewhat limited) to legacy versions, Oracle introduced the PLSXML package, written in PL/SQL. This package can be called within an SQL or PL/SQL query and will produce XML document output. It requires Oracle Application Server or Oracle 9iAS, to operate.

Within its XML DB Repository, Oracle 9i Release 2 also introduced a new native XML type to store XML documents instead of parsing them into relational rows. It also enables non-XML data to be treated as XML by creating the document from the data extracted from the tables. There are numerous packages provided by Oracle that you could use to build solutions based on XML — DBMS_XDB, DBMS_XMLSCHEMA, to name a few — in addition to XML-related functions and data types. Please refer to Oracle documentation for more information on these features.

IBM DB2 UDB 8.1

DB2 UDB supports XML through XML Extender component. It serves as an XML document repository, as well as a repository for the DTD. DB2 UDB provides two options for the storage — XML column and XML collection. The first option — XML column — enters the entire document into a column of XML data type, while the second option — XML collection — maps the document onto the set of relational tables. IBM provides a number of stored procedures for storing, retrieving, searching, and modifying data in the XML collection.

The DB2 XML Extender includes a visual tool that facilitates mapping of the XML document elements to the columns in the database table. The mapping, once performed, is stored in the database as a document access definition (DAD).

DB2 also provides XML-native data type (XMLVARCHAR and XMLCLOB). Using it you could store the whole XML document in a single column (as opposed to disassembling the document into row-level data for insertion into the "standard" relational tables).

To generate an XML document through SQL qweries against DB2 UDB (or any ODBC-compliant source), one has to use an IBM Net.Data component that defines a collection of macros for XML.

Microsoft SQL Server 2000

SQL Server 2000 provides native support for XML using additional XML-related keywords (built into the Transact-SQL dialect). The XML data is disassembled into text-based tables, and could be assembled back on request. There is no XML data type in the SQL Server 2000.

The requests could be made through usual SQL Server data access channel (ADO, ADO.NET, ODBC,DBLIB — see Chapter 16 for more information about accessing RDBMS), or sent to the SQL Server directly using HTTP (HyperText Transfer Protocol — the Internet standard for transferring text-based data). While exciting, the full description of these features belongs to a Microsoft SQL Server 2000– or XML-specific book. Here we are going to cover just the basics of the XML/SQL convergence.

The new Microsoft extensions for SQL enable you to query SQL Server 2000 tables and receive the data in XML format, as well as to submit an XML document for saving the data into the database. The new FOR XML clause with a standard SELECT query instructs SQL Server 2000 to produce XML output of the data. The clause should be used with additional XML-related keywords to specify the mode: RAW, AUTO, or EXPLICIT; the mode refers to the XML output formatting, and is explained in Microsoft documentation or numerous books on the subject.

Here is an output produced by querying the two tables ORDER_HEADER and ORDER_LINE from the ACME database. The query returns records for two orders (30607 and 30608) and all the order items for them:

SELECT
		  ORDER_HEADER.ORDHDR_ID_N, ORDER_LINE.ORDLINE_ID_N,
		  ORDER_LINE.ORDLINE_CREATEDATE_D FROM ORDER_HEADER,ORDER_LINE WHERE
		  ORDER_HEADER.ORDHDR_ID_N = ORDER_LINE.ORDLINE_ORDHDRID_FN AND
		  ORDER_HEADER.ORDHDR_ID_N IN (30608, 30607) ORDER BY ORDER_HEADER.ORDHDR_ID_N
		  FOR XML AUTO XML_F52E2B61-18A1-11d1-B105-00805F49916B
		  ----------------------------------------------------- <ORDER_HEADER
		  ORDHDR_ID_N="30607"> <ORDER_LINE ORDLINE_ID_N="87234"/> <ORDER_LINE
		  ORDLINE_ID_N="87235"/> </ORDER_HEADER> <ORDER_HEADER
		  ORDHDR_ID_N="30608"> <ORDER_LINE ORDLINE_ID_N="87236"/> <ORDER_LINE
		  ORDLINE_ID_N="87237"/> </ORDER_HEADER>
		  
		  (4 row(s) affected)

Using AUTO mode makes SQL Server 2000 format document according to the best guess it can make, based on the relationships between the tables; note the hierarchical structure of the XML document. Combined with DTD, this document can be sent over the Internet to be displayed or used as input for some application.

To enter the data into an RDBMS from an XML document, Microsoft SQL Server provides the OPENXML keyword. Its use is much more complicated than simply querying for XML. First, you must call a stored procedure sp_xml_preparedocument, which parses the XML document and returns a handle (a memory pointer) to the prepared document in computer memory. This handle is passed to the OPENXML statement, which converts it into RDBMS standard rows, ready to be inserted. A number of parameters have to be specified for the data to be correctly formatted and interpreted.