The past few years have seen a dramatic increase in the popularity and adoption of XML, the eXtensible Markup Language. This growth is driven by XML's ability to provide a standardized, extensible means of including semantic information within documents that describe semi-structured data. XML can therefore address the shortcomings of existing markup languages such as HTML and support data exchange in e-business environments.
Consider, for instance, the simple HTML document in Listing P.1. The data contained in the document is intertwined with information about its presentation. In fact, the tags describe only how the data is to be formatted. Nothing in the markup indicates that the data represents a person's name and address. Consequently, an interpreter cannot make any sound judgment about the semantics: the same tags could just as well have enclosed information about a car and its parts. Systems such as WIRE (Aggarwal et al. 1998) can interpret the information by using search templates based on the structure of HTML files and on the importance of information enclosed in particular tags, such as those defining headings. However, such interpretation lacks soundness, and its accuracy is context dependent.
<html>
  <head>
    <title>Person Information</title>
  </head>
  <body>
    <p><b>Name: </b>John Doe</p>
    <p><b>Address: </b>10 Church Street, Lancaster LAX 2YZ, UK</p>
  </body>
</html>
Dynamic Web pages, where the data resides in a backend database and is served using predefined templates, reduce the coupling between the data and its representation. However, the semantics of the data can still be ambiguous when information is exchanged in an e-business environment. In the simplest case, the two systems in a business-to-business transaction could use different names for the same item. To avoid such mismatches, the parties are forced to adhere to complex, often proprietary, document standards.
XML provides inherent support for addressing these problems, as the data in an XML document is self-describing. However, the increasing adoption of XML has also raised new challenges. One of the key issues is the management of large collections of XML documents. There is a need for tools and techniques for effective storage, retrieval, and manipulation of XML data. The aim of this book is to discuss the state of the art in such tools and techniques.
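To make the notion of self-describing data concrete, the following is a minimal sketch that recasts the person data from Listing P.1 as XML and reads it back by element name. The element names (person, name, address) are illustrative assumptions rather than a vocabulary taken from this book, and Python's standard xml.etree.ElementTree module is used only as a convenient parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the data in Listing P.1.
# The tag names below are assumptions chosen for illustration.
PERSON_XML = """<person>
  <name>John Doe</name>
  <address>10 Church Street, Lancaster LAX 2YZ, UK</address>
</person>"""


def describe(xml_text):
    """Map each child element's tag to its trimmed text content."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


record = describe(PERSON_XML)

# The tags themselves state what each value means, so a program can
# look up a value by its role instead of guessing from formatting.
print(record["name"])
print(record["address"])
```

In contrast to the HTML version, where the only available cues are formatting tags such as p and b, a consumer of this document can ask for the name or the address directly.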
This preface introduces the basics of XML and some related technologies, and then gives an overview of the issues in XML data management and the approaches that address them. XML and its related technologies are covered only briefly because several other sources treat these concepts in depth.