P.1 What Is XML?


XML is a W3C standard for document markup. It lets authors define custom tags that describe the data they enclose. Listing P.2 shows an example XML document containing data about a person. Tags in XML can also have attributes; for simplicity, this example does not use any.

Listing P.2 An XML Document with Data about a Person
<?xml version="1.0" standalone="yes"?>
<person>
  <name>
     <surname>Doe</surname>
     <firstname>John</firstname>
  </name>
  <address>
     <housenumber>10</housenumber>
     <street>Church Street</street>
     <town>Lancaster</town>
     <postcode>LAX 2YZ</postcode>
     <country>UK</country>
  </address>
</person>

Unlike the HTML document in Listing P.1, the document in Listing P.2 contains only the data about the person and no presentational information. Because the data and its meaning can be read directly from the document, the document can be formatted in whatever way is desired. One standard approach is to use XSL, the eXtensible Stylesheet Language.
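As a rough illustration of reading the data once and formatting it in more than one way, the following Python sketch parses the document from Listing P.2 with the standard library module xml.etree.ElementTree and renders it both as a postal label and as a comma-separated row. The function names as_postal_label and as_csv_row are invented for this example; XSL, mentioned above, would be the more usual tool for this job.

```python
import xml.etree.ElementTree as ET

# The document from Listing P.2, reproduced as a string.
PERSON_XML = """<?xml version="1.0" standalone="yes"?>
<person>
  <name>
     <surname>Doe</surname>
     <firstname>John</firstname>
  </name>
  <address>
     <housenumber>10</housenumber>
     <street>Church Street</street>
     <town>Lancaster</town>
     <postcode>LAX 2YZ</postcode>
     <country>UK</country>
  </address>
</person>"""


def as_postal_label(person):
    """Format the person as a multi-line postal label (hypothetical helper)."""
    name = person.find("name")
    addr = person.find("address")
    return "\n".join([
        f"{name.findtext('firstname')} {name.findtext('surname')}",
        f"{addr.findtext('housenumber')} {addr.findtext('street')}",
        addr.findtext("town"),
        addr.findtext("postcode"),
        addr.findtext("country"),
    ])


def as_csv_row(person):
    """Format the same data as one comma-separated row (hypothetical helper)."""
    # Leaf elements (those without child elements) carry the actual values.
    leaves = [el for el in person.iter() if len(el) == 0]
    return ",".join(el.text.strip() for el in leaves)


person = ET.fromstring(PERSON_XML)
print(as_postal_label(person))
print(as_csv_row(person))
```

Neither function depends on any layout information in the document; both rely only on the tag names, which is the separation of data from presentation described above.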

The flexible nature of XML makes it a good basis for defining new markup languages. One example is WML, the Wireless Markup Language. Similarly, XML Schema, the language used to describe the structure of XML documents, is itself based on XML.

P.1.1 Well-Formed and Valid XML

Although XML syntax is flexible, it is constrained by a grammar that governs the permitted tag names, the attachment of attributes to tags, and so on. All XML documents must conform to these basic grammar rules. Documents that conform are said to be well formed and can be processed by any generic XML parser, which means it is not necessary to write a separate parser for each kind of XML document.
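A minimal sketch of a well-formedness check, assuming Python's standard xml.etree.ElementTree parser: one generic parser either accepts a document or raises ParseError, whatever tags the document uses. The helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the generic XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Illustrative fragments; only the first obeys the basic grammar rules.
samples = {
    "properly nested": "<name><surname>Doe</surname></name>",
    "missing end tag": "<name><surname>Doe</name>",
    "overlapping tags": "<a><b></a></b>",
    "two root elements": "<name/><name/>",
    "illegal tag name": "<1name/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

The same function works unchanged for person records, orders, or any other vocabulary, because well-formedness depends only on the XML grammar and not on the meaning of the tags.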

Beyond well-formedness, the structure of a particular XML document can be validated against a Document Type Definition (DTD) or an XML schema. An XML document that conforms to a given DTD or schema is said to be valid.
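The difference between well formed and valid can be sketched in Python. Note the assumptions: the standard library parser does not validate against DTDs or schemas, so the code below substitutes a hand-written structural check for real validation. The PERSON_MODEL table is a hypothetical stand-in for a DTD describing Listing P.2, and is_valid is an invented helper; a validating parser would be used in practice.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for a DTD: each element name maps
# to the exact sequence of child elements it must contain.
# An empty tuple means the element holds text only.
PERSON_MODEL = {
    "person": ("name", "address"),
    "name": ("surname", "firstname"),
    "address": ("housenumber", "street", "town", "postcode", "country"),
    "surname": (), "firstname": (), "housenumber": (),
    "street": (), "town": (), "postcode": (), "country": (),
}


def is_valid(text, model, root_name):
    """Check a document against the simplified content model above."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not well formed, so it cannot be valid either
    if root.tag != root_name:
        return False
    # Every element must be known and must have exactly the expected children.
    return all(
        el.tag in model and tuple(child.tag for child in el) == model[el.tag]
        for el in root.iter()
    )


complete = (
    "<person><name><surname>Doe</surname><firstname>John</firstname></name>"
    "<address><housenumber>10</housenumber><street>Church Street</street>"
    "<town>Lancaster</town><postcode>LAX 2YZ</postcode>"
    "<country>UK</country></address></person>"
)
# Well formed, but the address element is missing.
incomplete = (
    "<person><name><surname>Doe</surname><firstname>John</firstname></name>"
    "</person>"
)

print(is_valid(complete, PERSON_MODEL, "person"))
print(is_valid(incomplete, PERSON_MODEL, "person"))
```

The second document parses without error, so it is well formed, yet it fails the structural check. That is the distinction a DTD or schema makes: every valid document is well formed, but not every well-formed document is valid.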

P.1.2 Data-Centric and Document-Centric XML

XML documents can be classified on the basis of the data they contain. Data-centric documents capture structured data, such as a product catalog, an order, or an invoice. Document-centric documents, on the other hand, capture unstructured data, such as articles, books, or e-mails. The two types can also be combined to form hybrid documents that are both data-centric and document-centric. Listings P.3 and P.4 provide examples of data-centric and document-centric XML, respectively.

Listing P.3 Data-Centric XML
<order>
  <customer>Doe</customer>
  <position>
    <isbn>1-234-56789-0</isbn>
    <number>2</number>
    <price currency="UKP">30.00</price>
  </position>
</order>

Listing P.4 Document-Centric XML
<content>
  XML builds on the principles of two
  existing languages, <em>HTML</em>
  and <em>SGML</em> to create a simple
  mechanism  . . .
  The generalized markup concept . . .
</content>
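One way to see the practical difference between the two listings is to look at where the text sits. The Python sketch below, which uses the standard xml.etree.ElementTree module, treats mixed content (text interleaved with child elements) as a sign of document-centric XML. This heuristic and the helper name has_mixed_content are assumptions made for illustration, not a formal definition.

```python
import xml.etree.ElementTree as ET

# Listing P.3 (data-centric) and Listing P.4 (document-centric) as strings.
ORDER_XML = """<order>
  <customer>Doe</customer>
  <position>
    <isbn>1-234-56789-0</isbn>
    <number>2</number>
    <price currency="UKP">30.00</price>
  </position>
</order>"""

CONTENT_XML = """<content>
  XML builds on the principles of two
  existing languages, <em>HTML</em>
  and <em>SGML</em> to create a simple
  mechanism  . . .
  The generalized markup concept . . .
</content>"""


def has_mixed_content(root):
    """True if any element mixes non-whitespace text with child elements."""
    for el in root.iter():
        if len(el) == 0:
            continue  # leaf element: text only, nothing to mix
        # Text before the first child, plus the text following each child.
        pieces = [el.text] + [child.tail for child in el]
        if any(piece and piece.strip() for piece in pieces):
            return True
    return False


order = ET.fromstring(ORDER_XML)
content = ET.fromstring(CONTENT_XML)

# Data-centric: every value is a typed field that can be computed with.
price = order.find("position/price")
quantity = int(order.findtext("position/number"))
total = float(price.text) * quantity
print(total, price.get("currency"))

# Document-centric: the running text matters, with the markup woven into it.
running_text = " ".join("".join(content.itertext()).split())
print(running_text)

print(has_mixed_content(order), has_mixed_content(content))
```

In the order, each leaf element maps naturally to a field or column, whereas in the content element the order of the words and the position of the em tags within them carry the meaning.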

