An Overview of XML

A lot of attention has been given to the extensive support for XML that has been added throughout Microsoft Office and Microsoft Word. In this chapter we will discuss what XML is and what Word's support of XML can do for you.


XML features, other than saving documents as XML in the WordML schema format, are available only in Microsoft Office Professional and the standalone version of Microsoft Word.

What XML Is and Does

XML is a text-based markup language derived from the Standard Generalized Markup Language (SGML). Like HTML, XML uses "tags" to identify data elements and arrange them in a hierarchy. Rules govern how these tags relate to one another, so the resulting information always has a predictable, consistent structure. Because that structure follows widely accepted industry standards, the information can be shared anywhere in the world, and any person or application that understands those standards can read and interpret it.

So what makes XML different from HTML? These are some of the distinguishing features:

  • The rules for creating XML documents are stricter and more rigidly enforced than those for creating HTML documents. For example, every opening tag must have a matching closing tag, and tags must be properly nested.

  • XML is designed to describe the data and its structure, whereas HTML describes the presentation of the data. For example, you won't find an XML tag to make text bold. Instead, you'll find tags that specify what information is, what it means, and how it relates to other data.

  • XML establishes extensible rule sets (a set of XML rules is called a schema) that can be used to create a virtually infinite number of markup languages for specialized purposes and environments. HTML is based on a specific and fixed set of rules that cannot be easily altered or extended.
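To make the contrast concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The element names (`contact`, `name`, `phone`) are invented for illustration. The tags say what each piece of data is rather than how it should look, and the parser rejects markup with overlapping tags, which a Web browser would typically render anyway.

```python
import xml.etree.ElementTree as ET

# Tags describe what the data *is*, not how it should look.
# The element names are hypothetical, chosen only for illustration.
WELL_FORMED = """<contact>
  <name>Jane Doe</name>
  <phone type="work">555-0100</phone>
</contact>"""

# Overlapping tags: browsers usually tolerate this in HTML,
# but an XML parser must reject it.
MALFORMED = "<p><b>bold text</p></b>"


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    contact = ET.fromstring(WELL_FORMED)
    print(contact.find("name").text)          # Jane Doe
    print(contact.find("phone").get("type"))  # work
    print(is_well_formed(WELL_FORMED))        # True
    print(is_well_formed(MALFORMED))          # False
```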

XML's Advantages Over Previous Approaches

Because the rules for creating XML are rigidly enforced, you can rely on XML to communicate information accurately between computer systems, something HTML cannot reliably do given the inconsistencies and errors common in HTML documents. As a result, XML is rapidly becoming the data exchange format of choice for an enormous range of business applications. Word's XML support lets you build your own XML files and validate them against a schema, integrate that XML into ordinary documents for users, generate data for XML-based applications, and access data those applications already contain.

Having a structure that describes the data instead of its presentation leaves you free to use the same source document for several different purposes without duplicating the information in various places. You simply attach a different style sheet or use a different transformation to provide new instructions for how to format and display each type of data contained in the document.
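The idea of one source document with several presentations can be sketched in Python. In practice this job usually falls to XSLT style sheets; Python's standard library has no XSLT processor, so two ordinary functions stand in for the two transformations here. The `tripReport` element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# One source document; element names are hypothetical.
TRIP_REPORT = """<tripReport>
  <traveler>Jane Doe</traveler>
  <destination>Chicago</destination>
  <summary>Met with two suppliers &amp; toured the plant.</summary>
</tripReport>"""


def to_html(report: ET.Element) -> str:
    """Presentation 1: an HTML fragment for a Web page."""
    return (
        f"<h1>Trip to {escape(report.findtext('destination'))}</h1>"
        f"<p><b>{escape(report.findtext('traveler'))}</b>: "
        f"{escape(report.findtext('summary'))}</p>"
    )


def to_plain_text(report: ET.Element) -> str:
    """Presentation 2: a one-line summary for an e-mail digest."""
    return (
        f"{report.findtext('traveler')} - "
        f"{report.findtext('destination')}: {report.findtext('summary')}"
    )


if __name__ == "__main__":
    report = ET.fromstring(TRIP_REPORT)
    print(to_html(report))
    print(to_plain_text(report))
```

Neither function changes the source; adding a third output format means writing a third transformation, not copying the data again.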

Duplication of data in different formats has long been a problem in businesses everywhere. Trip reports are entered in one place, read and retyped in another, and then read, summarized, and retyped in yet another. Expense reports are created on the road, and when they are submitted, the finance group retypes the figures into an accounting system. Every step offers an opportunity for errors to creep in, and many hours are spent performing these tasks. XML makes it possible to use a single source of information, exactly as it was originally entered, for all these different uses, without the errors and inconsistencies that are introduced when data is reentered and reinterpreted in different places.
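As a minimal sketch of the expense-report scenario, assume the traveler's report is stored as XML with hypothetical `expenseReport` and `item` elements. A finance system can then total the figures directly from the original document instead of retyping them. `Decimal` keeps the currency arithmetic exact; the sketch also assumes every amount is in the same currency.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# An expense report as the traveler might enter it (hypothetical element names).
# Assumes all amounts are in a single currency.
EXPENSE_REPORT = """<expenseReport employee="E1234">
  <item category="airfare"><amount>412.50</amount></item>
  <item category="hotel"><amount>289.00</amount></item>
  <item category="meals"><amount>63.75</amount></item>
</expenseReport>"""


def total_by_category(xml_text: str) -> dict[str, Decimal]:
    """Read amounts straight from the source document; nothing is retyped."""
    totals: dict[str, Decimal] = {}
    for item in ET.fromstring(xml_text).iter("item"):
        category = item.get("category")
        amount = Decimal(item.findtext("amount"))
        totals[category] = totals.get(category, Decimal("0")) + amount
    return totals


if __name__ == "__main__":
    totals = total_by_category(EXPENSE_REPORT)
    print(totals)
    print(sum(totals.values()))  # 765.25
```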

As mentioned in Chapter 1, "What's New in Microsoft Office Word 2003," many XML dialects have already been created. Consider just a few examples:

  • XBRL, which standardizes the communication of financial reporting data among corporations

  • MathML, which provides a standard format for mathematical equations

  • WML, which provides a stripped-down markup language for displaying Web applications on wireless phones

  • VoiceXML, which provides a standard language for controlling voice applications such as automated voicemail or call center systems

  • SVG, which defines an efficient format for 2D vector graphics

Finally, the fact that XML is plain text cannot be overlooked as a huge advantage. In the not-too-distant past, developers trying to exchange structured data were forced to contend with mixing and matching options such as these:

  • Complex network and authentication scenarios to allow binary connections

  • Proprietary and/or binary file formats that required multiple conversion steps to use the data

  • Text files containing character-delimited data (for example, tab-delimited flat files) whose layout varied from site to site and offered no good way to indicate structure or relationships

With the advent of the World Wide Web and HTML, networks are already configured to let text-based traffic move in and out with ease.
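The difference between a character-delimited file and XML is easiest to see side by side. In this sketch (the order data and element names are invented for illustration), the tab-delimited rows must repeat the order header and depend on column position for their meaning, while the XML version carries field names and records the one-to-many relationship in its own structure.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Flat, tab-delimited export: the order header repeats on every row,
# and nothing in the file says what each column means.
FLAT = "1001\tContoso\tWidget\t2\n1001\tContoso\tGadget\t5\n"

# The same data as XML: the nesting itself records that both line
# items belong to one order. Element names are hypothetical.
NESTED = """<order id="1001">
  <customer>Contoso</customer>
  <line product="Widget" quantity="2"/>
  <line product="Gadget" quantity="5"/>
</order>"""


def parse_flat(text: str) -> list[list[str]]:
    """Each row is just a list of strings; meaning depends on column position."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def parse_nested(text: str) -> dict:
    """Field names and structure travel with the data."""
    order = ET.fromstring(text)
    return {
        "id": order.get("id"),
        "customer": order.findtext("customer"),
        "lines": [
            (line.get("product"), int(line.get("quantity")))
            for line in order.findall("line")
        ],
    }


if __name__ == "__main__":
    print(parse_flat(FLAT))
    print(parse_nested(NESTED))
```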

What Is WordML?

XML dialects (that is, schemas) provide a powerful tool to standardize the "data language" used by applications and organizations to communicate with each other. To support the enhanced XML integration in Word 2003, Microsoft has developed its own dialect for Microsoft Word documents that is appropriately named WordML.

To understand why this is important, think for a moment about what a document represents at a technical level. It's much more than just the contents of the document. Imagine trying to direct another person to create an exact duplicate of a document with your only method of communication being words and text. You would have to consider everything from document properties and settings (author, description, default printer, format, and so on) to content layout and presentation (headers and footers, paragraphs, text attributes, graphics, and so on).

That's exactly what WordML provides. Documents that in the past were stored in a proprietary binary format can now be saved and exchanged as plain text, in a structure that is fully and openly documented.
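Because WordML is plain, documented XML, any XML-aware tool can read it without Word. The sketch below parses a heavily simplified WordML document with Python's standard library. The namespace URIs and element names (`wordDocument`, `body`, `p`, `r`, `rPr`, `b`, `t`, and the Office `DocumentProperties`/`Author` elements) reflect our understanding of the Word 2003 XML format; check them against Microsoft's WordML schema reference before relying on them. Files that Word actually saves contain much more, including font tables, styles, document settings, and section information.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URIs as we understand them for Word 2003 XML (WordML) files.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
O = "urn:schemas-microsoft-com:office:office"

# A heavily simplified WordML document: properties, then paragraphs (w:p)
# made of runs (w:r) of text (w:t); w:rPr/w:b marks a bold run.
WORDML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
                xmlns:o="urn:schemas-microsoft-com:office:office">
  <o:DocumentProperties>
    <o:Author>Jane Doe</o:Author>
  </o:DocumentProperties>
  <w:body>
    <w:p><w:r><w:t>Quarterly sales were up.</w:t></w:r></w:p>
    <w:p>
      <w:r><w:t>Details follow in </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>Section 2</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""


def read_wordml(xml_bytes: bytes) -> tuple[Optional[str], list[str]]:
    """Return the author and the plain text of each paragraph."""
    root = ET.fromstring(xml_bytes)
    author = root.findtext(f"{{{O}}}DocumentProperties/{{{O}}}Author")
    paragraphs = [
        "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        for p in root.iter(f"{{{W}}}p")
    ]
    return author, paragraphs


if __name__ == "__main__":
    author, paragraphs = read_wordml(WORDML.encode("utf-8"))
    print(author)      # Jane Doe
    print(paragraphs)  # ['Quarterly sales were up.', 'Details follow in Section 2']
```

Note that formatting such as bold is recorded as a property of a run rather than mixed into the text, so extracting the words is a matter of collecting the `w:t` elements.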
