Section 16.2. Creating XHTML Documents

For the most part, creating an XHTML document is no different from creating an HTML document. Using your favorite text editor, simply add the markup elements to your document's contents in the right order, and display it using your favorite browser. To be strictly correct ("valid," as they say at the W3C), your XHTML document needs a boilerplate declaration up front that specifies the DTD you used to create the document and defines a namespace for the document.

16.2.1 Declaring Document Types

For an XHTML browser to correctly parse and display your XHTML document, you should tell it which version of XML is being used to create the document. You must also state which XHTML DTD defines the elements in your document.

The XML version declaration uses a special XML processing directive. In general, these XML directives begin with <? and end with ?>, but otherwise they look like typical tags in your document.[3] To declare that you are using XML Version 1.0, place this directive in the first line in your document:

[3] <! was already taken.

<?xml version="1.0" encoding="UTF-8"?>

This tells the browser that you are using XML 1.0 along with UTF-8, an 8-bit encoding of the Unicode character set and the encoding most commonly used today. The encoding attribute's value should name the character encoding in which the document is actually saved. Refer to the appropriate ISO standards and the IANA character set registry for other encoding names.
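To see why the encoding value must match the way the file is actually saved, here is a minimal sketch using Python's standard xml.etree.ElementTree parser as a stand-in for the browser. The sample text and variable names are our own illustrations, not part of any XHTML tool. The same word is stored in two different byte encodings, and each copy declares its own encoding:

```python
import xml.etree.ElementTree as ET

# The same content saved two ways; each copy names its own encoding
# in the XML declaration (sample text is illustrative only).
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'.encode("utf-8")

# The bytes differ, but the parser reads the declaration and decodes
# both documents to the same characters.
text_latin1 = ET.fromstring(latin1_doc).text
text_utf8 = ET.fromstring(utf8_doc).text
```

If the declared encoding did not match the bytes on disk, the parser would either reject the document or decode the accented character incorrectly.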

Once you've gotten the important issue of the XML version squared away, you should then declare the markup language's DTD:

<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

With this statement, you declare that your document's root element is html, as defined in the DTD whose public identifier is defined as "-//W3C//DTD XHTML 1.0 Strict//EN". The browser may know how to find the DTD matching this public identifier. If it does not, it can use the URL following the public identifier as an alternative location for the DTD.
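The three pieces of the declaration (root element, public identifier, and fallback URL) can be pulled apart programmatically. The following is a rough sketch, assuming Python's standard xml.parsers.expat module plays the role of the processing agent; the read_doctype helper and the tiny sample document are hypothetical names of our own:

```python
import xml.parsers.expat

# A tiny sample document (illustrative only) that begins with the
# strict DOCTYPE declaration.
STRICT_DOC = (
    '<!DOCTYPE html\n'
    '     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html><head><title>t</title></head><body/></html>'
)

def read_doctype(document):
    """Return the root name, public identifier, and system URL of a DOCTYPE."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat reports the system identifier (the URL) before the public one.
        found["root"] = name
        found["public_id"] = public_id
        found["system_id"] = system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return found

doctype = read_doctype(STRICT_DOC)
```

Note that expat does not download the DTD here; it only reports what the declaration says, which is all an agent needs in order to decide which DTD to consult.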

As you may have noticed, the above <!DOCTYPE> directive told the browser to use the strict XHTML DTD. Here's the one you'll probably use for your transitional XHTML documents:

<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

And, as you might expect, the <!DOCTYPE> directive for the frame-based XHTML DTD is:

<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

16.2.2 Understanding Namespaces

As described in the last chapter, an XML DTD defines any number of element and attribute names as part of the markup language. These elements and attributes are stored in a namespace that is unique to the DTD. As you reference elements and attributes in your document, the browser looks them up in the namespace to find out how they should be used.

For instance, the <a> tag's name ("a") and attributes (e.g., "href" and "style") are defined in the XHTML DTD, and their names are placed in the DTD's namespace. Any "processing agent" (usually a browser, but your eyes and brain can serve the same function) can look up the name in the appropriate DTD to figure out what the markup means and what it should do.

With XML, your document actually can use more than one DTD and therefore require more than one namespace. For example, you might create a transitional XHTML document but also include special markup for some math expressions according to an XML math language. What happens when both the XHTML DTD and the math DTD use the same name to define different elements, such as <a> for XHTML hypertext and <a> for an absolute value in math? How does the browser choose which namespace to use?

The answer is the xmlns[4] attribute. Use it to define one or more alternative namespaces within your document. It can be placed within the start tag of any element within your document, and its URL-like[5] value defines the namespace that the browser should use for all content within that element.

[4] XML namespace, hence xmlns. Get it? This is why XML doesn't let you begin any element or attribute with the three-letter prefix "xml": it's reserved for special XML attributes and elements.

[5] It looks like a URL, and you might think that it references a document that contains the namespace, but alas, it doesn't. It is simply a unique name that identifies the namespace. Display agents use that placeholder to refer to their own resources for how to treat the named element or attribute.

With XHTML, according to XML conventions, you should at the very least include within your document's <html> tag an xmlns attribute that identifies the primary namespace used throughout the document:

<html xmlns="http://www.w3.org/1999/xhtml">

If and when you need to include math markup, use the xmlns attribute again to define the math namespace. So, for instance, you could use the xmlns attribute within some math-specific tag of your otherwise common XHTML document (assuming the MATH element exists, of course):

<div xmlns="http://www.w3.org/1998/Math/MathML">x2/x</div>

In this case, the XML-compliant browser would use the http://www.w3.org/1998/Math/MathML namespace to divine that this is the MATH, not the XHTML, version of the <div> tag, and should therefore be displayed as a division equation.
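A namespace-aware parser makes this distinction concrete. In the sketch below, Python's standard xml.etree.ElementTree stands in for the XML-compliant browser; the sample document is our own, the math <div> remains the hypothetical element used in the text, and http://www.w3.org/1999/xhtml is the namespace name that the XHTML 1.0 specification assigns to XHTML. The parser expands every element name into a namespace-qualified form, so the two <div> tags no longer collide:

```python
import xml.etree.ElementTree as ET

# Two <div> elements with the same local name but different
# default namespaces (sample content is illustrative only).
doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div>ordinary XHTML block</div>
    <div xmlns="http://www.w3.org/1998/Math/MathML">x2/x</div>
  </body>
</html>"""

root = ET.fromstring(doc)

# ElementTree reports each tag as "{namespace}localname".
div_tags = [el.tag for el in root.iter() if el.tag.endswith("}div")]
```

The first entry in div_tags carries the XHTML namespace and the second carries the math namespace, which is exactly the information the browser needs to treat them differently.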

It would quickly become tedious if you had to embed the xmlns attribute into each and every <div> tag any time you wanted to show a division equation in your document. A better way, particularly if you plan to apply it to many different elements in your document, is to identify and label the namespace at the beginning of your document, and then refer to it by that label as a prefix to the affected element in your document. For example:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:math="http://www.w3.org/1998/Math/MathML">

The math namespace can now be abbreviated to "math" later in your document. So the streamlined:

<math:div>x2/x</math:div>

now has the same effect as the lengthy earlier example of the math <div> tag containing its own xmlns attribute.
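The equivalence of the two spellings can be checked with a namespace-aware parser. This is a small sketch, again assuming Python's standard xml.etree.ElementTree as the processing agent; the constant and variable names are ours, and the math <div> is still the hypothetical element from the text:

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Form 1: the namespace is labeled once on <html>, then used as a prefix.
prefixed = (
    f'<html xmlns="{XHTML}" xmlns:math="{MATHML}">'
    '<math:div>x2/x</math:div></html>'
)

# Form 2: the element carries its own xmlns attribute.
inline = (
    f'<html xmlns="{XHTML}">'
    f'<div xmlns="{MATHML}">x2/x</div></html>'
)

# In both cases the first child of <html> resolves to the same qualified name.
prefixed_tag = ET.fromstring(prefixed)[0].tag
inline_tag = ET.fromstring(inline)[0].tag
```

The prefix itself ("math") is only a local label; what the parser compares is the namespace name it stands for.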

The vast majority of XHTML authors will never need to define multiple namespaces and so will never have to use fully qualified names containing the namespace prefix. Even so, you should understand that multiple namespaces exist and that you will need to manage them if you choose to embed content based on one DTD within content defined by another DTD.

16.2.3 A Minimal XHTML Document

As a courtesy to all fledgling XHTML authors, we now present the minimal and correct XHTML document, including all the appropriate XML, XHTML, and namespace declarations. With this most difficult part out of the way, you need only supply content to create a complete XHTML document.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Every document must have a title</title>
  </head>
  <body>
    ...your content goes here...
  </body>
</html>

Working through the minimal document one element at a time, we begin by declaring that we are basing the document on the XML 1.0 standard and using the UTF-8 encoding of Unicode to express its contents and markup. We then announce, in the familiar HTML-like <!DOCTYPE> statement, that we are following the markup rules defined in the transitional XHTML 1.0 DTD, which allow us free rein to use nearly any HTML 4.01 element in our document.

Our document content actually begins with the <html> tag, which has its xmlns attribute declare that the XHTML namespace is the default namespace for the entire document. Also note the lang attribute, in both the XML and XHTML namespaces, which declares that the document language is English.

Finally, we include the familiar document <head> and <body> tags, along with the required <title> tag.
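As a quick sanity check, the minimal document can be run through any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree and uses http://www.w3.org/1999/xhtml, the namespace name assigned by the XHTML 1.0 specification; the constant names are our own. It confirms only that the document is well-formed and that the pieces described above are present. It does not validate the document against the DTD:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the xml: prefix

MINIMAL_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Every document must have a title</title>
  </head>
  <body>
    ...your content goes here...
  </body>
</html>
"""

# Parsing raises ParseError if the document is not well-formed.
root = ET.fromstring(MINIMAL_DOC.encode("utf-8"))

# Element names are qualified by the default namespace declared on <html>.
title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")

# The language is declared twice: once as xml:lang, once as plain lang.
xml_lang = root.get(f"{{{XML_NS}}}lang")
html_lang = root.get("lang")
```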