Section 15.2. Documents and DTDs

To be perfectly correct, we must explain that "XML" has come to mean many subtly different things. An "XML document" is a document whose content conforms to a markup language defined according to the XML standard. An "XML Document Type Definition" (XML DTD) is a set of rules (more formally known as "entity and element declarations") that define an XML markup language; that is, how the tags are arranged in a correct ("valid") XML document. To make things even more confusing, entity and element declarations may appear in an XML document itself, as well as within an XML DTD.

An XML document contains character data, which consists of plain content and markup in the form of tags and XML declarations. Thus:

<blah>Some content</blah>

is a line in a well-formed XML document. Well-formed XML documents follow certain rules, such as the requirement that every opening tag have a matching closing tag (an empty element may instead be written as a single self-closing tag, such as <br/>). These rules are presented in the context of XHTML in Chapter 16.
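To make the well-formedness rules concrete, here is a minimal sketch in Python (our choice of language; the chapter itself shows no program code). It relies on the standard-library xml.etree.ElementTree parser, which rejects documents that are not well-formed but does not validate them against any DTD. The function name is_well_formed and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Hypothetical helper: True if `text` parses as well-formed XML."""
    try:
        ET.fromstring(text)  # checks syntax only; no DTD validation
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<blah>Some content</blah>"))   # True
print(is_well_formed("<blah>Some content"))          # False: no closing tag
print(is_well_formed("<blah><b>text</blah></b>"))    # False: tags improperly nested
print(is_well_formed("<blah><br/></blah>"))          # True: self-closing empty element
```

The last call passes because a self-closing empty-element tag satisfies the closing-tag requirement on its own.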

To be considered valid (that is, to conform to a DTD), an XML document must have a corresponding set of XML declarations that define how the tags and content should be arranged within it. These declarations may be included directly in the XML document, or they may be stored separately in an XML DTD. If an XML DTD exists that defines the <blah> tag, our well-formed XML document can be valid, provided it begins with a <!DOCTYPE> directive that explains where to find the appropriate DTD:

<?xml version="1.0"?>
<!DOCTYPE blah SYSTEM "blah.dtd">
<blah>Some content</blah>

The example document begins with the optional <?xml> directive declaring the version of XML it uses. It then uses the <!DOCTYPE> directive to identify the DTD to be used to process the content of the document. In this case, a DTD named blah.dtd should be accessible to the browser[4] so the browser can determine whether the <blah> tag is valid within the document.
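The lookup just described can be sketched as a toy model, under loudly simplified assumptions. Python's standard xml.dom.minidom parser reads the <!DOCTYPE> directive and exposes its name and systemId, but it neither fetches nor validates against the DTD. The hypothetical TOY_DTDS dictionary therefore stands in for the DTD files a processor could reach, reducing each one to the set of element names it declares; check_document is likewise a hypothetical name. A real validating processor also checks content models, attributes, and entities.

```python
from xml.dom import minidom

# HYPOTHETICAL stand-in for DTD files the processor can reach. Each "DTD" is
# reduced to the set of element names it declares; a real DTD also defines
# content models, attributes, and entities.
TOY_DTDS = {"blah.dtd": {"blah"}}


def check_document(text):
    """Return a list of problems; an empty list means 'valid' under the toy rules."""
    doc = minidom.parseString(text)  # raises ExpatError if not well-formed
    doctype = doc.doctype
    if doctype is None:
        return ["no <!DOCTYPE> directive, so there is no DTD to check against"]
    declared = TOY_DTDS.get(doctype.systemId)  # e.g. "blah.dtd"
    if declared is None:
        return [f"DTD {doctype.systemId!r} is not accessible"]
    problems = []
    root_name = doc.documentElement.tagName
    if root_name != doctype.name:
        problems.append(f"root <{root_name}> does not match DOCTYPE name {doctype.name!r}")
    # Visit every element (root included) and check that the DTD declares it.
    for element in doc.getElementsByTagName("*"):
        if element.tagName not in declared:
            problems.append(f"<{element.tagName}> is not declared in {doctype.systemId}")
    return problems


PROLOG = '<?xml version="1.0"?>\n<!DOCTYPE blah SYSTEM "blah.dtd">\n'
print(check_document(PROLOG + "<blah>Some content</blah>"))  # []
print(check_document(PROLOG + "<blah><foo/></blah>"))  # ['<foo> is not declared in blah.dtd']
```

Keeping the "DTD" as plain data separates the two questions the text raises: where the DTD is (the systemId from <!DOCTYPE>) and what it permits (the declared elements).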

[4] We use "browser" here because that's what most people will use to process and view XML documents. The XML specification uses the more generic term "XML processor," since in some cases the XML document will be processed not by a traditional browser but by some other tool that knows how to interpret XML documents.

XML DTDs contain only XML entity and element declarations. XML documents, on the other hand, may contain both entity and element declarations and conventional content that uses those elements to create a document. This intermingling of content and declarations is perfectly acceptable to a computer processing an XML document, but it can get confusing for humans trying to learn about XML. For this reason, we focus our attention in this chapter on the XML entity and element declaration features that you can use to define new tags and document types. In other words, we are addressing only the DTD features of XML; the content features mirror the rules and requirements you already know and use to create HTML documents.
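To see declarations and content intermingled in a single document, this sketch (again in Python, with a hypothetical entity named greeting) places the declarations in an internal DTD subset between the square brackets of the <!DOCTYPE> directive. The standard-library ElementTree parser expands the internally declared entity when it reads the content. It does not enforce the <!ELEMENT> declaration, because it is not a validating parser.

```python
import xml.etree.ElementTree as ET

# One document carrying both declarations (the internal DTD subset between
# [ and ]) and the content that uses them. The string must begin exactly
# with "<?xml"; leading whitespace before the XML declaration is an error.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE blah [
  <!ELEMENT blah (#PCDATA)>
  <!ENTITY greeting "Hello">
]>
<blah>&greeting;, world</blah>"""

root = ET.fromstring(DOCUMENT)
print(root.tag)   # blah
print(root.text)  # Hello, world  (the &greeting; entity was expanded)
```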