HTML, as everyone should know by now, began as a simple markup language similar in appearance and usage to other SGML-based markup languages. In its early years, little effort was put into making HTML perfectly SGML-compliant. As a result, odd features and a lax attitude toward enforcing the rules became standard parts of both HTML and the browsers that processed HTML documents.
As the Web grew from an experiment into an industry, the desire for a standard version of HTML led to the creation of several official versions, culminating most recently with Version 4.01. As HTML has stabilized into this latest version, browsers have become more alike in their support of various HTML features. In general, the world of HTML has settled into a familiar set of constructs and usage rules.
Unfortunately, HTML offers only a limited set of document-creation primitives, is incapable of handling nontraditional content such as chemical formulae, musical notation, or mathematical expressions, and fails to adequately support alternative display media such as handheld computers or intelligent cellular phones. We need new ways to deliver information that can be parsed, processed, displayed, sliced, and diced by the many different communication technologies that have emerged since the Web sparked the digital communication revolution a decade ago.
Rather than trying to rein in another herd of maverick, nonstandard markup languages, the W3C introduced XML as a standard way to create new markup languages. XML is the framework upon which organizations can develop their own markup languages to suit the needs of their users. XML is a simplified subset of SGML, streamlined and enhanced for today's dynamic systems. And while the W3C originally intended it as a tool to create document markup languages, XML is also becoming quite useful as a standard way to define small languages that different applications use as data-exchange protocols.
Of course, we don't want to abandon the plethora of documents already marked up with HTML, or the infrastructure of knowledge, tools, and technologies that currently support HTML and the Web. Yet, we do not want to miss the opportunities of XML, either. XHTML is the bridge. It uses the features of XML to define a markup language that is nearly identical to standard HTML 4.01 and gets us all started down the XML road.
HTML 4.01 comes in three variants, each defined by a separate SGML DTD. XHTML also comes in three variants, with XML DTDs corresponding to the three SGML DTDs that define HTML 4.01. To create an XHTML document, you must choose one of these DTDs and then create a document that uses that DTD's elements and rules.
The first XHTML DTD corresponds to the "strict" HTML DTD. The strict definition excludes all deprecated elements (tags and attributes) in HTML 4.01 and forces authors to use only those features that are fully supported in HTML. Many of the HTML elements and attributes dealing with presentation and appearance, such as the <font> tag and the align attribute, are missing from the strict XHTML DTD and have been replaced by the equivalent properties in the CSS model.
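As a rough illustration, a document written against the strict DTD might look like the sketch below. It assumes XHTML 1.0 and the W3C's public identifier for the strict DTD; the class name notice and the text content are made up for the example. The presentation that a transitional document might express with a font tag and an align attribute is moved into a CSS rule instead.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>Strict example</title>
  <style type="text/css">
    /* Hypothetical rule standing in for the font element's color
       attribute and the paragraph's align attribute */
    p.notice { color: red; text-align: center; }
  </style>
</head>
<body>
  <!-- No font tag or align attribute: the strict DTD does not define them -->
  <p class="notice">This paragraph is styled entirely with CSS.</p>
</body>
</html>
```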
Most HTML authors find the strict XHTML DTD too restrictive, since many of the deprecated elements and attributes are still in widespread use throughout the Web. More importantly, the popular browsers, while fully supporting the deprecated elements, have yet to fully implement the new standard ones. The only real advantage of using the strict XHTML DTD is that compliant documents are guaranteed to be fully supported in future versions of XHTML.[2]
[2] If the W3C has its way, HTML won't change beyond Version 4.01. No more HTML; all new developments will be in XHTML and many other XML-based languages.
Most authors will probably choose to use the "transitional" XHTML DTD. It's closest to the current HTML standard and includes all those wonderful, but deprecated, features that make life as an HTML author easier. With the transitional XHTML DTD, you can ease into the XML family while staying current with the browser industry.
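For comparison, here is a minimal sketch of a transitional document, again assuming XHTML 1.0 and the W3C's public identifier for the transitional DTD. The content is invented for illustration. The deprecated bgcolor and align attributes and the font tag are all accepted by this DTD, although the XML rules still apply: tags are lowercase, attribute values are quoted, and every element is closed.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>Transitional example</title>
</head>
<!-- bgcolor is deprecated, but the transitional DTD still defines it -->
<body bgcolor="#ffffff">
  <!-- align and font are deprecated presentation features -->
  <p align="center">
    <font color="red" size="+1">Deprecated, but still valid here.</font>
  </p>
</body>
</html>
```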
The third DTD is for frames. It is identical to the transitional DTD in all other respects; the only difference is the replacement of the document body with appropriate frame elements. You might think that, for completeness' sake, there would be strict and transitional frame DTDs, but the W3C decided that if you use frames, you might as well use all the deprecated elements, too.
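A frame document under this DTD might be sketched as follows, assuming XHTML 1.0 and the W3C's public identifier for the frameset DTD. The filenames menu.html and content.html and the frame names are hypothetical. Note that a frameset takes the place of the body, and that a body appears only inside noframes as fallback content.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>Frameset example</title>
</head>
<!-- The frameset replaces the document body -->
<frameset cols="25%,75%">
  <!-- Hypothetical source files; empty elements are closed with " />" -->
  <frame src="menu.html" name="menu" />
  <frame src="content.html" name="content" />
  <noframes>
    <body>
      <p>This document uses frames, which your browser does not display.</p>
    </body>
  </noframes>
</frameset>
</html>
```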