Section 1.3. HTML and XHTML: What They Are

HTML and XHTML are document-layout and hyperlink-specification languages. They define the syntax and placement of special embedded directions that the browser doesn't display; instead, these directions tell it how to display the contents of the document, including text, images, and other supporting media. The languages also tell you how to make a document interactive through special hypertext links, which connect your document with other documents (on either your computer or someone else's) as well as with other Internet resources.
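To see the split between markup and content in action, here is a minimal sketch using Python's standard html.parser module. This is our choice for illustration only; browsers use their own parsers. The sample document and the TextAndLinks class are hypothetical. The parser separates the text a browser would display from the tags around it, and it picks out the hyperlink target carried in the href attribute of the <a> tag.

```python
# A minimal sketch: tags are instructions, not content. Python's standard
# html.parser separates displayable text from the tags and their attributes.
from html.parser import HTMLParser

# Hypothetical sample document used only for this illustration
SAMPLE = """<html>
<body>
<h1>Welcome</h1>
<p>See the <a href="https://www.w3.org/">W3C home page</a> for the standards.</p>
</body>
</html>"""


class TextAndLinks(HTMLParser):
    """Collects displayable text and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # character data found between the tags
        self.links = []       # hyperlink targets from <a href="...">

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

    def handle_data(self, data):
        # Keep only non-blank text, roughly what a reader would see
        if data.strip():
            self.text_parts.append(data.strip())


parser = TextAndLinks()
parser.feed(SAMPLE)
parser.close()
print(" ".join(parser.text_parts))  # the words, with no markup
print(parser.links)                 # where the hypertext link points
```

None of the tags survive in the collected text; they only shaped how the parser interpreted it.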

You've certainly heard of HTML, and perhaps XHTML too, but did you know that they are just two of many markup languages? Indeed, HTML is the black sheep in the family of document markup languages. HTML was based on SGML, the Standard Generalized Markup Language. The powers-that-be created SGML with the intent that it be the one and only markup metalanguage from which all other document markup languages would be created. Everything from hieroglyphics to HTML can be defined using SGML, eliminating any need for another markup metalanguage.

The problem with SGML is that it is so broad and all-encompassing that mere mortals cannot use it. Using SGML effectively requires very expensive and complex tools that are completely beyond the reach of regular people who just want to bang out an HTML document in their spare time. As a result, HTML adheres to some, but not all, SGML standards,[3] eliminating many of the more esoteric features so that it is readily usable and widely used.

[3] The HTML DTD in Appendix D uses a subset of SGML to define the HTML 4.01 standard.

Besides the fact that SGML is unwieldy and not well suited to describing the very popular HTML in a useful way, there was also a growing need to define other HTML-like markup languages to handle different network documents. Accordingly, the W3C defined the Extensible Markup Language (XML). Like SGML, XML is a formal markup metalanguage, but it is a separate one that uses only select features of SGML to define markup languages. It eliminates many SGML features that aren't applicable to languages like HTML and simplifies others to make them easier to use and understand.

However, HTML Version 4.01 is not XML-compliant. Hence, the W3C offers XHTML, a reformulation of HTML that is compliant with XML. XHTML attempts to support every last nit and feature of HTML 4.01 using the more rigid rules of XML. It generally succeeds, but the two languages differ enough to make life difficult for the standards-conscious HTML author.
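A few of those rigid rules are easy to demonstrate with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree (our choice; any XML parser would do), and the two sample fragments and the is_well_formed helper are hypothetical. The HTML-style fragment is legal HTML 4.01: the attribute value is unquoted, <br> has no end tag, and the closing </p> is omitted. An XML parser rejects it, while it accepts the XHTML equivalent. Note that this checks only XML well-formedness, not validity against the XHTML DTD.

```python
# A minimal sketch: an XML parser accepts an XHTML-style fragment but rejects
# the same content written in HTML 4.01 style.
import xml.etree.ElementTree as ET

# Legal HTML 4.01: unquoted attribute value, <br> with no end tag,
# and the optional </p> end tag omitted.
HTML_STYLE = "<p class=note>First line<br>Second line"

# The XHTML equivalent: quoted attribute value, the empty element
# closed with "/>", and an explicit </p> end tag.
XHTML_STYLE = '<p class="note">First line<br />Second line</p>'


def is_well_formed(markup):
    """Return True if markup parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


for label, markup in (("HTML 4.01", HTML_STYLE), ("XHTML", XHTML_STYLE)):
    print(label, "well-formed XML:", is_well_formed(markup))
```

XML is also case-sensitive, so a fragment such as <P>text</p>, which HTML treats as a matched pair, fails the same check.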