Every HTML document should conform to the HTML DTD, the formal SGML (Standard Generalized Markup Language) Document Type Definition that defines the HTML standard. The DTD defines the tags and syntax used to create an HTML document. You can tell the browser which DTD your document complies with by placing a special SGML declaration in the first line of the document:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
This cryptic message indicates that your document is intended to comply with the HTML 4.01 Strict DTD defined by the World Wide Web Consortium (W3C). Other versions of the DTD define more restricted versions of the HTML standard, and not all browsers support all versions of the HTML DTD. In fact, specifying the wrong doctype may cause the browser to misinterpret your document when displaying it for the user. It's also unclear which doctype to use when your document includes tags that are not standard but are popular features of a particular browser (the Netscape extensions, for instance) or tags from the deprecated HTML 3.0 standard, for which a DTD was never released.
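For reference, the W3C's HTML 4.01 specification defines three DTDs, each named by its own public identifier and, optionally, a system URL pointing to the DTD file. Strict excludes deprecated elements and attributes; Transitional also permits deprecated ones such as <font>; Frameset is for documents that use <frameset> in place of <body>. A document uses exactly one of these declarations:

```html
<!-- HTML 4.01 Strict -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<!-- HTML 4.01 Transitional -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<!-- HTML 4.01 Frameset -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
```

The comments only label this listing; in an actual document, nothing should precede the declaration.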
Almost no one precedes their HTML documents with the SGML doctype command. Because of the confusion of versions and standards, we don't recommend that you include the prefix with your HTML documents either.
On the other hand, we do strongly recommend that you include the proper doctype statement in your XHTML documents, in conformance with XML standards. Read Chapter 15 and Chapter 16 for more about DTDs and the XML and XHTML standards.
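For comparison, here is a minimal sketch of an XHTML 1.0 Strict document with its doctype declaration. Note the lowercase html in the declaration, the xmlns namespace on the <html> tag, and the explicitly closed elements that XML requires; the title and paragraph text are placeholders:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>A Minimal XHTML Document</title>
  </head>
  <body>
    <p>Every element is explicitly closed.</p>
  </body>
</html>
```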
As we saw earlier, the <html> and </html> tags serve to delimit the beginning and end of a document. Since the typical browser can easily infer from the enclosed source that it is an HTML or XHTML document, you don't really need to include the tag in your source HTML document.
That said, it's considered good form to include this tag so that other tools, particularly more mundane text-processing ones, can recognize your document as an HTML document. At the very least, the presence of both the beginning and ending <html> tags confirms that neither the beginning nor the end of the document has been inadvertently deleted. Besides, XHTML requires the <html> tag.
Inside the <html> tag and its end tag are the document's head and body. Within the head, you'll find tags that identify the document and define its place within a document collection. Within the body is the actual document content, defined by tags that determine the layout and appearance of the document text. As you might expect, the document head is contained within a <head> tag and the body is within a <body> tag, both of which are defined later.
The <body> tag may be replaced by a <frameset> tag defining one or more display frames that, in turn, contain actual document content. See Chapter 11 for more information. By far, the most common form of the <html> tag is simply:
<html> document head and body content </html>
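Filled in with a head and a body, a minimal HTML 4.01 document might look like the following sketch. The title and paragraph text are placeholders; the <title> is the one element HTML 4.01 requires inside the head:

```html
<html>
<head>
  <!-- the head identifies the document -->
  <title>My First Document</title>
</head>
<body>
  <!-- the body holds the content the user sees -->
  <p>The document content goes here.</p>
</body>
</html>
```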
When the <html> tag appears without the version attribute, the browser relies on the server to supply the version of HTML used in the document.
The dir attribute specifies in which direction the browser should render text within the containing element. When used within the <html> tag, it determines how text will be presented within the entire document. When used within another tag, it controls the text's direction for just the content of that tag.
By default, the value of this attribute is ltr, indicating that text is presented to the user left to right. Use the other value, rtl, to display text right to left, for languages like Hebrew or Arabic. Of course, the results depend on your content and on the browser's support of HTML 4 or XHTML.

Netscape and Internet Explorer Versions 4 and earlier ignore the dir attribute. The HTML 4-compliant Internet Explorer Version 5 simply right-justifies dir=rtl text, although if you look at Figure 3-1, you'll notice that the browser also moves the punctuation (the period) to the other side of the sentence. Internet Explorer 6 does the same thing. Netscape 6 right-justifies everything, including the ending period.
<html dir=rtl>
<head>
<title>Display Directions</title>
</head>
<body>
This is how IE 5 renders right-to-left directed text.
</body>
</html>
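Because dir also works on individual elements, you can mix directions within one document. The following sketch, which assumes a browser with HTML 4 support, keeps the document left to right but marks a single Hebrew paragraph as right to left; pairing dir with lang (described next) also identifies the paragraph's language:

```html
<html dir="ltr">
<head>
  <title>Mixed Directions</title>
</head>
<body>
  <!-- inherits the document-wide left-to-right direction -->
  <p>This paragraph reads left to right.</p>
  <!-- overrides the direction for this paragraph only -->
  <p lang="he" dir="rtl">שלום</p>
</body>
</html>
```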
When included within the <html> tag, the lang attribute specifies the language you've generally used within the document. When used within other tags, the lang attribute specifies the language you used within that tag's content. Ideally, the browser will use lang to better render the text for the user.
Set the value of the lang attribute to an ISO 639 standard two-letter language code. You may also indicate a dialect by following the ISO language code with a hyphen and a subcode. For example, "en" is the ISO language code for English; "en-US" is the complete code for U.S. English. Other common language codes include "fr" (French), "de" (German), "it" (Italian), "nl" (Dutch), "el" (Greek), "es" (Spanish), "pt" (Portuguese), "ar" (Arabic), "he" (Hebrew), "ru" (Russian), "zh" (Chinese), "ja" (Japanese), and "hi" (Hindi).
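As a brief sketch, the document below declares U.S. English for the whole page, then overrides the language for a French word and a German paragraph; the text itself is only illustrative:

```html
<html lang="en-US">
<head>
  <title>Language Codes</title>
</head>
<body>
  <!-- lang on an inline element applies to that word only -->
  <p>In French, a common greeting is <span lang="fr">Bonjour</span>.</p>
  <!-- lang on a block element applies to its entire content -->
  <p lang="de">Guten Tag!</p>
</body>
</html>
```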
The version attribute defines the HTML standard version used to compose the document. Its value, for HTML Version 4.01, should read exactly:
version="-//W3C//DTD HTML 4.01//EN"
In general, version information within the <html> tag is more trouble than it is worth, and this attribute has been deprecated in HTML 4. Serious authors should instead use an SGML <!DOCTYPE> declaration at the beginning of their documents, like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">