Section 3.5. Document Content

Nearly everything else you put into your HTML or XHTML document that isn't a tag is by definition content, and the majority of that is text. Like tags, document content is encoded using a specific character set: by default, the ISO-8859-1 Latin character set. This character set is a superset of conventional ASCII, adding the necessary characters to support the Western European languages. If your keyboard does not allow you to directly enter the characters you need, you can use character entities to insert the desired characters.

3.5.1 Advice Versus Control

Perhaps the hardest rule to remember when marking up an HTML or XHTML document is that all the tags you insert regarding text display and formatting are only advice for the browser: they do not explicitly control how the browser will display the document. In fact, the browser can choose to ignore all of your tags and do what it pleases with the document content. What's worse, the user (of all people!) has control over the text-display characteristics of his or her own browser.

Get used to this lack of control. The best way to use markup to control the appearance of your documents is to concentrate on the content of the document, not on its final appearance. If you find yourself worrying excessively about spacing, alignment, text breaks, and character positioning, you'll surely end up with ulcers. You will have gone beyond the intent of HTML. If you focus on delivering information to users in an attractive manner, using the tags to advise the browser as to how best to display that information, you are using HTML or XHTML effectively, and your documents will render well on a wide range of browsers.

3.5.2 Character Entities

Besides common text, HTML and XHTML give you a way to display special text characters that you might not normally be able to include in your source document or that have other purposes. A good example is the less-than or opening bracket symbol (<). In HTML, it normally signifies the start of a tag, so if you insert it simply as part of your text, the browser will get confused and probably misinterpret your document.

For both HTML and XHTML, the ampersand character (&) instructs the browser to use a special character, formally known as a character entity. For example, the sequence &lt; inserts that pesky less-than symbol into the rendered text. Similarly, &gt; inserts the greater-than symbol, and &amp; inserts an ampersand. There can be no spaces between the ampersand, the entity name, and the required, trailing semicolon. (Semicolons aren't special characters; you don't need to use an ampersand sequence to display a semicolon normally.) [Section 16.3.7]
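
As a brief illustration, the fragment below uses the three entities just described inside ordinary paragraph text. The sample sentences are invented for this sketch; only the entity names come from the discussion above.

```html
<!-- Each entity starts with & and ends with ; with no spaces in between -->
<p>In most programs, 3 &lt; 5 is true and 3 &gt; 5 is false.</p>
<p>Salt &amp; pepper</p>
```

A browser should display the first paragraph as "In most programs, 3 < 5 is true and 3 > 5 is false." and the second as "Salt & pepper".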

You also may replace the entity name after the ampersand with a pound symbol (#) and a decimal value corresponding to the entity's position in the character set. Hence, the sequence &#60; does the same thing as &lt; and represents the less-than symbol. In fact, you could substitute all the normal characters within an HTML document with ampersand special characters, such as &#65; for a capital "A" or &#97; for its lowercase version, but that would be silly. A complete listing of all characters and their names and numerical equivalents can be found in Appendix F.
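
To make the equivalence concrete, here is a small sketch that writes the same characters both ways. The decimal values 60, 65, and 97 are those given above; 38 is the ampersand's position in ASCII and ISO-8859-1. The surrounding text is made up for illustration.

```html
<!-- Named and numeric forms of the same characters -->
<p>Named: 1 &lt; 2 &amp; so on</p>
<p>Numeric: 1 &#60; 2 &#38; so on</p>

<!-- Legal but pointless: ordinary letters written as numeric references -->
<p>&#65;&#97;</p>
```

Apart from their labels, the first two paragraphs should render identically ("1 < 2 & so on"), and the last one should render as "Aa".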

Keep in mind that not all special characters can be rendered by all browsers. Some browsers just ignore many of the special characters; with others, the characters aren't available in the character sets on a specific platform. Be sure to test your documents on a range of browsers before electing to use some of the more obscure character entities.


3.5.3 Comments

Comments are another type of textual content that appears in the source HTML document but is not rendered by the user's browser. Comments fall between the special <!-- and --> markup elements. Browsers ignore the text between the comment character sequences. Here are some sample comments:

<!-- This is a comment -->

<!-- This is a
multiple-line comment
that ends on this line -->

Although the standard does not strictly require it, put a space after the initial <!-- and before the final -->; otherwise you can put nearly anything inside the comment. The biggest exception to this rule is that the HTML standard doesn't let you nest comments.[3]
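
The sketch below contrasts one comment that follows these rules with an attempt at nesting that does not. The text inside each comment is invented; the point is only where the browser considers the comment closed.

```html
<!-- A well-formed comment, with a space inside each delimiter -->
<p>This paragraph is displayed.</p>

<!-- Outer comment starts here
  <!-- an attempt at a nested comment -->
  this text and the closing sequence after it may be displayed -->
```

Browsers typically end a comment at the first --> they encounter, so in the second case the last line is likely to leak into the rendered page as ordinary text.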

[3] Early versions of Netscape did let you nest comments, but no longer. The practice is tricky, so just say no.

Internet Explorer also lets you place comments within a special <comment> tag. Everything between the <comment> and </comment> tags is ignored by Internet Explorer. All other browsers display the comment to the user. Obviously, because of this undesirable behavior, we do not recommend using the <comment> tag. Instead, always use the <!-- and --> sequences to delimit comments.

Besides the obvious use of comments for source documentation, many web servers use comments to trigger features specific to the server software. These servers scan the document for specific character sequences within conventional HTML/XHTML comments and then perform some action based upon the commands embedded in the comments. The action might be as simple as including text from another file (known as a server-side include) or as complex as executing other commands on the server to generate the document contents dynamically.
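
As an illustration, the fragment below assumes a server with Apache-style server-side includes enabled; other servers may use a different syntax or none at all. The file path /includes/footer.html is hypothetical.

```html
<p>Main page content goes here.</p>

<!-- An ordinary comment: the server passes it through untouched -->

<!-- A directive: the server replaces it with the contents of the named file -->
<!--#include virtual="/includes/footer.html" -->

<!-- A directive: the server replaces it with the current local date and time -->
<!--#echo var="DATE_LOCAL" -->
```

In this syntax the directive name follows the opening <!-- immediately, with no space before the #. If the server does not process includes, the directives reach the browser as ordinary comments and nothing is displayed in their place.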