Section A.1. Grammatical Conventions

We use a number of typographic and punctuation conventions to make our grammar easy to understand.

A.1.1 Typographic and Naming Conventions

For our grammar, we denote the terminals with a monospaced typeface. The nonterminals appear in italicized text.

We also use a simple naming convention for the majority of our nonterminals: if a nonterminal defines the syntax of a specific tag, its name is the tag name followed by _tag. If a nonterminal defines the various language elements that may be nested within a certain tag, its name is the tag name followed by _content.

For example, if you are wondering exactly which elements are allowed within an <a> tag, you can look for the a_content rule within the grammar. Similarly, to determine the correct syntax of a definition list created with the <dl> tag, look for the dl_tag rule.
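As a minimal sketch of this naming convention, the helper below derives both rule names from a tag. The function name `rule_names` is our own invention for illustration, and because the convention covers only the majority of nonterminals, a derived name is not guaranteed to exist in the grammar.

```python
from typing import Tuple

def rule_names(tag: str) -> Tuple[str, str]:
    """Return the (syntax rule, content rule) names for a tag.

    Accepts either a bare name ("dl") or the tag itself ("<dl>").
    Hypothetical helper: it only applies the naming convention and
    does not check that the rule actually appears in the grammar.
    """
    name = tag.strip("<>/ ").lower()
    return f"{name}_tag", f"{name}_content"

print(rule_names("<a>"))  # ('a_tag', 'a_content')
print(rule_names("dl"))   # ('dl_tag', 'dl_content')
```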

A.1.2 Punctuation Conventions

Each rule in the grammar starts with the rule's name, followed by the replacement symbol (::=) and the rule's value. We've intentionally kept the grammar simple, but we do use three punctuation elements to denote alternation, repetition, and optional elements in the grammar.

A.1.2.1 Alternation

Alternation indicates a rule may actually have several different values, of which you must choose exactly one. Vertical bars (|) separate the alternatives for the rule.

For example, the heading rule is equivalent to any one of six HTML heading tags, so it appears in the table as:

heading      ::=     h1_tag
             |       h2_tag
             |       h3_tag
             |       h4_tag
             |       h5_tag
             |       h6_tag

The heading rule tells us that wherever the heading nonterminal appears in a rule, you can replace it with exactly one of the actual heading tags.
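One way to picture alternation is to enumerate every replacement it permits. The sketch below assumes a rule can be modeled as a plain list of alternatives; the dictionary `GRAMMAR`, the function `expand`, and the symbol `p_tag` are illustrative names of our own, not part of the grammar's notation.

```python
from itertools import product

# The heading rule: six alternatives, exactly one of which is chosen.
GRAMMAR = {
    "heading": ["h1_tag", "h2_tag", "h3_tag", "h4_tag", "h5_tag", "h6_tag"],
}

def expand(symbols, grammar=GRAMMAR):
    """Return every sequence obtained by replacing each symbol that has
    alternatives with exactly one of them; other symbols pass through."""
    choices = [grammar.get(symbol, [symbol]) for symbol in symbols]
    return [list(choice) for choice in product(*choices)]

# A hypothetical sequence: a heading followed by a paragraph.
results = expand(["heading", "p_tag"])
print(len(results))  # 6
print(results[0])    # ['h1_tag', 'p_tag']
```

Each of the six results contains exactly one heading tag, which is what the vertical bars in the rule express.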

A.1.2.2 Repetition

Repetition indicates that an element within a rule may be repeated some number of times. Repeated elements are enclosed in curly braces ({...}). When the minimum number of repetitions is something other than 1, that minimum appears as a subscripted number after the closing brace; a subscript of 0 means the element may be omitted entirely.

For example, the <ul> tag may contain only <li> tags, or it may be empty. The rule, therefore, is:

ul_tag       ::=     <ul>
                     {li_tag}₀
                     </ul>

The rule says that the syntax of the <ul> tag requires the <ul> tag and zero or more <li> tags, followed by a closing </ul> tag. We spread this rule across several lines and indented some of the elements to make it more readable; your documents need not actually be formatted this way.
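The repetition rule can be checked mechanically. The following is a rough sketch, assuming the contents of a tag have already been reduced to a list of nonterminal names; `matches_repetition` and `is_valid_ul` are hypothetical helpers, and `p_tag` is used only as an example of a disallowed element.

```python
def matches_repetition(items, element, minimum=0):
    """True if items consists only of `element`, repeated at least
    `minimum` times. The minimum plays the role of the subscript."""
    return len(items) >= minimum and all(item == element for item in items)

def is_valid_ul(children):
    """Contents of <ul> ... </ul>: zero or more li_tag elements."""
    return matches_repetition(children, "li_tag", minimum=0)

print(is_valid_ul([]))                    # True
print(is_valid_ul(["li_tag", "li_tag"]))  # True
print(is_valid_ul(["li_tag", "p_tag"]))   # False
```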

A.1.2.3 Optional elements

Some elements may appear in a document but are not required. Optional elements are enclosed in square brackets ([...]). The <table> tag, for example, has an optional caption:

table_tag    ::=     <table>
                     [caption_tag]
                     {tr_tag}₀
                     </table>


The rule says that a table begins with the <table> tag, followed by an optional caption and zero or more table-row tags, and ends with the </table> tag.
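To see how an optional element combines with repetition, consider this small sketch. It assumes the contents between <table> and </table> are given as a list of nonterminal names; `is_valid_table` is a hypothetical helper written for illustration only.

```python
def is_valid_table(children):
    """Contents of <table> ... </table>: an optional caption_tag,
    then zero or more tr_tag elements."""
    rest = list(children)
    # The optional element may appear at most once, and only first.
    if rest and rest[0] == "caption_tag":
        rest = rest[1:]
    # Everything that remains must be a table row.
    return all(child == "tr_tag" for child in rest)

print(is_valid_table([]))                          # True
print(is_valid_table(["caption_tag", "tr_tag"]))   # True
print(is_valid_table(["tr_tag", "caption_tag"]))   # False
```

Note that a second caption, or a caption placed after a row, is rejected: the brackets allow the element to be absent, not to be repeated or moved.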

A.1.3 More Details

Our grammar stops at the tag level; it does not delve further to show the syntax of each tag, including tag attributes. For these details, refer to the quick-reference card included with this book.

A.1.4 Predefined Nonterminals

The HTML and XHTML standards define a few specific kinds of content that correspond to various types of text. We use these content types throughout the grammar. They are:


literal_text: Text is interpreted exactly as specified; no character entities or style tags are recognized.


plain_text: Regular characters in the document character encoding, along with character entities denoted by the ampersand character, are recognized.


style_text: Like plain_text, with physical and content-based style tags allowed.
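The difference between text taken exactly as specified and text in which character entities are recognized can be illustrated with a short sketch. It uses Python's standard `html.unescape`, which follows HTML5 entity rules and so only approximates the behavior described here; the function names `as_literal` and `as_plain` are our own.

```python
import html

def as_literal(source: str) -> str:
    """Text taken exactly as specified: nothing is decoded."""
    return source

def as_plain(source: str) -> str:
    """Text in which character entities (introduced by &) are decoded."""
    return html.unescape(source)

sample = "AT&amp;T &lt;b&gt;"
print(as_literal(sample))  # AT&amp;T &lt;b&gt;
print(as_plain(sample))    # AT&T <b>
```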