We use a number of typographic and punctuation conventions to make our grammar easy to understand.
In our grammar, terminals appear in a monospaced typeface and nonterminals appear in italics.
We also use a simple naming convention for the majority of our nonterminals: if a nonterminal defines the syntax of a specific tag, its name is the tag name followed by _tag. If a nonterminal defines the various language elements that may be nested within a certain tag, its name is the tag name followed by _content.
For example, if you are wondering exactly which elements are allowed within an <a> tag, you can look for the a_content rule within the grammar. Similarly, to determine the correct syntax of a definition list created with the <dl> tag, look for the dl_tag rule.
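The naming convention is mechanical enough to sketch in a few lines of Python. The helper name rule_names is hypothetical, not part of the grammar. Because the convention covers only the majority of nonterminals, the names it returns are the ones to look for, not a guarantee that both rules exist for every tag.

```python
def rule_names(tag):
    """Return the (syntax rule, content rule) names to look up for a tag."""
    # Accept "<a>", "</a>", or plain "a"; the rule names use the bare tag name.
    name = tag.strip("<>/ ").lower()
    return f"{name}_tag", f"{name}_content"


print(rule_names("<a>"))   # names to look up for the <a> tag
print(rule_names("<dl>"))  # names to look up for the <dl> tag
```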
Each rule in the grammar starts with the rule's name, followed by the replacement symbol (::=) and the rule's value. We've intentionally kept the grammar simple, but we do use three punctuation elements to denote alternation, repetition, and optional elements in the grammar.
Alternation indicates a rule may actually have several different values, of which you must choose exactly one. Vertical bars (|) separate the alternatives for the rule.
For example, the heading rule is equivalent to any one of six HTML heading tags, so it appears in the table as:
heading ::= h1_tag | h2_tag | h3_tag | h4_tag | h5_tag | h6_tag
The heading rule tells us that wherever the heading nonterminal appears in a rule, you can replace it with exactly one of the actual heading tags.
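As a rough illustration, a rule that uses only alternation can be split into its name and its alternatives with ordinary string operations. The function parse_rule is a hypothetical helper for this sketch. It assumes the rule fits on one line and contains no braces or brackets.

```python
def parse_rule(line):
    """Split 'name ::= a | b | c' into the rule name and its alternatives."""
    # Everything before the replacement symbol is the rule's name;
    # everything after it is the rule's value.
    name, _, value = line.partition("::=")
    alternatives = [alt.strip() for alt in value.split("|")]
    return name.strip(), alternatives


name, alternatives = parse_rule(
    "heading ::= h1_tag | h2_tag | h3_tag | h4_tag | h5_tag | h6_tag"
)

# Wherever the nonterminal appears, exactly one alternative replaces it.
print(name, "may be replaced by any one of", alternatives)
```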
Repetition indicates that an element within a rule may be repeated some number of times. Repeated elements are enclosed in curly braces ({...}). When the minimum number of repetitions is something other than 1, that minimum appears as a subscripted number after the closing brace.
For example, the <ul> tag may contain only <li> tags, or it may be empty. The rule, therefore, is:
ul_tag ::= <ul>
             {li_tag}₀
           </ul>
The rule says that the <ul> tag consists of an opening <ul> tag, followed by zero or more <li> tags, followed by a closing </ul> tag. We spread this rule across several lines and indented some of the elements to make it more readable; your documents need not actually be formatted this way.
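One way to read a repetition is as a check on a list of child elements: every child must be the repeated element, and there must be at least the minimum number of them. The sketch below assumes the children have already been reduced to bare tag names. The function matches_repetition is a hypothetical helper, not part of the grammar.

```python
def matches_repetition(children, element, minimum=0):
    """Check a list of child tag names against a repeated element with a minimum count."""
    enough = len(children) >= minimum
    only_that_element = all(child == element for child in children)
    return enough and only_that_element


# The <ul> rule allows zero or more <li> tags and nothing else.
print(matches_repetition([], "li"))            # an empty list is allowed
print(matches_repetition(["li", "li"], "li"))  # two items are allowed
print(matches_repetition(["li", "p"], "li"))   # a stray <p> is not
```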
Some elements may appear in a document but are not required. Optional elements are enclosed in square brackets ([ . . . ]). The <table> tag, for example, has an optional caption:
table_tag ::= <table>
                [caption_tag]
                {tr_tag}₀
              </table>
The rule says that a table begins with the <table> tag, followed by an optional caption and zero or more table-row tags, and ends with the </table> tag.
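Optional and repeated elements map naturally onto the ? and * operators of regular expressions. The following Python sketch checks the children of a table against the rule as written here. It assumes the children have been reduced to bare tag names, and the names TABLE_CONTENT and valid_table_children are hypothetical.

```python
import re

# An optional caption, then zero or more rows. Each name is followed by
# a space so the pattern can tell where one child ends and the next begins.
TABLE_CONTENT = re.compile(r"(caption )?(tr )*")


def valid_table_children(children):
    """Return True if the child tag names fit the table rule."""
    encoded = "".join(f"{child} " for child in children)
    return TABLE_CONTENT.fullmatch(encoded) is not None


print(valid_table_children(["caption", "tr", "tr"]))  # caption first, then rows
print(valid_table_children(["tr", "caption"]))        # caption in the wrong place
```

Because the caption is optional and the rows may repeat zero times, an empty list of children also satisfies the rule.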
Our grammar stops at the tag level; it does not delve further to show the syntax of each tag, including tag attributes. For these details, refer to the quick-reference card included with this book.
The HTML and XHTML standards define a few specific kinds of content that correspond to various types of text. We use these content types throughout the grammar. They are:
literal_text: Text is interpreted exactly as specified; no character entities or style tags are recognized.
plain_text: Regular characters in the document character encoding, along with character entities denoted by the ampersand character, are recognized.
style_text: Like plain_text, with physical and content-based style tags allowed.
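The practical difference between text taken exactly as written and text in which character entities are recognized can be shown with Python's standard html module. This is only an illustration: the function interpret and its recognize_entities flag are hypothetical, and style tags are not modeled at all.

```python
import html


def interpret(text, recognize_entities):
    """Return text as a reader would see it under the given interpretation."""
    if recognize_entities:
        # Character entities such as &amp; are replaced by their characters.
        return html.unescape(text)
    # Otherwise every character is taken exactly as written.
    return text


sample = "AT&amp;T"
print(interpret(sample, recognize_entities=False))  # entity left untouched
print(interpret(sample, recognize_entities=True))   # entity decoded
```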