Upgrading to the XHTML Document Type

The goal of this lesson is to upgrade the site's HTML code to XHTML, and ensure that all subsequent code you add while working in Dreamweaver is XHTML-compliant. You might be wondering what exactly XHTML is and how it differs from HTML. XHTML is the current standard for HTML, which means that the relationship between the two is historical: XHTML replaces HTML.

Perhaps the most significant change to come with XHTML has little to do with code at all; it's the new conceptual thrust of XHTML, bringing HTML in line with XML, or eXtensible Markup Language. XML is a meta-language: a set of rules that developers can use to develop their own custom language in line with a common standard. XML is markup-based, like HTML, so its syntax should be familiar, as in the following: <name type="first">Jeffrey</name>. Several variants of XML have already appeared, such as MathML, a markup language that mathematicians use to encode mathematical expressions. XHTML is a variant that developers use for (drum roll) marking up Web pages.
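Because XML syntax is so regular, any XML parser can read an element like the one above. The following is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module. Python is chosen here only for illustration; the lesson itself requires nothing beyond Dreamweaver.

```python
import xml.etree.ElementTree as ET

# The sample element from the text, parsed as XML.
element = ET.fromstring('<name type="first">Jeffrey</name>')

print(element.tag)          # the tag names the content: name
print(element.get("type"))  # the attribute value: first
print(element.text)         # the enclosed content: Jeffrey
```

Notice that nothing in the element says how "Jeffrey" should look on screen; the markup only says what it is.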

One of the central tenets of XML is that the tags describe the content of a document, but not its presentation. Presentation of XML content is handled with a separate type of code (such as CSS, XSLT, or XSL-FO). Previous versions of HTML mixed content and presentation markup. Elements such as <font color="green" size="+1"> lack semantic value: they tell the user (or browser) nothing about what is enclosed inside them. Rather, these tags merely tell the browser how to present whatever is enclosed, unlike the <name> element in the preceding paragraph, which leaves little to the imagination about what it contains. For this reason, presentation elements are deprecated, which means that they are discouraged and will be dropped from the standard, but they'll still work for now.

In short, you should use XHTML to describe the structure of your document: headings (<h1>, <h2>, etc.), lists (<ol>, <ul>, <li>), body text (<p>), emphatic text (<strong>, <em>), anchors (<a>) and so on. To specify how graphical browsers (such as Internet Explorer and Netscape) should present this information, you should use cascading style sheets, rather than presentation tags, such as <font> or <b>.

This separation of content from presentation has more than just theoretical benefit. First, it enables a broader variety of browsers, including screen readers for the visually impaired, to render the content without having to weed out (or worse, attempt to interpret) presentation tags. Second, the proper use of XHTML and cascading style sheets greatly speeds up the development and maintenance of Web sites.

As you probably know, XHTML looks a bit like HTML. Many of the tags are the same, including <body>, <head>, <h1>, <p>, <ol>, <a>, <table>, <tr>, <td>, <form>, and so on. In fact, most HTML code is unchanged in the transition. That limits how much you actually have to change when upgrading to XHTML.

NOTE

XHTML is backward-compatible. That is, browsers created before the XHTML specification can still display XHTML code nearly perfectly.


But XHTML code is not exactly the same as HTML code. Your task in this lesson is to find these differences and change the code accordingly. The most significant differences, beyond the separation of content and presentation already discussed, are as follows:

  • All XHTML tags and their attributes must be lowercase. In HTML, both <p> and <P> are equally acceptable, and many developers capitalized HTML tags to help them stand out. But in XHTML, following XML rules, all tags must be lowercase, so only <p> is acceptable. Likewise, tag attributes, such as the cellpadding="3" attribute of the <table> tag, must also be in lowercase. The Newland Tours site already uses lowercase tags, so you won't have to worry too much about this issue.

  • All XHTML tags must be closed. For example, if you have an <h1> tag, somewhere else there should be a closing </h1> tag. In HTML, however, some elements lack closing tags altogether. Examples of these empty tags include <br>, <img src="xyz.gif">, <hr>, and <input>. In addition, some tags could be either closed or left unclosed in HTML, including the <p> and <li> tags. In XHTML, however, the <p> and <li> tags need corresponding </p> and </li> tags. As for the empty tags, they are closed in a special way. The syntax is <my_empty_tag />. Thus, you should convert the empty tags above to <br />, <img src="xyz.gif" />, <hr />, and <input />. The added space and forward slash replace the closing tag.

  • Because so many different flavors of HTML exist side-by-side on the Web, developers have for years preceded HTML documents with a document type declaration. For example, Dreamweaver adds the following to the top of most new documents: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">. This tells browsers which version of code (HTML 4.01 Transitional) the document uses; the trailing EN indicates that the DTD itself is written in English. XHTML not only has a different document type statement, but as a valid form of XML, it also has an XML declaration.

  • As discussed previously, presentation tags such as <font> are deprecated. XHTML 1.0 Transitional still tolerates them, but XHTML 1.0 Strict does not. Instead, use cascading style sheets to handle presentation.
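The tag-closing rule in the list above can be checked mechanically, because XHTML must be well-formed XML. The sketch below uses Python's standard xml.etree.ElementTree parser; the helper name is_well_formed is hypothetical. Note the limits of this check: an XML parser rejects unclosed tags and tags whose opening and closing case differ, but it would accept consistently uppercase tags such as <P>...</P>. Catching those requires validating against the XHTML DTD, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style markup: mismatched case and an unclosed <br>.
print(is_well_formed("<P>Line one<br>Line two</p>"))    # False

# HTML-style list items without closing </li> tags.
print(is_well_formed("<ul><li>One<li>Two</ul>"))        # False

# XHTML-style markup: lowercase tags, empty element closed with " />".
print(is_well_formed("<p>Line one<br />Line two</p>"))  # True
```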

With that background, let's get started!

  1. Open index.htm in Dreamweaver.

    The Find and Replace operations you'll be doing are sitewide, so theoretically it makes no difference which file you open. In fact, you can have a blank, unsaved document open. Dreamweaver doesn't care. As long as you have a file open, you can access the Find and Replace dialog.

    In a moment, you'll use Find and Replace to formally convert the site to XHTML. To do so, you will change the document type declaration from HTML 4.01 Transitional to XHTML 1.0 Transitional. Your document won't be fully XHTML-compliant, since all of the noncompliant code will still be there. But by changing the document type information, you'll not only tell browsers that the document is XHTML, you'll also tell Dreamweaver. Once you do so, Dreamweaver will automatically write XHTML-compliant code from that point forward, as you'll see for yourself.

  2. Still in split view, place your cursor at the end of the Featured Vacation segment in the design pane (after $899/child USD) and press Shift+Return (Macintosh) or Shift+Enter (Windows).

    This keyboard shortcut inserts a line break element (the <br> tag in HTML). This is an empty element, and as you can see, it is not inserted in the correct XHTML format. This is proof that Dreamweaver is writing, by default, non-XHTML compliant code. The reason Dreamweaver does this is that the document type is HTML 4.01 Transitional, and in that version of HTML, <br> is the correct way to code a line break element.

    graphics/02fig02.jpg

    Let's change the document type information.

  3. Choose File > New to create a new document. Make sure Basic Page is the selected category and that HTML is the file selected in the second pane. Near the bottom-right corner, check Make Document XHTML Compliant. Press Create.

    When you create a new document that is XHTML-compliant, Dreamweaver writes the proper document type information at the top of the new document. We'll copy that code and use it to replace the existing code in the HTML 4.01 Transitional site.

    graphics/02fig03.gif

  4. Select lines 1-3 of the new document, and choose Edit > Copy to copy the code to the clipboard. You can close the new file without saving.

    Let's take a look at the code you just copied. (The DOCTYPE declaration, which is line 2 in Dreamweaver, wraps onto two lines here.)

    <?xml version="1.0" encoding="iso-8859-1"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    

    The first line declares that the document is an XML document. Remember, XHTML is a valid form of XML. XML documents conventionally begin with such a declaration. The encoding attribute specifies the character set used in the document. If your document is in English, then you don't need to change this. If you are creating Web pages that display Japanese or Greek text, then you would need to change this attribute accordingly.
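To see why the encoding attribute matters, here is a small Python sketch. It is an illustration only (the sample strings are made up): it shows that Western European text fits in iso-8859-1 while Greek text does not, so a Greek page would need a different encoding, such as iso-8859-7 or utf-8.

```python
# Western European text fits in iso-8859-1 (also called Latin-1).
latin_text = "Café tours from $899"
latin_bytes = latin_text.encode("iso-8859-1")

# Greek text does not fit, so encoding it as iso-8859-1 fails.
greek_text = "Ελλάδα"
try:
    greek_text.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

print(fits_latin1)  # False

# The same text encodes cleanly once a suitable character set is chosen.
greek_bytes = greek_text.encode("iso-8859-7")
print(greek_bytes.decode("iso-8859-7") == greek_text)  # True
```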

    The second line is the DOCTYPE declaration, which looks much like the one for HTML 4.01, except that the HTML version has changed. One other difference is a URL that points to a DTD file. A DTD, or document type definition, is a document containing all the rules specifying the tags and attributes allowed in a particular version of XML.

    The third line is the opening <html> tag that all HTML documents must have. This one is special, in that it has an xmlns attribute. This attribute, short for XML namespace, specifies the source of all the tags. It is required in XML because it is conceivable that two different XML-based languages will use the same tag. By specifying a default namespace, the rendering program (in this case, the browser) can resolve any such conflicts.
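The effect of the xmlns attribute is easiest to see from a parser's point of view. In the Python sketch below, the copied header lines are followed by a minimal, hypothetical page body (added only so that the document is complete enough to parse). The standard xml.etree.ElementTree parser reports every tag qualified by the namespace URI, which is how a program could tell an XHTML <title> from a <title> in some other XML-based language.

```python
import xml.etree.ElementTree as ET

# The copied header lines, plus a hypothetical minimal head and body.
page = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>\n'
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml">\n'
    b"<head><title>Example</title></head>\n"
    b"<body><p>Hello</p></body>\n"
    b"</html>"
)

root = ET.fromstring(page)

# The root tag is reported together with its namespace URI.
print(root.tag)  # {http://www.w3.org/1999/xhtml}html

# Child elements inherit the default namespace declared on <html>.
namespaces = {"x": "http://www.w3.org/1999/xhtml"}
title = root.find("x:head/x:title", namespaces)
print(title.text)  # Example
```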

    graphics/02inf01.gif

  5. Choose Edit > Find and Replace.

    The Find and Replace dialog appears. The factory default settings are shown in the accompanying screenshot, but yours may vary, depending on the settings from any previous use. If either the Replace With area or the area above it already has any text in it, delete that text. Both fields should be empty before you proceed to the next step.

    graphics/02fig04.gif

  6. Click inside the Replace With text area and press Command+V (Macintosh) or Ctrl+V (Windows) to paste in the three lines of code.

    The code may wrap inside the dialog so it appears to be more than three lines, but don't worry about that.

    What you're doing in this step is telling Dreamweaver what to replace the searched text with. Of course, you haven't yet told Dreamweaver what to search for in the first place. You also haven't told Dreamweaver which files you want it to search.

  7. index.htm should still be open. Scroll to the top of code view, select lines 1-2, choose Edit > Copy, click in the text area above the Replace With text area, and paste in these two lines of code.

    Again, the lines of code may wrap, but that is not a concern.

    At this point, you've told Dreamweaver what to find and what to replace it with. So far, so good. But you still haven't told Dreamweaver in which files it should search for the strings.

    graphics/02fig05.gif

  8. In the Find In drop-down menu (at the top), select Entire Current Local Site.

    Here you are telling Dreamweaver to look for the string in every HTML file in the site. This means that rather than upgrading one page at a time to XHTML, you can update every file at once.

    NOTE

    Making multiple-file replacements is potentially dangerous, because changes made to site files that are currently closed (that is, every file besides index.htm) are permanent and cannot be undone. Be careful when running Find and Replace operations on multiple files.

  9. In the Search For drop-down menu, choose Source Code. Uncheck Match Case, check Ignore Whitespace Differences, and uncheck Use Regular Expressions.

    The Search For drop-down menu is important. By default, Find and Replace looks in the text, which is the text that will be displayed in a browser for the user to see. The Text option does not include the code. Since you are upgrading the code of the pages, and not the text, it is vital that you choose Source Code.

    Match Case takes case (a versus A) into account in the search. Case differences are ignored if the box is unchecked.

    Ignore Whitespace Differences ignores any white space, such as hard returns, tabs, and indentation between text or elements. Because HTML ignores white space, many programmers use white space to make code more legible.

    Use Regular Expressions causes Dreamweaver to interpret reserved characters used in regular expressions (such as \d) as regular expression characters. If the box is unchecked and Dreamweaver encounters \d, it will search for the literal characters \d, rather than for any single digit, which is what \d means in regular expressions.
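The three checkboxes can be approximated in a few lines of Python with the standard re module. This is only a sketch of the behavior described above, not Dreamweaver's actual implementation; the function build_pattern and its parameter names are hypothetical.

```python
import re


def build_pattern(search, match_case=False, ignore_whitespace=True, use_regex=False):
    """Compile a search string roughly the way the three options are described."""
    if use_regex:
        # Reserved characters such as \d keep their special meaning.
        source = search
    elif ignore_whitespace:
        # Match each word literally, with any run of white space in between.
        source = r"\s+".join(re.escape(word) for word in search.split())
    else:
        # Match the string exactly as typed.
        source = re.escape(search)
    flags = 0 if match_case else re.IGNORECASE
    return re.compile(source, flags)


# Regular expressions off: \d is two literal characters.
literal = build_pattern(r"\d")
print(literal.search("Room 7") is None)         # True

# Regular expressions on: \d matches any single digit.
digit = build_pattern(r"\d", use_regex=True)
print(digit.search("Room 7").group())           # 7

# Case and white space differences are ignored with the lesson's settings.
doctype = build_pattern('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">')
print(doctype.search('<!doctype html\n    public "-//W3C//DTD HTML 4.01 Transitional//EN">') is not None)  # True
```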

    The final version of the Find and Replace dialog should appear as in the accompanying screenshot.

    graphics/02fig04a.gif

  10. Click Replace All. When the warning dialog appears, click Yes.

    The operation is run. By default, the Dreamweaver Results panel opens to show you which files were changed. As you can see, five files were changed. Since there are five files in the site, you know you were successful.

    graphics/02fig06.gif
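For readers curious about what a sitewide replacement involves, the operation can be sketched in Python. Several things here are assumptions rather than facts from the lesson: that lines 1-2 of each page are the HTML 4.01 DOCTYPE followed by a bare <html> tag, that pages use the .htm extension, and the names OLD_HEADER, NEW_HEADER, and replace_header, which are hypothetical. Like Dreamweaver's operation on closed files, this sketch overwrites files with no undo, so the demonstration runs on a throwaway folder.

```python
import re
import tempfile
from pathlib import Path

# Assumed contents of lines 1-2 of each existing page.
OLD_HEADER = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n<html>'

# The header lines copied from the new XHTML document.
NEW_HEADER = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
)


def replace_header(site_root):
    """Swap the old header for the new one in every .htm file; return changed files."""
    # Ignore case and tolerate any white space between the words.
    pattern = re.compile(
        r"\s+".join(re.escape(word) for word in OLD_HEADER.split()),
        re.IGNORECASE,
    )
    changed = []
    for path in sorted(Path(site_root).rglob("*.htm")):
        original = path.read_text(encoding="iso-8859-1")
        updated, count = pattern.subn(lambda _match: NEW_HEADER, original)
        if count:
            path.write_text(updated, encoding="iso-8859-1")
            changed.append(path)
    return changed


# Demonstration on a temporary folder containing two hypothetical pages.
with tempfile.TemporaryDirectory() as folder:
    site = Path(folder)
    (site / "index.htm").write_text(
        OLD_HEADER + "\n<head></head><body></body></html>", encoding="iso-8859-1"
    )
    (site / "about.htm").write_text(
        '<!doctype html public "-//w3c//dtd html 4.01 transitional//en">\n\n'
        "  <HTML>\n<head></head><body></body></HTML>",
        encoding="iso-8859-1",
    )
    changed_names = [path.name for path in replace_header(site)]
    first_line = (site / "index.htm").read_text(encoding="iso-8859-1").splitlines()[0]

print(changed_names)  # ['about.htm', 'index.htm']
print(first_line)     # <?xml version="1.0" encoding="iso-8859-1"?>
```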

  11. To wrap up the line break experiment, return to the bottom of the Featured Vacation section where you inserted a line break earlier, and press Shift+Return (Macintosh) or Shift+Enter (Windows) again.

    When you look in code view, you will see that Dreamweaver this time added a <br /> tag beside the <br> tag it added earlier. Remember, the <br /> syntax is used for empty elements; it is equivalent to <br></br>, and is not simply a closing </br> tag. This proves that Dreamweaver knows that the document is an XHTML document, and you can be assured that henceforth Dreamweaver will not add any non-XHTML-compliant tags to your code.
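The equivalence of <br /> and <br></br> can be confirmed with any XML parser. The short Python sketch below, offered only as an illustration, uses the standard xml.etree.ElementTree module to show that both forms parse to the same empty element, while a lone </br> is not well-formed at all.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<br />")
long_form = ET.fromstring("<br></br>")

# Both forms produce an element with no text and no children.
print(short_form.text, len(short_form))  # None 0
print(long_form.text, len(long_form))    # None 0

# Writing the long form back out yields the compact empty-element syntax.
serialized = ET.tostring(long_form, encoding="unicode")
print(serialized)  # <br />

# A closing tag on its own is rejected by the parser.
try:
    ET.fromstring("</br>")
    lone_close_ok = True
except ET.ParseError:
    lone_close_ok = False
print(lone_close_ok)  # False
```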

    NOTE

    Dreamweaver may still add <font> tags, depending on how you format text. It's best to discipline yourself not to format text using the Property inspector for such attributes as color and size. Instead, rely on CSS as much as possible.

    graphics/02inf02.gif

    Changing the document type information at the top ensures that new tags are XHTML-compliant. Of course, it does nothing about the existing tags. You'll have to fix those yourself.
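Finding the existing tags that still need attention can be partly automated. As a rough sketch (not a substitute for a real validator), the Python code below uses the standard html.parser module to list empty elements that are still written HTML-style. The class name UnclosedEmptyTagFinder, the partial list of empty elements, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser

# Elements that have no closing tag in HTML (a partial, illustrative list).
EMPTY_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class UnclosedEmptyTagFinder(HTMLParser):
    """Collect empty elements written HTML-style, such as <br> instead of <br />."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # Called for <br>, but not for <br />, which goes to handle_startendtag.
        if tag in EMPTY_ELEMENTS:
            line, _column = self.getpos()
            self.problems.append((line, self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # XHTML-style empty element: already compliant, nothing to report.
        pass


finder = UnclosedEmptyTagFinder()
finder.feed('<p>From $899/child USD<br>\n<br />\n<img src="xyz.gif"></p>')

for line, tag_text in finder.problems:
    print(line, tag_text)
```

Each reported line number and tag shows where a space and forward slash still need to be added.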

  12. Remove the two line breaks, if you like, and save the file.

    Any time you make a significant change, you should save index.htm. You don't need to save any of the closed files: as soon as Dreamweaver replaced the document type information, it saved those files.