Section 17.2. Cleaning Up After Your HTML Editor

Although you can create and edit HTML/XHTML documents with a text editor such as vi or Notepad, most HTML authors use an application designed specifically for creating web pages. Several are free of charge, many offer a free evaluation period, and most are available for download over the Web. Be forewarned, though: in our experience, you will rarely (if ever) be able to create a web document with one of these editors without having to inspect, add to, edit, and sometimes even repair the HTML source that the editor generates. The following sections discuss a few things you should know about and watch out for.

17.2.1 Where Did My Document Go?

One of the first things you will notice is that many HTML editors automatically insert markup into your document that you did not explicitly select or write. Remember the very simple HTML document we started with in Chapter 2?



<html>
<head>
<title>My first HTML document</title>
</head>
<body>
<h2>My first HTML document</h2>
Hello, <i>World Wide Web!</i>
<!-- No "Hello, World" for us -->
<p>
Greetings from<br>
<a href="">O'Reilly & Associates</a>
<p>
Composed with care by:
<cite>(insert your name here)</cite>
<br>&copy;2000 and beyond
</body>
</html>



Here is what the source looks like after you load it into Microsoft Word 2000:

<html xmlns:v="urn:schemas-microsoft-com:vml"
...
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
<meta name=Originator content="Microsoft Word 9">
<link rel=File-List href="./ch01-1_MS_files/filelist.xml">
<title>My first HTML document</title>
<!--[if gte mso 9]><xml>
...
  <o:Author>William Kennedy</o:Author>
  <o:LastAuthor>William Kennedy</o:LastAuthor>
...
  <o:Company>ActivMedia Robotics</o:Company>
...
 /* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
...
	font-family:"Times New Roman";
	mso-fareast-font-family:"Times New Roman";}
...
	font-family:"Times New Roman";
	mso-fareast-font-family:"Times New Roman";}
@page Section1
	{size:8.5in 11.0in;
	margin:1.0in 1.25in 1.0in 1.25in;
...
<!--[if gte mso 9]><xml>
 <o:shapedefaults v:ext="edit" spidmax="1026"/>
</xml><![endif]--><!--[if gte mso 9]><xml>
 <o:shapelayout v:ext="edit">
  <o:idmap v:ext="edit" data="1"/>
...
<body lang=EN-US link=blue vlink=blue style='tab-interval:.5in'>
<div class=Section1>
<h2>My first HTML document</h2>
<p class=MsoNormal>Hello, <i>World Wide Web</i> </p>
<!-- No "Hello, World" for us -->
<p>Greetings from<br>
<a href="">O'Reilly &amp; Associates</a> </p>
<p>Composed with care by: <cite>(insert your name here)</cite> <br>
&copy;2002 and beyond </p>
...




Yeow! Where did the document go? All that extra markup makes the source nearly impossible for a human to read. What infuriates document purists like us, beyond all the markup we neither wanted nor asked for, is that Word 2000 treats any text document containing HTML markup as fodder for its mill. You can remove the .html or .htm suffix from the filename, or delete <html> and <head> from the document, to no avail: Word will still get you.

Microsoft isn't alone in cluttering the source. Most HTML editors add at least a <meta> tag identifying the product that generated the document. Many also "fix" your document to comply with current standards and practices, for example by adding all those paragraph and list-item end tags that HTML allows you to omit. (From an XHTML standpoint, we admit that this particular meddling is justified, since XHTML requires those end tags.)
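If you inherit a document and want to know which tool produced it, that product <meta> tag is the place to look. The following Python sketch uses the standard library's html.parser module to collect such tags. The set of meta names it looks for (Generator, ProgId, Originator) is an assumption taken from the Word 2000 listing above; other editors may label themselves differently, or not at all.

```python
from html.parser import HTMLParser

# Meta names that editors use to label their output. This set is an
# assumption based on the Word 2000 listing; other tools may differ.
EDITOR_META_NAMES = {"generator", "progid", "originator"}


class EditorMetaFinder(HTMLParser):
    """Collect <meta> tags that identify the program that wrote the page."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in EDITOR_META_NAMES:
            self.found[name] = attrs.get("content")


def editor_meta(source: str) -> dict:
    """Return {meta-name: content} for every editor-identifying <meta> tag."""
    finder = EditorMetaFinder()
    finder.feed(source)
    finder.close()
    return finder.found


if __name__ == "__main__":
    sample = """<head>
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
<title>My first HTML document</title>
</head>"""
    print(editor_meta(sample))
```

Because html.parser accepts unquoted attribute values such as content=Word.Document, it copes with Word's HTML 4 style output without any preprocessing.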

To its credit, Word runs well, unlike other tools that routinely crashed without warning as we fought with their treatment of the markup. Microsoft even offers a Word plug-in that removes the additional markup, so that you can recover a reasonable facsimile of the original document.[2]

[2] You can find this plug-in at
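Short of installing a plug-in, you can strip much of Word's clutter yourself. Here is a rough Python sketch that deletes the Office-specific pieces visible in the listing above: conditional comments (<!--[if gte mso 9]>...<![endif]-->), the ProgId/Generator/Originator <meta> tags, the File-List <link>, MsoNormal-style class attributes, and mso- style declarations. It is not Microsoft's plug-in and not a complete filter. Regular expressions are a blunt instrument for HTML, and these patterns are assumptions based on that one sample, so check the output by eye.

```python
import re

# Patterns for the Office-specific markup in the Word 2000 listing. They are
# assumptions drawn from that one sample, not a full catalog of Word output.
CONDITIONAL_COMMENT = re.compile(r"<!--\[if [^\]]*\]>.*?<!\[endif\]-->", re.DOTALL)
OFFICE_META = re.compile(
    r'<meta\s+name=("?)(?:ProgId|Generator|Originator)\1[^>]*>', re.IGNORECASE
)
FILE_LIST_LINK = re.compile(r'<link\s+rel=("?)File-List\1[^>]*>', re.IGNORECASE)
MSO_CLASS = re.compile(r'\s+class=("?)Mso\w+\1', re.IGNORECASE)
# An mso- declaration whose value may contain a double-quoted font name.
MSO_DECLARATION = re.compile(r'\s*mso-[\w-]+\s*:(?:"[^"]*"|[^;"\'}>\n])*;?')


def strip_word_markup(html: str) -> str:
    """Delete Word-specific additions and leave ordinary HTML in place."""
    for pattern in (CONDITIONAL_COMMENT, OFFICE_META, FILE_LIST_LINK,
                    MSO_CLASS, MSO_DECLARATION):
        html = pattern.sub("", html)
    # Collapse the runs of blank lines the deletions leave behind.
    return re.sub(r"\n\s*\n", "\n", html)


if __name__ == "__main__":
    word_output = (
        '<meta name=Generator content="Microsoft Word 9">\n'
        "<!--[if gte mso 9]><xml>\n <o:Author>Someone</o:Author>\n"
        "</xml><![endif]-->\n"
        "<p class=MsoNormal>Hello, <i>World Wide Web</i> </p>"
    )
    print(strip_word_markup(word_output))
```

Note that the class=Section1 wrapper and the @page rule survive this pass; deciding what else to delete is a judgment call best made while reading the result.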

17.2.2 When and Why to Edit the Editor

No matter how good the HTML editor is, you'll inevitably have to edit the source it generates, cluttered as it may be. We've had to do it a lot ourselves, and so have all the web developers we've talked with over the last few years.

Not all HTML editors provide an easy way to add JavaScript to your documents, and many are not up to date with the HTML/XHTML and CSS2 standards. Remember, too, that the popular browsers don't always agree on how to render a tag, and even different versions of the same browser may differ. Furthermore, even the best HTML editors don't necessarily support extensions to the language.

So into the source you'll have to go, whether to include some HTML feature not yet supported by the editor (such as a new CSS2 property), to insert an attribute value or keyword, or to modify ones that the editor added.
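One common case of modifying what the editor added: Word writes attribute values without quotes (lang=EN-US, class=MsoNormal), which HTML 4 tolerates for simple values but XHTML forbids. The Python sketch below wraps each unquoted value found inside a start tag in double quotes and leaves quoted values alone. It is a simplification: it does not handle a > inside a quoted value, minimized attributes such as selected, or a /> written directly after an unquoted value.

```python
import re

# A start tag: "<" followed by a letter, up to the next ">". Simplified:
# a ">" inside a quoted attribute value would end the match early.
TAG = re.compile(r"<[A-Za-z][^>]*>")

# One attribute assignment. Quoted values are tried first so they are
# consumed whole, and text inside them (e.g. "charset=us-ascii") is never
# mistaken for a separate attribute.
ATTR = re.compile(r"""(\s[\w:.-]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)""")


def _quote_attr(match: re.Match) -> str:
    name, value = match.group(1), match.group(2)
    if value[0] in "\"'":
        return match.group(0)  # already quoted: leave it exactly as written
    return f'{name}="{value}"'


def quote_attributes(html: str) -> str:
    """Wrap unquoted attribute values in double quotes, as XHTML requires."""
    return TAG.sub(lambda tag: ATTR.sub(_quote_attr, tag.group(0)), html)


if __name__ == "__main__":
    print(quote_attributes(
        "<body lang=EN-US link=blue vlink=blue style='tab-interval:.5in'>"
    ))
```

Single-quoted values such as Word's style='tab-interval:.5in' are already legal XHTML, so the sketch does not convert them.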

The tip is this: compose first. Try to start with a clean, finished document. Concentrate on content from the outset, and add the special effects later. Use a good HTML editor from the start, or prepare your documents in two steps with two different tools (a good content editor followed by a good HTML editor), particularly if you plan to distribute the document in a format other than HTML.
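The two-step approach can be as simple as writing plain paragraphs in any editor and adding markup mechanically afterward. As an illustration only, the hypothetical text_to_html function below treats blank-line-separated text as paragraphs and escapes &, <, and > with Python's standard html.escape, so that "O'Reilly & Associates" comes out as "O'Reilly &amp; Associates". A real HTML editor would take over from there.

```python
import html


def text_to_html(title: str, text: str) -> str:
    """Turn blank-line-separated plain text into a minimal HTML document.

    Hypothetical second step of a "compose first" workflow: the content is
    written in any plain editor, and markup is added only afterward.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # quote=False escapes &, <, and > but leaves apostrophes readable.
    body = "\n".join(f"<p>{html.escape(p, quote=False)}</p>" for p in paragraphs)
    return (
        "<html>\n<head>\n"
        f"<title>{html.escape(title, quote=False)}</title>\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    draft = ("Greetings from O'Reilly & Associates\n\n"
             "Composed with care by: (insert your name here)")
    print(text_to_html("My first HTML document", draft))
```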

17.2.3 Use the Best

If you compose web pages, we can't imagine you not using an HTML editor of some sort. The convenience is just too compelling. But choose carefully: some HTML editors are abysmal, and you'll spend more time hunting down misplaced tags and errant attributes than you'll spend actually creating the document. Top tip: you get what you pay for.

It's no surprise that HTML editors vary greatly in their features. Many let you switch the display between the source text and an approximation of how a browser will render it. Some simply let you add tags and modify attribute values through pull-down menus and hot keys. Others are WYSIWYG layout tools that make it easy to include graphics and other multimedia content. Some editors also offer advanced features such as embedding and testing applets and scripts.

In general, HTML editors fall into one of two categories: either they are good layout tools, including advanced styling features and tools for dynamic content, or they excel at content creation and management. Obviously, if you are producing flashy, commercial web pages that rely on advanced layout techniques and include lots of different styles and dynamic content, use a good layout tool. If you are producing a content-rich document, use a tool that provides good editorial assistance.

No matter which type you use, there are some common considerations to keep in mind when selecting an HTML editor:

Whether it is up to date

No HTML editor is yet entirely up to date with the current standards, particularly CSS2. Read the product specifications, and install updates often.

Whether it includes a source editor

Although you can load an editor-generated document into a separate text editor to change its source, it's much more convenient if the HTML editor itself lets you view and edit the HTML source. Also make sure that your HTML editor doesn't automatically "fix" your source edits.

Whether it is modifiable

Ideally, the HTML editor should let you customize its behavior to fit your needs. For example, at a minimum you should be able to choose your own font colors, styles, and backgrounds if the editor automatically includes them in its boilerplate document.

Cost and reliability

We can't stress enough that you get what you pay for. If creating web pages is more than just a passing fancy, get the best editor you can find. Don't use or even trust an HTML composition tool just because it came with the browser. Find one that is well supported and well reviewed by other HTML authors. Ask around, and perhaps join a newsgroup for HTML authors to get the latest scoop on products.
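On the point about editors that "fix" your source edits: one low-tech check is to keep a copy of the source you wrote, let the editor open and save it, and compare the two. This Python sketch does the comparison with the standard difflib module. The function name and the "as-written"/"as-saved" labels are hypothetical; in practice you would read the two versions from disk, but here they are inline strings. Any nonempty result means the editor rewrote something.

```python
import difflib


def silent_changes(before: str, after: str) -> list[str]:
    """Return unified-diff lines showing what an editor changed on save."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="as-written",
        tofile="as-saved",
        lineterm="",  # splitlines() already removed the line endings
    )
    return list(diff)


if __name__ == "__main__":
    as_written = "Hello, <i>World Wide Web!</i>\n<p>\nGreetings from<br>"
    as_saved = ("<p class=MsoNormal>Hello, <i>World Wide Web!</i></p>\n"
                "<p>Greetings from<br>")
    for line in silent_changes(as_written, as_saved):
        print(line)
```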