You want to use HttpUnit to test your web application, but it can't parse your HTML.
Write well-formed HTML, ensuring the tags are properly nested and attributes are quoted. For best results, write XHTML.
Web browsers like Internet Explorer and Mozilla are extremely tolerant of bad HTML. Since so many web pages on the Internet are written using sloppy HTML, browsers have to compensate for all sorts of HTML markup errors. While browsers may be able to display sloppy HTML reasonably well, writing unit tests against HTML requires more precision.
Writing clean HTML serves two purposes. First, it makes your HTML more portable to a wide variety of browsers. Second, it makes your pages easier to test with tools like HttpUnit.
HttpUnit uses the HTML Tidy library to parse your HTML, searching for hyperlinks, tables, form elements, and other objects. Although HTML Tidy attempts to parse poorly formed HTML, it has only limited success. To make your HTML easier to parse, nest your tags properly and follow the HTML standard. You should not encounter any problems if your pages conform to one of the XHTML DTDs and you check them with a validating XML parser.
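To see why well-formed markup matters to a test tool, consider how little work it takes to find hyperlinks once a page is valid XML. The sketch below is not how HttpUnit or HTML Tidy work internally. It uses the JDK's standard DOM parser (javax.xml.parsers) as a stand-in, and the class name LinkFinder is hypothetical. It assumes pages are well-formed XHTML and treats any external DTD as empty so that parsing never goes to the network.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical illustration: extracts hyperlinks from well-formed XHTML with the JDK DOM parser. */
public class LinkFinder {

    /** Returns the href of every <a> element; rejects markup that is not well-formed XML. */
    public static List<String> findLinks(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Treat any external DTD (such as the XHTML DOCTYPE) as empty so parsing stays offline.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            NodeList anchors = doc.getElementsByTagName("a");
            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element anchor = (Element) anchors.item(i);
                // Skip named anchors such as <a name="top">, which have no href.
                if (anchor.hasAttribute("href")) {
                    hrefs.add(anchor.getAttribute("href"));
                }
            }
            return hrefs;
        } catch (SAXException e) {
            // Mis-nested tags, unquoted attributes, etc. end up here.
            // The JDK's default error handler also prints the fatal error to stderr.
            throw new IllegalArgumentException("Page is not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p><a href=\"next.html\">Next</a> <a name=\"top\">Top</a></p></body></html>";
        System.out.println(findLinks(page)); // prints [next.html]
    }
}
```

Given a mis-nested fragment such as <p><b>bad</p></b>, findLinks throws an IllegalArgumentException rather than guessing what the author meant. That strictness is exactly what makes clean markup easy to test and sloppy markup hard to test.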
If you encounter problems, turn on HTML Tidy warnings by calling HttpUnitOptions.setParserWarningsEnabled(true) before your tests run.
This causes HTML Tidy to print warning messages to the console, letting you know which lines of your HTML pages are incorrect.
Printed warnings can help you diagnose problems, but someone has to read them; they never cause a test to fail. Remember that creating automated tests is a key goal of XP. If XHTML compliance is your goal, write a unit test for each web page that passes its XHTML through a validating XML parser. The test fails whenever a page is not valid XHTML. Because it runs as part of your normal test suite, it fails as soon as someone introduces sloppy HTML.
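One way to automate this is a small helper that every page test can call. This is a minimal sketch, assuming the page markup is available as a string (read from disk or fetched from the running application); the class XhtmlPageCheck and its firstError method are hypothetical names. It checks only well-formedness with the JDK's parser and treats the XHTML DTD as empty to stay offline. For full validation against the DTD, you would also call setValidating(true) on the DocumentBuilderFactory and have the entity resolver return a local copy of the XHTML DTD instead of an empty document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical per-page check: reports the first place a page stops being well-formed XML. */
public class XhtmlPageCheck {

    /** Returns null if the markup parses, otherwise "line L, column C: message". */
    public static String firstError(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Assumption: the DTD is replaced by an empty document, so this checks well-formedness only.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            // Turn every error into an exception and stop the parser printing to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xhtml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Checks each file named on the command line; exits with status 1 if any page fails. */
    public static void main(String[] args) throws IOException {
        int failures = 0;
        for (String arg : args) {
            Path page = Paths.get(arg);
            String error = firstError(new String(Files.readAllBytes(page), StandardCharsets.UTF_8));
            if (error != null) {
                System.err.println(page + ": " + error);
                failures++;
            }
        }
        if (failures > 0) {
            System.exit(1);
        }
    }
}
```

Run over a set of page files (whatever paths your project uses), the program prints the file, line, and column of each failing page and exits with status 1, so a build script can stop on it. Alternatively, a JUnit test per page can assert that firstError returns null.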
See the HTML and XHTML specifications from the World Wide Web Consortium (W3C) at http://www.w3.org. Also see O'Reilly's HTML & XHTML: The Definitive Guide by Chuck Musciano and Bill Kennedy. Chapter 11 briefly mentions XMLUnit, which can simplify validating against the XHTML DTDs.