4.6 Schemas Compared

Each of the schema languages we've looked at has compelling features and significant flaws. Some of the important points are listed in Table 4-2.

Table 4-2. A comparison of schema languages

| Feature | DTD | W3C Schema | RELAX NG | Schematron |
|---|---|---|---|---|
| XML syntax | No | Yes | Yes | Yes |
| Namespace compatible | No | Yes | Yes | Yes |
| Declares entities | Yes | No | No | No |
| Tests datatypes | No | Yes | Yes | Yes |
| Default attribute values | Yes | Yes | No[7] | No |
| Notations | Yes | Yes | No | No |
| Unordered content | No | Yes | Yes | Yes |
| Modular | Yes | Yes | Yes | No |
| Element-attribute interchangeability | No | Yes | Yes | Yes |
| Specifies how to associate with a document | Yes | Yes | No | No |

[7] Added later in the RELAX NG DTD Compatibility specification.

DTDs have been around the longest, so as you would expect, they have the widest support in literature and software. They also have the advantage of being the only way to declare entities at the moment. The syntax for DTDs is very easy to learn, although its readability often leaves much to be desired. Try reading the DocBook-XML DTD sometime and you'll see what I mean. After a fashion, DTDs are modular, but I find that parameter entities are often a nuisance, especially when you want to override imported declarations.
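To make the complaint about overriding concrete, here is a minimal sketch of the usual workaround. It relies on the XML rule that the first declaration of an entity is the one that binds, so the override has to appear before the module is pulled in. The file name base.mod and the entity name inline.mix are hypothetical; assume the module supplies its own definition of inline.mix and uses it in the content model of para.

```xml
<?xml version="1.0"?>
<!DOCTYPE para [
  <!-- Declared first, so this definition takes precedence over
       the one inside the (hypothetical) imported module. -->
  <!ENTITY % inline.mix "#PCDATA | emphasis | code">

  <!-- Import the module. Its own declaration of inline.mix is
       now ignored because the entity is already bound. -->
  <!ENTITY % base-module SYSTEM "base.mod">
  %base-module;
]>
<para>Some <code>text</code> with <emphasis>markup</emphasis>.</para>
```

The ordering is easy to get wrong, and nothing in the DTD itself tells a reader that a later declaration has been silently skipped, which is a large part of the nuisance.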

W3C XML Schema has the advantage of being blessed by the W3C, so you can be sure it will win many converts. Software support is growing quickly. I think it's pretty decent, but it has a clunkiness to it that can make schema design a chore. The datatypes will become a de facto standard, as they are already borrowed by the likes of RELAX NG. In general, this is a good step forward, but be aware that there will always be contenders for the throne.
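As a rough illustration of that borrowing, a RELAX NG schema can name the W3C Schema datatype library and then use its types in data patterns. The element and attribute names below are invented for the example.

```xml
<element name="item"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- Each data pattern refers to a type from the W3C Schema library -->
  <attribute name="price">
    <data type="decimal"/>
  </attribute>
  <element name="quantity">
    <data type="positiveInteger"/>
  </element>
  <element name="shipped">
    <data type="date"/>
  </element>
</element>
```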

RELAX NG has won my admiration for its elegance and simplicity. Writing schemas is easy and reading them is even easier. It is easily translated into other schema languages such as W3C Schema and DTDs (using Trang), making it an ideal starting point for schema development. Niceties like interleave and nested grammars are not to be overlooked. It would be nice to see more built-in types defined in the specification, but it is likely most implementers will just supplement them with the types in the W3C Schema recommendation, so no worries there.
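For a sense of what interleave buys you, here is a small sketch in which the children of a contact element may appear in any order. The element names are made up for illustration.

```xml
<element name="contact" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- The children may appear in any order; phone is optional -->
  <interleave>
    <element name="name"><text/></element>
    <element name="email"><text/></element>
    <optional>
      <element name="phone"><text/></element>
    </optional>
  </interleave>
</element>
```

A DTD can only express the same constraint by spelling out every permitted ordering in the content model.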

By itself, Schematron is not nearly as useful as the other validation tools we saw in this chapter. One could do some simple tests with XPath, but it has no support for regular expressions to match elements or character data, and its dependence on a flat set of rules makes schemas clunky and hard to develop. However, when used in conjunction with other validation tools, Schematron can really shine. "The real win with Schematron," writes XML expert Jeni Tennison, "is when you use it in tandem with another schema language, particularly W3C XML Schema. There is no way that you should use it for a standalone schema, but to test co-occurrence constraints once the initial validation is done with W3C XML Schema or RELAX NG, it's a godsend."
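A co-occurrence constraint of the kind Tennison describes might look like the sketch below: the value of one attribute decides whether a child element is required. The payment vocabulary is invented, and the namespace shown is the one used by ISO Schematron; Schematron 1.5 used http://www.ascc.net/xml/schematron instead, so adjust it to match your processor.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="payment">
      <!-- Passes unless method is 'card' and card-number is missing -->
      <assert test="not(@method = 'card') or card-number">
        A payment made by card must include a card-number element.
      </assert>
    </rule>
  </pattern>
</schema>
```

The structural checks (which elements and attributes payment may contain at all) would be left to the W3C XML Schema or RELAX NG pass that runs first.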

There was a time when all we had to work with were DTDs. Over the years, more and more XML tools have become available, and we should all be thankful for the tireless efforts of many developers. It's worth taking time to become familiar with the different schema languages in order to find the one that best fits your needs.