A.3 Schemas

When you validate an HTML page, the validator checks not only that the file is well-formed but also that your markup conforms to the HTML specification. Your application parses XML rather than HTML, but it also expects data in a particular format. When it receives anything else, it can't work correctly.

It's therefore useful to create a data specification, or schema, that describes the structure of the XML documents your program requires. You can then check each input file against the schema to confirm that it is not only well-formed but also valid, meaning that it conforms to the schema.
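
Well-formedness and validity are separate checks. As a minimal sketch of the first one, PHP's DOMDocument::loadXML() accepts any well-formed document, whether or not it follows a schema. The helper name isWellFormed() is hypothetical and used only for illustration.

```php
<?php
// Minimal sketch: loadXML() checks well-formedness only, not validity.
// isWellFormed() is a hypothetical helper, not part of PHP.

function isWellFormed($xml)
{
    // Collect libxml errors internally instead of printing PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $ok = $dom->loadXML($xml);
    // Discard the collected errors and restore the previous error mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

var_dump(isWellFormed('<address-book><person/></address-book>')); // bool(true)
var_dump(isWellFormed('<address-book><person></address-book>'));  // bool(false): tags don't match
```

The first document passes even though nothing has checked whether a person element is allowed to be empty; answering that question is the job of a schema.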

There are three major schema formats: DTDs, XML Schema, and RelaxNG.


DTD

DTDs, short for Document Type Definitions, are the oldest way to write a schema. They come from SGML and are less expressive than the other formats; for instance, a DTD can't require an element's text to be a number or a date. They also aren't written in XML, so they use a separate syntax that can be difficult to read. Try to avoid DTDs when you can.
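
To show what DTD syntax looks like, here is a minimal sketch assuming a hypothetical address-book format in which every person has a name and an optional email address. The DTD sits in the document's internal subset, and DOMDocument::validate() checks the loaded document against it. The helper validatesAgainstDtd() and the element names are illustrative assumptions.

```php
<?php
// Minimal sketch: a DTD for a hypothetical address-book format.
// Note that the <!ELEMENT ...> declarations are not XML syntax.
$dtd = '<!DOCTYPE address-book [
  <!ELEMENT address-book (person*)>
  <!ELEMENT person (name, email?)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
';

// validatesAgainstDtd() is a hypothetical helper, not part of PHP.
function validatesAgainstDtd($xml)
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // validate() checks the loaded document against its own DTD.
    $ok = $dom->loadXML($xml) && $dom->validate();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

var_dump(validatesAgainstDtd($dtd . '<address-book><person><name>Rasmus</name></person></address-book>'));        // bool(true)
var_dump(validatesAgainstDtd($dtd . '<address-book><person><email>r@example.com</email></person></address-book>')); // bool(false): name is missing
```

Both documents are well-formed; only the DTD check rejects the second one.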


XML Schema

XML Schema is the W3C-approved document specification format. Because XML Schemas are themselves written in XML, your XML parser can process a schema just like any other XML document. For more on XML Schema, see Eric van der Vlist's XML Schema (O'Reilly) or read the specification at http://www.w3.org/XML/Schema.
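
For comparison, here is a minimal sketch of the same hypothetical address-book format written as an XML Schema and checked with DOMDocument::schemaValidateSource(), which takes the schema as a string (schemaValidate() takes a filename instead). The helper validatesAgainstXsd() and the element names are illustrative assumptions.

```php
<?php
// Minimal sketch: an XML Schema for a hypothetical address-book format.
// Unlike a DTD, the schema is itself an XML document.
$xsd = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address-book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="email" type="xs:string" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';

// validatesAgainstXsd() is a hypothetical helper, not part of PHP.
function validatesAgainstXsd($xml, $xsd)
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $ok = $dom->loadXML($xml) && $dom->schemaValidateSource($xsd);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

var_dump(validatesAgainstXsd('<address-book><person><name>Rasmus</name></person></address-book>', $xsd));        // bool(true)
var_dump(validatesAgainstXsd('<address-book><person><email>r@example.com</email></person></address-book>', $xsd)); // bool(false)
```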


RelaxNG

An alternative to XML Schema is RelaxNG, developed by an OASIS technical committee. Its home page is at http://www.relaxng.org/.
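
As a sketch of how RelaxNG expresses the same constraints, here is the hypothetical address-book format again, this time as a RelaxNG schema in its XML syntax, checked with DOMDocument::relaxNGValidateSource(). The helper validatesAgainstRng() and the element names are illustrative assumptions.

```php
<?php
// Minimal sketch: a RelaxNG schema for a hypothetical address-book format.
// Patterns such as zeroOrMore and optional mirror the DTD's * and ?.
$rng = '<element name="address-book" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="person">
      <element name="name"><text/></element>
      <optional>
        <element name="email"><text/></element>
      </optional>
    </element>
  </zeroOrMore>
</element>';

// validatesAgainstRng() is a hypothetical helper, not part of PHP.
function validatesAgainstRng($xml, $rng)
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $ok = $dom->loadXML($xml) && $dom->relaxNGValidateSource($rng);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

var_dump(validatesAgainstRng('<address-book><person><name>Rasmus</name></person></address-book>', $rng));        // bool(true)
var_dump(validatesAgainstRng('<address-book><person><email>r@example.com</email></person></address-book>', $rng)); // bool(false)
```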

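PHP 5's DOM extension can validate documents against all three formats: DOMDocument::validate() checks a document against its DTD, schemaValidate() against an XML Schema, and relaxNGValidate() against a RelaxNG schema.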
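
A bare true or false rarely tells you what to fix in a rejected file. As a minimal sketch, assuming you have enabled internal error handling with libxml_use_internal_errors(), libxml_get_errors() returns the reasons for the failure as LibXMLError objects. The helper schemaErrors() and the one-element schema are hypothetical; the xs:nonNegativeInteger type shows the kind of data-type check a DTD can't express.

```php
<?php
// Minimal sketch: collect libxml's explanations when validation fails.
// schemaErrors() is a hypothetical helper; the schema is illustrative.

function schemaErrors($xml, $xsd)
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors(); // start from an empty error buffer
    $dom = new DOMDocument();
    if ($dom->loadXML($xml)) {
        $dom->schemaValidateSource($xsd);
    }
    $messages = array();
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError has a line number and a message ending in "\n".
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

$xsd = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="age" type="xs:nonNegativeInteger"/>
</xs:schema>';

print_r(schemaErrors('<age>42</age>', $xsd));    // no errors
print_r(schemaErrors('<age>forty</age>', $xsd)); // prints libxml's explanation of the rejected value
```

Because parse errors land in the same buffer, the helper also reports why a document that isn't well-formed was rejected.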