PHP 5 has five major XML extensions. Each one has different features, advantages, and costs:
DOM
The 800-pound gorilla of XML. You can do everything and the <kitchen-sink> with DOM, but navigating through your documents can be cumbersome.
SAX
PHP's original XML extension. SAX is a streaming, or event-based, parser that uses less memory than DOM, but frequently requires more complex PHP code.
SimpleXML
A new PHP 5-only extension that excels at parsing RSS files, REST results, and configuration data. If you know the document's format ahead of time, SimpleXML is the way to go. However, SimpleXML supports only a subset of the XML specification.
XPath
This extension allows you to query XML documents like you're searching a database, to find the subset of information that you need and eliminate the unnecessary portions.
XSLT
A way to take XML documents and transform them into HTML or another output format. It uses XML-based stylesheets to define the templates. XSLT is easily shared among different applications, but has a quirky syntax.
Most web developers are familiar with XML basics. However, PHP 5 makes it easier to use some less well-known parts of XML, including XPath and XML Namespaces. XPath is a powerful but underutilized way to search XML documents and extract information. With XML Namespaces, you can safely combine pieces of XML from multiple sources into a single XML document and still uniquely identify every element. If you're not familiar with XML or the advanced bits like XPath and XML Namespaces, you might want to make a quick detour to the XML introduction in Appendix A.
This chapter starts by introducing DOM and SimpleXML, the two major XML extensions in PHP 5. This introduction shows how to create, save, and interact with DOM and SimpleXML objects. These two approaches to working with XML are similar, but they have different aims. DOM is rigorous and works with any XML, whereas SimpleXML aims to make common XML tasks very easy, at the cost of not being useful for some types of XML.
Some programmers like DOM because it requires you to be highly disciplined. They also believe that very explicit requests make code easier to understand and reduce the possibility of mistakes.
Other people believe the opposite. They think that DOM is too heavyweight for general XML processing because even the most basic tasks require multiple levels of dereferencing. To them, these levels only clutter up code and make it hard to understand exactly what's going on.
In this book, XML examples are given in both DOM and SimpleXML, so you can get a feel for when you should (or prefer to) use one versus the other.
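To make the contrast concrete, the following sketch reads the same value, the first person's first name, with each extension. It is an illustration only: the inlined two-person document and the variable names ($domName, $sxName) are assumptions made for this example, not code from later in the chapter.

```php
<?php
// A small inlined document so the sketch is self-contained (assumed sample data).
$xml = <<<XML
<?xml version="1.0"?>
<address-book>
    <person id="1">
        <firstname>Rasmus</firstname>
        <lastname>Lerdorf</lastname>
    </person>
    <person id="2">
        <firstname>Zeev</firstname>
        <lastname>Suraski</lastname>
    </person>
</address-book>
XML;

// DOM: each step is an explicit method call or property access.
$dom = new DOMDocument();
$dom->loadXML($xml);
$people = $dom->getElementsByTagName('person');
$domName = $people->item(0)
    ->getElementsByTagName('firstname')->item(0)
    ->firstChild->nodeValue;

// SimpleXML: child elements are exposed as properties, and repeated
// elements can be indexed like an array.
$sx = simplexml_load_string($xml);
$sxName = (string) $sx->person[0]->firstname;

print "$domName\n"; // Rasmus
print "$sxName\n";  // Rasmus
```

Both approaches return the same string. The DOM version spells out every hop from the document to the text node, while the SimpleXML version hides those hops behind property access, which is the trade-off the two camps disagree about.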
When porting code, it's nice to learn about new features by reading text describing them and by scanning a table listing the differences between the old and new versions. However, that's not practicable here, because the major overhaul of the XML extensions means that often the new way to do an XML processing task is completely different from the old way. It's easier to understand what needs to be done by examining a few short before-and-after examples. By comparing the two, you can review your own code and make similar modifications. Once that's done, you can then double back and fill in the missing parts by looking at the other material.
Therefore, instead of listing all the new methods you can use to manipulate XML, the majority of this chapter contains code showing how to solve a series of common XML tasks. The examples are as follows:
Reading XML into a tree
Reading XML from a stream
Creating new XML documents
Searching XML with XPath
Changing XML into HTML or other output formats with XSLT
Validating XML to ensure it conforms to a specification
These sections are all presented in a before-and-after format. First, you see how it was done in PHP 4. Then, you see updated code showing what changes, if any, you need to make to your code for it to work under PHP 5.
To make things consistent, the examples all use the same XML documents. These documents represent a basic XML address book. Example 5-1 shows the standard version.
<?xml version="1.0"?>
<address-book>
    <person id="1">
        <!--Rasmus Lerdorf-->
        <firstname>Rasmus</firstname>
        <lastname>Lerdorf</lastname>
        <city>Sunnyvale</city>
        <state>CA</state>
        <email>rasmus@php.net</email>
    </person>
    <person id="2">
        <!--Zeev Suraski-->
        <firstname>Zeev</firstname>
        <lastname>Suraski</lastname>
        <city>Tel Aviv</city>
        <state></state>
        <email>zeev@php.net</email>
    </person>
</address-book>
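As a rough preview of the "searching a database" idea, the sketch below runs an XPath query against the document from Example 5-1 using PHP 5's DOMXPath class. The document is inlined as a string to keep the example self-contained, and the query and the $emails variable are illustrative choices rather than code from this chapter.

```php
<?php
// The address book from Example 5-1, inlined (comments omitted for brevity).
$xml = <<<XML
<?xml version="1.0"?>
<address-book>
    <person id="1">
        <firstname>Rasmus</firstname>
        <lastname>Lerdorf</lastname>
        <city>Sunnyvale</city>
        <state>CA</state>
        <email>rasmus@php.net</email>
    </person>
    <person id="2">
        <firstname>Zeev</firstname>
        <lastname>Suraski</lastname>
        <city>Tel Aviv</city>
        <state></state>
        <email>zeev@php.net</email>
    </person>
</address-book>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Select the <email> of every <person> whose <state> is exactly "CA".
$emails = array();
foreach ($xpath->query('/address-book/person[state="CA"]/email') as $node) {
    $emails[] = $node->nodeValue;
}

print implode("\n", $emails) . "\n"; // rasmus@php.net
```

The second person is skipped because its state element is empty, so the predicate in square brackets does not match.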
Example 5-2 is a version that uses namespaces.
<?xml version="1.0"?>
<ab:address-book xmlns:ab="http://www.example.com/address-book/">
    <ab:person id="1">
        <!--Rasmus Lerdorf-->
        <ab:firstname>Rasmus</ab:firstname>
        <ab:lastname>Lerdorf</ab:lastname>
        <ab:city>Sunnyvale</ab:city>
        <ab:state>CA</ab:state>
        <ab:email>rasmus@php.net</ab:email>
    </ab:person>
    <ab:person id="2">
        <!--Zeev Suraski-->
        <ab:firstname>Zeev</ab:firstname>
        <ab:lastname>Suraski</ab:lastname>
        <ab:city>Tel Aviv</ab:city>
        <ab:state></ab:state>
        <ab:email>zeev@php.net</ab:email>
    </ab:person>
</ab:address-book>
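Namespaces change how you search a document. The sketch below, which assumes a trimmed-down copy of Example 5-2, shows that an unprefixed XPath name no longer matches, and that binding a prefix to the namespace URI with DOMXPath's registerNamespace() method makes the elements reachable again. The prefix "book" is an arbitrary choice for this illustration; only the URI has to match the document.

```php
<?php
// A trimmed copy of Example 5-2 (fewer child elements, same namespace URI).
$xml = <<<XML
<?xml version="1.0"?>
<ab:address-book xmlns:ab="http://www.example.com/address-book/">
    <ab:person id="1">
        <ab:firstname>Rasmus</ab:firstname>
        <ab:email>rasmus@php.net</ab:email>
    </ab:person>
    <ab:person id="2">
        <ab:firstname>Zeev</ab:firstname>
        <ab:email>zeev@php.net</ab:email>
    </ab:person>
</ab:address-book>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// In XPath 1.0 an unprefixed name matches only elements in no namespace,
// so this query finds nothing in the namespaced document.
$plain = $xpath->query('//email')->length;

// Bind a prefix of our own choosing to the document's namespace URI.
$xpath->registerNamespace('book', 'http://www.example.com/address-book/');

// The id attribute carries no prefix, so it is matched without one.
$emails = array();
foreach ($xpath->query('//book:person[@id="2"]/book:email') as $node) {
    $emails[] = $node->nodeValue;
}

print "$plain\n";                    // 0
print implode("\n", $emails) . "\n"; // zeev@php.net
```

The prefix in the query does not have to be the "ab" used in the file. XPath compares the namespace URI, not the prefix, which is what lets documents from different sources be combined without name clashes.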