10.7 PYX

PYX is an early XML stream solution that converts XML into line-oriented character data compatible with text applications like grep, awk, and sed. Its name reflects the fact that it was first implemented in the Python programming language. Each XML event is written on its own line, which fits nicely into the line-oriented paradigm of many Unix programs. Table 10-2 summarizes the notation of PYX.

Table 10-2. PYX notation




Symbol   Meaning
(        An element start tag
)        An element end tag
-        Character data
A        An attribute
?        A processing instruction

For every event coming through the stream, PYX starts a new line, beginning with one of the five event symbols. The symbol is followed immediately by the element name or whatever other data is pertinent to the event. Special characters such as newlines and tabs are escaped with a backslash, as you would see in Unix shell or Perl code.

Here's what the output of a parser converting an XML document into PYX notation looks like. The following XML is the input to the parser:

  <!-- brand is not important -->
  <item>rocket engine</item>
  <item optional="yes">caviar</item>

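As PYX, the two item elements would look like this (lines for whitespace-only character data between elements are left out):

(item
-rocket engine
)item
(item
Aoptional yes
-caviar
)item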
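
To make the translation concrete, here is a minimal sketch of an XML-to-PYX converter in Python. It is not the original PYX tool; it assumes the standard library's xml.sax module as the parser, and the names pyx_escape, PyxHandler, and xml_to_pyx are hypothetical helpers invented for this illustration. The root element list in the usage lines is also an assumption, added only so that the sample input is well-formed.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def pyx_escape(text):
    """Escape backslashes, tabs, and newlines so each event stays on one line."""
    return text.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


class PyxHandler(ContentHandler):
    """Collect one PYX line per SAX event (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._text = []  # the parser may deliver character data in chunks

    def _flush_text(self):
        # Emit buffered character data as a single "-" line.
        if self._text:
            self.lines.append("-" + pyx_escape("".join(self._text)))
            self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.lines.append("(" + name)
        for key in attrs.getNames():
            self.lines.append("A" + key + " " + pyx_escape(attrs.getValue(key)))

    def endElement(self, name):
        self._flush_text()
        self.lines.append(")" + name)

    def characters(self, content):
        self._text.append(content)

    def processingInstruction(self, target, data):
        self._flush_text()
        self.lines.append("?" + target + " " + pyx_escape(data))


def xml_to_pyx(xml_text):
    """Return the PYX lines for a well-formed XML string."""
    handler = PyxHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.lines


if __name__ == "__main__":
    # The root element "list" is an assumption made for this example.
    sample = ('<list><!-- brand is not important -->'
              '<item>rocket engine</item>'
              '<item optional="yes">caviar</item></list>')
    print("\n".join(xml_to_pyx(sample)))
```

Because the handler defines no callback for comments, they never reach the output, which mirrors the behavior described for PYX.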

Notice that the comment didn't come through in the PYX translation. PYX is a little simplistic in some ways, omitting some details in the markup. It will not alert you to CDATA marked sections, although it will let their content pass through. Perhaps the most serious loss is character entity references, which disappear from the stream. You should make sure you don't need that information before working with PYX.

PYX is an interesting alternative to SAX and DOM for quick-and-dirty XML processing. It's useful for simple tasks like element counting, separating content from markup, and reporting simple events. However, it does lack sophistication, making it less attractive for complex processing. Today, I consider it to be more of interest for historical reasons than as a recommendation.
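
The simple tasks mentioned above, element counting and separating content from markup, can be sketched in a few lines of Python operating on PYX lines. This is only an illustration under stated assumptions: the function names count_elements, pyx_unescape, and text_only are hypothetical, and the input is assumed to use the notation of Table 10-2 with backslash escapes for newlines, tabs, and backslashes.

```python
import re
from collections import Counter

# Map the character after a backslash back to the character it stands for.
_UNESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}


def pyx_unescape(text):
    """Undo the backslash escapes used in PYX character data."""
    return re.sub(r"\\(.)", lambda m: _UNESCAPES.get(m.group(1), m.group(1)), text)


def count_elements(pyx_lines):
    """Count element start events ("(" lines) by element name."""
    counts = Counter()
    for line in pyx_lines:
        line = line.rstrip("\n")
        if line.startswith("("):
            counts[line[1:]] += 1
    return counts


def text_only(pyx_lines):
    """Keep only character data ("-" lines), dropping all markup events."""
    return [pyx_unescape(line.rstrip("\n")[1:])
            for line in pyx_lines if line.startswith("-")]


if __name__ == "__main__":
    pyx = ["(item", "-rocket engine", ")item",
           "(item", "Aoptional yes", "-caviar", ")item"]
    print(count_elements(pyx))
    print(text_only(pyx))
```

Since each event occupies exactly one line and is identified by its first character, both functions reduce to a prefix test, which is the same reason grep, awk, and sed work well on PYX.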