The XML Information Set, or Infoset (http://www.w3.org/TR/xml-infoset), is a W3C recommendation that defines an abstract data set for describing the information in well-formed XML documents (the documents don't have to be valid). These definitions are set forth so that other W3C specs can use the same terminology and not trip over each other's shoelaces.
An infoset usually describes the result of parsing an XML document, but it can also be constructed by other means, such as through an API like the Document Object Model (DOM) (http://www.w3.org/TR/xml-infoset/#intro.synthetic). Normally, you don't hear folks talk about structures in XML documents using the terms defined in this spec.
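The difference between a parsed and a synthetic infoset can be sketched in a few lines of Python using the standard library's xml.dom.minidom. This is only an illustration, not part of the book's file archive: the document text, the timezone value PST, and the helper name element_properties are assumptions made up for the example.

```python
from xml.dom.minidom import getDOMImplementation, parseString

# Namespace name borrowed from the sample document; any URI would do here.
TIME_NS = "http://www.wyeast.net/time"


def element_properties(el):
    """Return a few element information item properties for a DOM element."""
    # [attributes] leaves out namespace declarations, which the Infoset
    # files under [namespace attributes] instead.
    attrs = sorted(
        name for name in el.attributes.keys()
        if name != "xmlns" and not name.startswith("xmlns:")
    )
    return {
        "namespace name": el.namespaceURI,
        "local name": el.localName,
        "prefix": el.prefix,
        "attributes": attrs,
    }


# 1. An infoset obtained by parsing (hypothetical document text).
parsed = parseString(
    '<tz:time xmlns:tz="http://www.wyeast.net/time" timezone="PST"/>'
).documentElement

# 2. A synthetic infoset built through DOM calls, with no parsing involved.
doc = getDOMImplementation().createDocument(TIME_NS, "tz:time", None)
synthetic = doc.documentElement
synthetic.setAttribute("timezone", "PST")

# Compare the properties exposed by the two elements.
print(element_properties(parsed) == element_properties(synthetic))
```

If the two dictionaries compare equal, both routes have produced an element information item with the same namespace name, local name, prefix, and attributes, even though only one of them ever existed as markup.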
An infoset can contain up to 11 types of information items, each with its own set of properties. The following list briefly outlines these information items and their associated properties:
Document information item. Properties: all declarations processed, base URI, character encoding scheme, children, document element, notations, standalone, unparsed entities, version
Element information item. Properties: attributes, base URI, children, in-scope namespaces, local name, namespace attributes, namespace name, parent, prefix
Attribute information item. Properties: attribute type, local name, namespace name, normalized value, owner element, prefix, references, specified
Processing instruction information item. Properties: base URI, content, notation, parent, target
Unexpanded entity reference information item. Properties: declaration base URI, name, parent, public identifier, system identifier
Character information item. Properties: character code, element content whitespace, parent
Comment information item. Properties: content, parent
Document type declaration information item. Properties: children, parent, public identifier, system identifier
Unparsed entity information item. Properties: declaration base URI, name, notation, notation name, public identifier, system identifier
Notation information item. Properties: declaration base URI, name, public identifier, system identifier
Namespace information item. Properties: namespace name, prefix
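To make the list more concrete, here is a rough Python sketch that walks a DOM tree and tallies which kinds of information items it finds. It uses the standard xml.dom.minidom module; the SAMPLE document and the count_items helper are hypothetical, and the mapping from DOM nodes to information items is a simplification (it ignores, for instance, document type declarations, entities, and notations).

```python
from collections import Counter
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical sample document; the attribute value is an assumption.
SAMPLE = (
    '<?xml version="1.0"?>'
    '<!-- a time instant -->'
    '<tz:time timezone="PST" xmlns:tz="http://www.wyeast.net/time">'
    '<tz:hour>11</tz:hour>'
    '<tz:minute>59</tz:minute>'
    '</tz:time>'
)


def count_items(node, counts=None):
    """Recursively tally information items reachable from a DOM node."""
    if counts is None:
        counts = Counter()
    kind = node.nodeType
    if kind == Node.DOCUMENT_NODE:
        counts["document"] += 1
    elif kind == Node.ELEMENT_NODE:
        counts["element"] += 1
        for name in node.attributes.keys():
            # Namespace declarations are attribute information items too,
            # but they belong to [namespace attributes], not [attributes].
            if name == "xmlns" or name.startswith("xmlns:"):
                counts["namespace attribute"] += 1
            else:
                counts["attribute"] += 1
    elif kind == Node.TEXT_NODE:
        # The Infoset has one character information item per character.
        counts["character"] += len(node.data)
    elif kind == Node.COMMENT_NODE:
        counts["comment"] += 1
    elif kind == Node.PROCESSING_INSTRUCTION_NODE:
        counts["processing instruction"] += 1
    for child in node.childNodes:
        count_items(child, counts)
    return counts


counts = count_items(parseString(SAMPLE))
for kind, total in sorted(counts.items()):
    print(kind, total)
```

Note that a DOM text node holding "11" counts as two character information items, which is one reason people rarely describe documents in strict infoset terms.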
To help you understand the infoset better, the file archive includes infoset.xsl, an XSLT 2.0 stylesheet that reports on the infoset of a document. I used XSLT 2.0 because it offers more facilities for this kind of reporting than XSLT 1.0 does. Keep in mind that infoset.xsl is only a partial implementation: it reports only part of the infoset.
To use the stylesheet, you need an XSLT 2.0 processor, such as Saxon 8.0 or later (http://saxon.sourceforge.net). Saxon 8.0 isn't a complete XSLT 2.0/XPath 2.0 implementation, but it's getting closer. Download and unzip Saxon, and place saxon8.jar in the working directory where you installed the archive of files that came with the book. You'll need Java Version 1.4 or later, too.
You can apply this stylesheet to any XML document, as demonstrated here:
java -jar saxon8.jar prefix.xml infoset.xsl
Your results will be as follows:
Comment information item (1)
[content]: a time instant
[parent]: /

Document information item
[document element]: time
[base URI]: file:/C:/Hacks/examples/115959p.m.

Element information item (document element)
[namespace]: http://www.wyeast.net/time
[local name]: time
[prefix]: tz
[children]:
[attributes]: timezone
[base URI]: file:/C:/Hacks/examples/115959p.m.

Element information item (1)
[namespace]: http://www.wyeast.net/time
[local name]: hour
[prefix]: tz
[children]: 11
[attributes]:
[parent]: tz:time
[base URI]: file:/C:/Hacks/examples/11

Element information item (2)
[namespace]: http://www.wyeast.net/time
[local name]: minute
[prefix]: tz
[children]: 59
[attributes]:
[parent]: tz:time
[base URI]: file:/C:/Hacks/examples/59

Element information item (3)
[namespace]: http://www.wyeast.net/time
[local name]: second
[prefix]: tz
[children]: 59
[attributes]:
[parent]: tz:time
[base URI]: file:/C:/Hacks/examples/59

Element information item (4)
[namespace]: http://www.wyeast.net/time
[local name]: meridiem
[prefix]: tz
[children]: p.m.
[attributes]:
[parent]: tz:time
[base URI]: file:/C:/Hacks/examples/p.m.

Element information item (5)
[namespace]: http://www.wyeast.net/time
[local name]: atomic
[prefix]: tz
[children]:
[attributes]: signal
[parent]: tz:time
[base URI]: file:/C:/Hacks/examples/
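If you don't have an XSLT 2.0 processor handy, a report along the same lines can be approximated in Python. The sketch below is not a port of infoset.xsl; it is a loose analogue built on xml.dom.minidom that covers only element information items and skips [base URI]. The contents of SAMPLE are a guess at what a document like prefix.xml might contain (the attribute values in particular are invented), and report is a hypothetical helper name.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical stand-in for prefix.xml; attribute values are assumptions.
SAMPLE = """<!-- a time instant -->
<tz:time timezone="PST" xmlns:tz="http://www.wyeast.net/time">
 <tz:hour>11</tz:hour>
 <tz:atomic signal="true"/>
</tz:time>"""


def is_ns_decl(name):
    """True for xmlns and xmlns:* attributes (namespace declarations)."""
    return name == "xmlns" or name.startswith("xmlns:")


def report(el, label):
    """Return report lines for one element information item."""
    # Join the element's own text children and trim surrounding whitespace.
    text = "".join(
        c.data for c in el.childNodes if c.nodeType == Node.TEXT_NODE
    ).strip()
    attrs = " ".join(n for n in el.attributes.keys() if not is_ns_decl(n))
    lines = [
        f"Element information item ({label})",
        f"[namespace]: {el.namespaceURI}",
        f"[local name]: {el.localName}",
        f"[prefix]: {el.prefix}",
        f"[children]: {text}",
        f"[attributes]: {attrs}",
    ]
    # Only report a parent when it is another element, not the document.
    if el.parentNode.nodeType == Node.ELEMENT_NODE:
        lines.append(f"[parent]: {el.parentNode.nodeName}")
    return lines


root = parseString(SAMPLE).documentElement
items = [report(root, "document element")]
children = [c for c in root.childNodes if c.nodeType == Node.ELEMENT_NODE]
for number, child in enumerate(children, start=1):
    items.append(report(child, number))

for lines in items:
    print("\n".join(lines))
    print()
```

The numbering of child elements and the omission of [parent] for the document element follow the pattern visible in the stylesheet's output above; everything else about the layout is a simplification.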