Hack 35 Explore a Document Tree with the xmllint Shell

Explore the tree structure of an XML document with xmllint's shell mode.

xmllint is a command-line tool that ships with libxml2 (http://xmlsoft.org). It is included in distributions such as Cygwin (http://www.cygwin.com) and Red Hat Linux (http://www.redhat.com), and you can also download libxml2 separately from http://xmlsoft.org. xmllint has an interactive shell mode that lets you traverse an XML document's tree as if it were a filesystem, so you can examine any node in the tree on its own. Provided that you have an Internet connection, the shell works on remote files as well as local ones. This hack shows you how it's done.

From the working directory that contains time.xml, invoke the shell on the document with xmllint:

xmllint --shell time.xml

/ >

A prompt appears (>). The current location in the tree is shown to the left of the prompt (here /, the document node), but only the name of the current node is shown, not its full path. Enter the dir command to see information about the document or root node, and follow that with the base command to see the base URI of the document being explored:

/ > dir

DOCUMENT

version=1.0

encoding=UTF-8

URL=time.xml

standalone=true

/ > base

time.xml

/ >
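The same steps can be run without typing at the prompt, because the shell reads its commands from standard input. The following is a minimal sketch, assuming xmllint is on your PATH. It writes a stand-in time.xml to the working directory: the element names and the signal attribute are taken from the sessions in this hack, but the element values are placeholders, not the contents of the original file.

```shell
# Create a stand-in time.xml (element values are assumed placeholders).
cat > time.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<time>
 <hour>11</hour>
 <minute>59</minute>
 <second>59</second>
 <meridiem>p.m.</meridiem>
 <atomic signal="true"/>
</time>
EOF

# Feed the shell commands on standard input instead of typing them.
if command -v xmllint >/dev/null 2>&1; then
  printf 'dir\nbase\nbye\n' | xmllint --shell time.xml
else
  echo "xmllint not found; install libxml2 to run this example" >&2
fi
```

The prompts are echoed along with the command output, so the result looks much like the interactive session.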

Move to a different node with cd, followed by another dir, then by a cat command:

/ > cd time/atomic

atomic > dir

ELEMENT atomic

  ATTRIBUTE signal

    TEXT

      content=true

atomic > cat

<atomic signal="true"/>

atomic >
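The cd, dir, and cat steps above can be scripted the same way by piping the commands into the shell. This is a sketch under the same kind of assumptions: xmllint is on your PATH, and the time.xml created here is a stand-in whose element values are placeholders.

```shell
# Stand-in time.xml; only the structure matches the sessions in this hack.
cat > time.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<time>
 <hour>11</hour>
 <minute>59</minute>
 <second>59</second>
 <meridiem>p.m.</meridiem>
 <atomic signal="true"/>
</time>
EOF

# Move to the atomic element, describe it, then print its XML.
if command -v xmllint >/dev/null 2>&1; then
  printf 'cd time/atomic\ndir\ncat\nbye\n' | xmllint --shell time.xml
else
  echo "xmllint not found; install libxml2 to run this example" >&2
fi
```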

dir gives you information about the node, and cat gives you the XML representation of the node. Try the validate command:

atomic > validate

validity error : no DTD found!

atomic >

time.xml doesn't have a DTD associated with it, so load valid.xml (which has a document type declaration) and try validate again:

atomic > load valid.xml

/ > validate

/ >

load replaces the current document time.xml with valid.xml, so validate is successful this time (no bad news means success). Use cd to move down the tree to time, enter pwd to see the path to the current node, and then enter du to see the element names in the subtree:

/ > cd time

time > pwd

/time

time > du

time

  hour

  minute

  second

  meridiem

  atomic

time >
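To try the validate, pwd, and du steps yourself, you need a document with a document type declaration. The sketch below builds a hypothetical valid.xml with an internal DTD subset; the DTD and the element values are assumptions made for illustration, and the original valid.xml may declare its DTD differently. It then runs the commands through the shell and repeats the validity check with xmllint's --valid and --noout options outside the shell.

```shell
# Hypothetical valid.xml with an internal DTD subset (an assumption;
# the original file's document type declaration is not shown in this hack).
cat > valid.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE time [
<!ELEMENT time (hour, minute, second, meridiem, atomic)>
<!ELEMENT hour (#PCDATA)>
<!ELEMENT minute (#PCDATA)>
<!ELEMENT second (#PCDATA)>
<!ELEMENT meridiem (#PCDATA)>
<!ELEMENT atomic EMPTY>
<!ATTLIST atomic signal (true|false) #REQUIRED>
]>
<time>
 <hour>11</hour>
 <minute>59</minute>
 <second>59</second>
 <meridiem>p.m.</meridiem>
 <atomic signal="true"/>
</time>
EOF

if command -v xmllint >/dev/null 2>&1; then
  # Validate, move to time, show the path, then list the subtree.
  printf 'validate\ncd time\npwd\ndu\nbye\n' | xmllint --shell valid.xml
  # The same validity check without the shell; the exit status is 0 when valid.
  xmllint --valid --noout valid.xml && echo "valid.xml is valid"
else
  echo "xmllint not found; install libxml2 to run this example" >&2
fi
```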

Save the document in a new file with the save command, and then exit the shell with bye (exit and quit work, too):

time > save timeagain.xml

time > bye
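The save step can be scripted too. This sketch assumes xmllint is on your PATH and uses a stand-in time.xml with placeholder element values; it saves a copy under the name used above and then checks that the file was written.

```shell
# Stand-in time.xml (element values are assumed placeholders).
cat > time.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<time>
 <hour>11</hour>
 <minute>59</minute>
 <second>59</second>
 <meridiem>p.m.</meridiem>
 <atomic signal="true"/>
</time>
EOF

if command -v xmllint >/dev/null 2>&1; then
  # Save the whole document under a new name, then leave the shell.
  printf 'cd time\nsave timeagain.xml\nbye\n' | xmllint --shell time.xml
  # Confirm that the copy exists and is not empty.
  [ -s timeagain.xml ] && echo "timeagain.xml written"
else
  echo "xmllint not found; install libxml2 to run this example" >&2
fi
```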

You can also invoke the shell on a remote file (follow it with base):

xmllint --shell http://www.wyeast.net/time.xml

/ > base

http://www.wyeast.net/time.xml

/ >

You can use the same commands on a remote document as on a local file. A list of xmllint's shell commands concludes this hack.

3.6.1 xmllint Shell Commands

Following are the shell commands available in xmllint's shell mode:


base

Display the xml:base of the node.


bye

Leave the shell (same as exit and quit).


cat [node]

Display the XML representation of the node, if one is given, or of the current node.


cd [path]

Change the current node to the one selected by the path, if a path is given and it selects a single node; with no argument, change to the document or root node.


dir [path]

Dump information about elements, attributes, namespaces, and so forth. Select the current node or node in the path, if given.


du [path]

Show the structure of the subtree under the current node or the path, if given.


exit

Leave the shell (same as bye and quit).


free

Display memory usage.


help

Show help.


load docname

Load a new document with the given name.


ls [path]

List the contents of the node at the path, if given, or of the current node.


pwd

Display the path to the current node.


quit

Leave the shell (same as bye or exit).


save [docname]

Save the current document to the document name if given or to the original name.


validate

Check the document for validity errors against its DTD.


write name

Write the current node to the given filename.
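As a brief illustration of how write differs from save, the sketch below moves to a single element and writes only that node to a file. It assumes xmllint is on your PATH; the time.xml it creates is a stand-in with placeholder element values, and atomic.xml is a hypothetical output name.

```shell
# Stand-in time.xml (element values are assumed placeholders).
cat > time.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<time>
 <hour>11</hour>
 <minute>59</minute>
 <second>59</second>
 <meridiem>p.m.</meridiem>
 <atomic signal="true"/>
</time>
EOF

if command -v xmllint >/dev/null 2>&1; then
  # Select the atomic element and write just that node to atomic.xml.
  printf 'cd time/atomic\nwrite atomic.xml\nbye\n' | xmllint --shell time.xml
  # Show what was written.
  cat atomic.xml
else
  echo "xmllint not found; install libxml2 to run this example" >&2
fi
```

Where save stores the whole document, write stores only the current node, which is handy for extracting a fragment.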