Explore the tree structure of an XML document with xmllint's shell mode.
xmllint is a command-line tool that comes with libxml2 (http://xmlsoft.org). It is included in distributions such as Cygwin (http://www.cygwin.com) and Red Hat Linux (http://www.redhat.com), and you can also download libxml2 on its own from http://xmlsoft.org. xmllint has an interactive shell mode that lets you traverse an XML document's tree as if it were a file system, so you can examine any node in the tree individually. If you have an Internet connection, shell mode works on remote files as well as local ones. This hack shows you how it's done.
From the working directory that holds time.xml, invoke the shell on the document with xmllint's --shell option:
xmllint --shell time.xml
/ >
A prompt (>) appears. To its left is your current location in the tree (/ at this point), but it shows only the name of the current node, not the full path to it. Enter the dir command to see information about the document (root) node, and then enter base to see the base URI of the document being explored:
/ > dir
DOCUMENT
version=1.0
encoding=UTF-8
URL=time.xml
standalone=true
/ > base
time.xml
/ >
Move to a different node with cd, followed by another dir, then by a cat command:
/ > cd time/atomic
atomic > dir
ELEMENT atomic
  ATTRIBUTE signal
    TEXT
      content=true
atomic > cat
<atomic signal="true"/>
atomic >
dir gives you information about the node, and cat gives you its XML representation. Now try the validate command:
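To make the cd, dir, and cat steps concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not how xmllint works internally. It runs on a hypothetical stand-in for time.xml: the element names come from this hack, but the text values (11, 59, p.m.) are invented placeholders. The helper names cd, dir_info, and cat are illustrative only, and dir_info only loosely imitates dir's output (it skips the TEXT line, for example).

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for time.xml: element names follow the hack,
# text values are placeholders.
TIME_XML = """<time>
  <hour>11</hour>
  <minute>59</minute>
  <second>59</second>
  <meridiem>p.m.</meridiem>
  <atomic signal="true"/>
</time>"""


def cd(root, path):
    """Resolve a path such as 'time/atomic' (or '/time/atomic') to an element.

    The first step must name the document element; the remaining steps are
    passed to ElementTree's limited XPath support. Returns None on no match.
    """
    first, _, rest = path.strip("/").partition("/")
    if first != root.tag:
        return None
    return root.find(rest) if rest else root


def dir_info(node):
    """Describe the node and its attributes, loosely modeled on dir."""
    lines = [f"ELEMENT {node.tag}"]
    for name, value in node.attrib.items():
        lines.append(f"  ATTRIBUTE {name}")
        lines.append(f"    content={value}")
    return lines


def cat(node):
    """Serialize the node as XML, stripping the whitespace tail after it."""
    return ET.tostring(node, encoding="unicode").strip()


root = ET.fromstring(TIME_XML)
atomic = cd(root, "time/atomic")
print("\n".join(dir_info(atomic)))
print(cat(atomic))
```

One difference to expect: ElementTree serializes an empty element as `<atomic signal="true" />`, with a space before `/>`, where xmllint prints `<atomic signal="true"/>`.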
atomic > validate
validity error : no DTD found!
atomic >
time.xml doesn't have a DTD associated with it, so load valid.xml (which has a document type declaration) and try validate again:
atomic > load valid.xml
/ > validate
/ >
load replaces the current document, time.xml, with valid.xml, so validate succeeds this time. validate prints nothing when the document is valid, so no news is good news. Use cd to move down the tree to time, enter pwd to see the path to the current node, and then enter du to see the element names in the subtree:
/ > cd time
time > pwd
/time
time > du
time
  hour
  minute
  second
  meridiem
  atomic
time >
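The pwd and du steps can be sketched in Python as well, again as a rough analogue rather than xmllint's actual code. The sketch assumes a hypothetical document with the same element structure as the du output above (the values are placeholders). Because ElementTree elements do not keep a pointer to their parent, pwd first builds a child-to-parent map; repeated sibling names get an XPath-style position such as b[2].

```python
import xml.etree.ElementTree as ET

# Hypothetical document with the element structure shown by du above;
# the text values are placeholders.
VALID_XML = """<time>
  <hour>11</hour>
  <minute>59</minute>
  <second>59</second>
  <meridiem>p.m.</meridiem>
  <atomic signal="true"/>
</time>"""


def pwd(root, node):
    """Return a slash-separated path from the root element to node."""
    # ElementTree has no parent links, so map each child to its parent.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    while node is not root:
        parent = parents[node]
        same = [c for c in parent if c.tag == node.tag]
        if len(same) == 1:
            steps.append(node.tag)
        else:
            # Disambiguate repeated siblings with a 1-based position.
            steps.append(f"{node.tag}[{same.index(node) + 1}]")
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))


def du(node, depth=0):
    """List element names in node's subtree, indented two spaces per level."""
    lines = ["  " * depth + node.tag]
    for child in node:
        lines.extend(du(child, depth + 1))
    return lines


root = ET.fromstring(VALID_XML)
print(pwd(root, root))  # /time
print(pwd(root, root.find("atomic")))
print("\n".join(du(root)))
```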
Save the document in a new file with the save command, and then exit the shell with bye (exit and quit work, too):
time > save timeagain.xml
time > bye
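For comparison, saving a parsed document to a new file takes one call in Python's ElementTree. This is a minimal sketch using a small hypothetical document, and the helper name save is illustrative. Note one assumption-worthy difference from xmllint's save: ElementTree does not keep a document type declaration, so saving valid.xml this way would drop its DOCTYPE.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def save(root, filename):
    """Write the tree rooted at root to filename, with an XML declaration."""
    ET.ElementTree(root).write(filename, encoding="UTF-8", xml_declaration=True)


# Hypothetical, trimmed-down document used only for this example.
root = ET.fromstring('<time><atomic signal="true"/></time>')
save(root, "timeagain.xml")
print(Path("timeagain.xml").read_text(encoding="utf-8"))
```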
You can also invoke the shell on a remote file. Enter base to confirm where the document came from:
xmllint --shell http://www.wyeast.net/time.xml
/ > base
http://www.wyeast.net/time.xml
/ >
You will be able to use the same commands on a remote document as you did on the local file. A list of xmllint's shell commands concludes this hack.
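The same local-or-remote flexibility can be sketched in Python with the standard library: fetch http(s) URLs with urllib, parse anything else as a local path, and keep the source URI around the way base reports it. The helper name load is hypothetical, and the remote branch needs network access; the wyeast.net URL from this hack may no longer be served.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse
from urllib.request import urlopen


def load(source):
    """Parse a document from a local path or an http(s) URL.

    Returns (root_element, base), where base is the URI the document was
    read from, comparable to what the shell's base command shows.
    """
    if urlparse(source).scheme in ("http", "https"):
        with urlopen(source) as response:
            # geturl() reports the final URL after any redirects.
            return ET.parse(response).getroot(), response.geturl()
    return ET.parse(source).getroot(), source


# Local use; load("http://www.wyeast.net/time.xml") would take the
# remote branch if that file is still available.
with open("t_load.xml", "w", encoding="utf-8") as f:
    f.write("<time><atomic signal='true'/></time>")
root, base = load("t_load.xml")
print(root.tag, base)
```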
Following are the shell commands available in xmllint's shell mode:
base           Display the xml:base of the node.
bye            Leave the shell (same as exit and quit).
cat [node]     Display the XML of the given node, or of the current node if none is given.
cd [path]      Change the current node to path if it is given and unique; with no argument, change to the document (root) node.
dir [path]     Dump information about the node at path, if given, or the current node: its type, attributes, namespaces, content, and so forth.
du [path]      Show the structure of the subtree under path, if given, or under the current node.
exit           Leave the shell (same as bye and quit).
help           Show help.
free           Display memory usage.
load [name]    Load a new document with the given name.
ls [path]      List the contents of path, if given, or of the current directory.
pwd            Display the path to the current node.
quit           Leave the shell (same as bye and exit).
save [name]    Save the current document to name, if given, or to the original name.
validate       Check the document for validity errors against its DTD.
write [name]   Write the current node to the given filename.