Hack 91 Pipeline XML with Ant

Ant (http://ant.apache.org) uses build files that are written in XML, and takes advantage of XML in a variety of ways. It's a suitable (if not ideal) framework for XML pipelining: it's open, mature, stable, readily available, widely known and used, easily extensible, and already amenable to XML processing. What else could you ask for?

In this hack, I'll show you the XML structures in an Ant build file, named build.xml by default; talk about some common XML-related tasks that Ant can perform; and end with an example of XML pipelining.

To get the examples in this hack to work, you'll need to download and install Ant Version 1.6.1 (or later) binaries from http://ant.apache.org/bindownload.cgi. Because you'll be using an external task that validates with RELAX NG (http://www.relaxng.org) schemas, you'll also need James Clark's Jing (http://www.thaiopensource.com/relaxng/jing.html).

7.2.1 Validating an XML Document

Ant has a task for validating XML documents called xmlvalidate. By default, Ant validates with Xerces. The XML document valid.xml is shown in Example 7-1.

Example 7-1. valid.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE time SYSTEM "time.dtd">

<!-- a time instant -->
<time timezone="PST">
 <hour>11</hour>
 <minute>59</minute>
 <second>59</second>
 <meridiem>p.m.</meridiem>
 <atomic signal="true"/>
</time>

It points to the DTD time.dtd (Example 7-2).

Example 7-2. time.dtd
<!ELEMENT time (hour,minute,second,meridiem,atomic)>
<!ATTLIST time timezone CDATA #REQUIRED>
<!ELEMENT hour (#PCDATA)>
<!ELEMENT minute (#PCDATA)>
<!ELEMENT second (#PCDATA)>
<!ELEMENT meridiem (#PCDATA)>
<!ELEMENT atomic EMPTY>
<!ATTLIST atomic signal CDATA #REQUIRED>

You can validate valid.xml with the build file build.xml, which uses the xmlvalidate task (Example 7-3).

Example 7-3. build.xml
<?xml version="1.0"?>

<project default="valid">
 <target name="valid">
  <xmlvalidate file="valid.xml"/>
 </target>
</project>

The target element is a child of project and must have a name attribute. Here its value, valid, matches the value of the default attribute of project, so Ant runs this target when you don't name one on the command line. When a build file contains more than one target, the value of default picks out exactly one of them by name. The target element also has several other attributes not shown here. On the xmlvalidate element, the file attribute specifies the document to validate (in this case, valid.xml).
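
As a rough sketch of how default selects among several targets, the build file below adds a second, purely illustrative target named hello (a hypothetical name that is not part of the example archive). Running ant with no arguments should run valid, while running ant hello should run only hello.

```xml
<?xml version="1.0"?>

<!-- Sketch only: the "hello" target is a hypothetical addition -->
<project default="valid">
 <!-- Runs when no target is named on the command line -->
 <target name="valid">
  <xmlvalidate file="valid.xml"/>
 </target>
 <!-- Runs only when requested explicitly: ant hello -->
 <target name="hello">
  <echo message="Hello from a non-default target"/>
 </target>
</project>
```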

In the working directory, and with Ant installed and in the path, issue the command:

ant

Ant knows to look for the build.xml file, and to take its orders from there. The ant command produces the following output, if successful:

Buildfile: build.xml

valid:
[xmlvalidate] 1 file(s) have been successfully validated.

BUILD SUCCESSFUL
Total time: 1 second

In Ant, types are elements that can help perform tasks, especially on groups of files. For example, using the fileset type as a child of xmlvalidate, you can validate a series of XML documents, as shown in build-fileset.xml (Example 7-4).

Example 7-4. build-fileset.xml
<?xml version="1.0"?>

<project default="valid">
 <target name="valid">
  <xmlvalidate>
   <fileset file="*ternal.xml"/>
  </xmlvalidate>
 </target>
</project>

The file attribute of fileset allows you to specify a series of files with wildcards. If you run this build file, you will see that Ant validates both the internal.xml and external.xml documents in one step.
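
The Ant manual describes the file attribute of fileset as a shortcut for a single file, and documents the dir and includes attributes for pattern matching. If the wildcard form above does not behave as expected with your version of Ant, a sketch along these lines should select the same files, assuming the documents sit in the same directory as the build file:

```xml
<?xml version="1.0"?>

<project default="valid">
 <target name="valid">
  <xmlvalidate>
   <!-- dir is resolved against the project's base directory -->
   <fileset dir="." includes="*ternal.xml"/>
  </xmlvalidate>
 </target>
</project>
```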

The xmlvalidate task has several features that I haven't mentioned but that are worth a look, such as checking a document only for well-formedness with lenient="yes" (see http://ant.apache.org/manual/OptionalTasks/xmlvalidate.html).
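
A minimal sketch of a well-formedness check follows. The target name wf is a hypothetical choice; the lenient attribute is the one described in the Ant manual.

```xml
<?xml version="1.0"?>

<project default="wf">
 <target name="wf">
  <!-- lenient="yes": check well-formedness only, skip validation -->
  <xmlvalidate file="valid.xml" lenient="yes"/>
 </target>
</project>
```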

7.2.2 The Jing Task

One way that you can extend Ant is by writing your own task (instructions on how to do this are found at http://ant.apache.org/manual/develop.html#writingowntask). James Clark has written a task for Jing that allows you to use Ant to validate XML documents against RELAX NG schemas, using either XML or compact syntax. This task is documented at http://www.thaiopensource.com/relaxng/jing-ant.html.

Jing's source code, including JingTask.java, is available for download from http://www.thaiopensource.com/download/jing-20030619.zip. For convenience, I have included a copy of JingTask.java in the example file archive (along with a copy of Jing's license, jing-copying.txt).

The document time.xml is valid with regard to the RELAX NG schema time.rng, shown in Example 7-5.

Example 7-5. time.rng
<element name="time" xmlns="http://relaxng.org/ns/structure/1.0">
 <attribute name="timezone"/>
 <element name="hour"><text/></element>
 <element name="minute"><text/></element>
 <element name="second"><text/></element>
 <element name="meridiem"><text/></element>
 <element name="atomic">
  <attribute name="signal"/>
 </element>
</element>

To validate time.xml against time.rng with Ant, use the build file build-jing.xml (Example 7-6).

Example 7-6. build-jing.xml
<?xml version="1.0"?>

<project default="rng">

 <taskdef name="jing" classname="com.thaiopensource.relaxng.util.JingTask"/>

 <target name="rng">
  <echo message="Validating with RELAX NG schema using Jing..."/>
  <jing rngfile="time.rng" file="time.xml"/>
 </target>

</project>

The taskdef element defines the jing task in the name attribute, and the classname attribute identifies the class that executes the task. The compiled class is stored in jing.jar. If you place jing.jar in Ant's lib directory, Ant will be able to find the task. (For example, on my Windows machine, I've placed jing.jar in C:\Java\Ant\apache-ant-1.6.1\lib.)
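
If you would rather not copy jing.jar into Ant's lib directory, taskdef also accepts a classpath attribute. The following is a minimal sketch, assuming jing.jar has been placed in a lib subdirectory next to the build file (a hypothetical location; adjust it to your setup):

```xml
<?xml version="1.0"?>

<project default="rng">
 <!-- lib/jing.jar is an assumed location, not part of the example archive -->
 <taskdef name="jing"
          classname="com.thaiopensource.relaxng.util.JingTask"
          classpath="lib/jing.jar"/>

 <target name="rng">
  <jing rngfile="time.rng" file="time.xml"/>
 </target>
</project>
```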

The echo task echoes the text in the message attribute. Jing is silent upon success, so you can throw in an echo task to report progress, as shown in Example 7-6. The jing task's rngfile attribute identifies a RELAX NG schema, and the file attribute names the instance document to validate against it. You can also use a fileset type as a child of jing, allowing you to validate more than one document at a time.
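
A sketch of the fileset variant might look like the following. The time*.xml pattern is only illustrative; it assumes that every matching document in the working directory is meant to conform to time.rng.

```xml
<?xml version="1.0"?>

<project default="rng">
 <taskdef name="jing" classname="com.thaiopensource.relaxng.util.JingTask"/>

 <target name="rng">
  <echo message="Validating several documents with Jing..."/>
  <jing rngfile="time.rng">
   <!-- Illustrative pattern: every time*.xml file in this directory -->
   <fileset dir="." includes="time*.xml"/>
  </jing>
 </target>
</project>
```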

Run this build file with this command:

ant -f build-jing.xml

and you will get a result like this:

Buildfile: build-jing.xml

rng:
     [echo] Validating with RELAX NG schema using Jing...

BUILD SUCCESSFUL
Total time: 1 second

Jing can also validate against schemas in the compact syntax, RELAX NG's terse, non-XML format. The compact schema time.rnc is shown in Example 7-7.

Example 7-7. time.rnc
element time {
  attribute timezone { text },
  element hour { text },
  element minute { text },
  element second { text },
  element meridiem { text },
  element atomic {
    attribute signal { text }
  }
}

The build file build-rnc.xml (Example 7-8) validates time.xml against time.rnc. Note the addition of the compactsyntax attribute to the jing task element.

Example 7-8. build-rnc.xml
<?xml version="1.0"?>

<project default="rng">

 <taskdef name="jing" classname="com.thaiopensource.relaxng.util.JingTask"/>

 <target name="rng">
  <echo message="Validating with RELAX NG compact syntax schema using Jing..."/>
  <jing compactsyntax="true" rngfile="time.rnc" file="time.xml"/>
 </target>

</project>

Give the command:

ant -f build-rnc.xml

and you will get this report:

Buildfile: build-rnc.xml

rng:
     [echo] Validating with RELAX NG compact syntax schema using Jing...

BUILD SUCCESSFUL
Total time: 1 second

7.2.3 An XML Pipeline Example

This example combines the tasks discussed so far into a single build file and adds a few other targets as well. The resulting file, build-all.xml, is a simple XML pipeline. In this scenario, a property holding the current directory is set from a local XML document (properties.xml), and a remote ZIP file (time.zip) is downloaded with the get task. The ZIP archive contains four files: two RELAX NG schemas (time1.rng and time1.rnc), the DTD time1.dtd, and the XML instance time2.xml. The archive is unzipped, and time2.xml is validated against time1.rng, time1.rnc, and time1.dtd. Finally, time2.xml is transformed with XSLT into a text document (clock.txt). More complex operations are possible, but this gives you an idea of how to put a pipeline together.

The build file is shown in Example 7-9.

Example 7-9. build-all.xml
<?xml version="1.0"?>

<project default="xform">

<taskdef name="jing" classname="com.thaiopensource.relaxng.util.JingTask"/>

<target name="init">
 <echo message="Load XML properties..."/>
 <xmlproperty file="properties.xml"/>
 <property name="MailLogger.from" value="schlomo@example.com"/>
 <property name="MailLogger.success.to" value="harvey@example.com"/>
 <property name="MailLogger.failure.to" value="joe@example.com"/>
 <property name="MailLogger.mailhost" value="mail.example.com"/>
</target>

<target name="get" depends="init">
 <get src="http://www.wyeast.net/time.zip" dest="time.zip"/>
</target>

<target name="unzip" depends="get">
 <unzip src="time.zip" dest="${build.dir}"/>
</target>

<target name="rng" depends="unzip">
 <echo message="Jing validating (XML)..."/>
 <jing rngfile="time1.rng" file="time2.xml"/>
</target>

<target name="rnc" depends="rng">
 <echo message="Jing validating (compact)..."/>
 <jing compactsyntax="yes" rngfile="time1.rnc" file="time2.xml"/>
</target>

<target name="val" depends="rnc">
 <xmlvalidate file="time2.xml" failonerror="no">
  <dtd publicId="-//Wy'east Communications//Time DTD//EN"
       location="file:///C:/Hacks/examples/time1.dtd"/>
 </xmlvalidate>
</target>

<target name="xform" depends="val">
 <echo message="Transforming time2.xml by clock1.xsl..."/>
 <xslt in="time2.xml" out="clock.txt"
       style="clock1.xsl">
  <outputproperty name="method" value="text"/>
  <outputproperty name="encoding" value="US-ASCII"/>
 </xslt>
</target>

</project>

To run the pipeline, simply type:

ant -f build-all.xml

The output will look like this, provided you have a live Internet connection:

Buildfile: build-all.xml

init:
     [echo] Load XML properties...

get:
      [get] Getting: http://www.wyeast.net/time.zip

unzip:
    [unzip] Expanding: C:\Hacks\examples\time.zip into C:\Hacks\examples

rng:
     [echo] Jing validating (XML)...

rnc:
     [echo] Jing validating (compact)...

val:
[xmlvalidate] 1 file(s) have been successfully validated.

xform:
     [echo] Transforming time2.xml by clock1.xsl...

BUILD SUCCESSFUL
Total time: 3 seconds

Each of the targets, except the one named init, has a depends attribute. The values of these attributes establish a chain of dependencies between the targets. The default or starting target is xform (identified in the project element); in order for this target to execute, the val target must first execute successfully, and in order for val to execute, rnc must execute, and so forth back to init. This dependency is not established structurally, as through a parent-child relationship, but through attribute values. You can put the targets in any order in the build file, and Ant will still execute them in the order that the depends attributes dictate. These dependencies make up the segments of the pipeline.
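
To see that declaration order does not matter, consider this small sketch. The target names first, second, and third are hypothetical and exist only for illustration; the targets are deliberately declared out of order.

```xml
<?xml version="1.0"?>

<project default="third">
 <!-- Declared first, but runs last because it depends on "second" -->
 <target name="third" depends="second">
  <echo message="third"/>
 </target>

 <!-- Has no dependencies, so it runs before the others -->
 <target name="first">
  <echo message="first"/>
 </target>

 <target name="second" depends="first">
  <echo message="second"/>
 </target>
</project>
```

Running ant against this file should execute first, then second, then third, following the depends chain rather than the order of the targets in the file.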

In the xform target, the xslt task transforms time2.xml into clock.txt according to the XSLT stylesheet clock1.xsl. The outputproperty children supply values that would normally come from attributes of the XSLT output element.
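
For comparison, here is a sketch of a stylesheet that sets the same two values itself with the XSLT output element. It is not the clock1.xsl from the example archive; the template is a hypothetical one that assumes a time document shaped like Example 7-1.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <!-- In-stylesheet counterpart of the two outputproperty elements -->
 <xsl:output method="text" encoding="US-ASCII"/>

 <!-- Hypothetical template: print hour:minute:second and the meridiem -->
 <xsl:template match="time">
  <xsl:value-of select="hour"/>
  <xsl:text>:</xsl:text>
  <xsl:value-of select="minute"/>
  <xsl:text>:</xsl:text>
  <xsl:value-of select="second"/>
  <xsl:text> </xsl:text>
  <xsl:value-of select="meridiem"/>
  <xsl:text>&#10;</xsl:text>
 </xsl:template>

</xsl:stylesheet>
```

Applied to the document in Example 7-1, a stylesheet like this should produce the single line 11:59:59 p.m.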

In the val target, the xmlvalidate task uses a dtd child to specify a formal public identifier for the DTD and the location of a local copy of that DTD.
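
The dtd child maps a public identifier to a local file, so the instance presumably declares that identifier in its document type declaration. Because time2.xml is not reproduced here, the following is only an assumed shape for it: the system identifier is a guess, and the content assumes that time1.dtd matches time.dtd in Example 7-2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed DOCTYPE: the public identifier is the one named in the dtd
     child of Example 7-9; the system identifier is a guess -->
<!DOCTYPE time PUBLIC "-//Wy'east Communications//Time DTD//EN"
                      "time1.dtd">

<time timezone="PST">
 <hour>11</hour>
 <minute>59</minute>
 <second>59</second>
 <meridiem>p.m.</meridiem>
 <atomic signal="true"/>
</time>
```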

The get task downloads the resource at a URL to a specified location. The xmlproperty task reads the file properties.xml:

<?xml version="1.0"?>

<build>
 <dir>.</dir>
</build>

The element names in the properties file, which you choose yourself, determine the names of the properties that you can reference elsewhere in the build file, such as ${build.dir}. The first part of the property name comes from the build element and the second part from dir. The content of dir becomes the property's value.
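
As a sketch of how the naming extends to more than one element, suppose the properties file also had an archive child (a hypothetical addition; only dir appears in the original file):

```xml
<?xml version="1.0"?>

<!-- Hypothetical variant of properties.xml with a second child element -->
<build>
 <dir>.</dir>
 <archive>time.zip</archive>
</build>
```

A build file could then refer to both values by their dotted names. The target name show is likewise made up for this illustration.

```xml
<?xml version="1.0"?>

<project default="show">
 <target name="show">
  <xmlproperty file="properties.xml"/>
  <!-- build.dir and build.archive come from the element names above -->
  <echo message="dir=${build.dir} archive=${build.archive}"/>
 </target>
</project>
```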

The property elements in the init target set some properties for the Ant MailLogger (http://ant.apache.org/manual/listeners.html), which will send an email containing the Ant build information from schlomo to harvey (on success) or joe (on failure) at example.com, using the mailhost mail.example.com. These are, of course, dummy values. Use email addresses and a mail server that will work for you when running this example.

To get the MailLogger to work, use the -logger switch:

ant -logger org.apache.tools.ant.listener.MailLogger -f build-all.xml

7.2.4 See Also

  • Ant's online manual: http://ant.apache.org/manual/index.html

  • Ant Wiki: http://wiki.apache.org/ant/FrontPage

  • "XML Pipelining with Ant," by Michael Fitzgerald. XML.com, January 28, 2003: http://www.xml.com/pub/a/2003/01/29/ant.html

  • "Running Multiple XSLT Engines with Ant," by Anthony Coates. XML.com, December 11, 2002: http://www.xml.com/pub/a/2002/12/11/ant-xml.html

  • Ant: The Definitive Guide, by Jesse E. Tilly and Eric M. Burke (O'Reilly)

  • Java Development with Ant, by Eric Hatcher and Steve Loughran (Manning)