Hack 79 Generate Instances Based on Schemas


Perhaps you need to generate XML instances of a given schema to create a test suite, or you want a collection of instance files for some other reason. xmlspy (http://www.xmlspy.com) can create a single instance of a DTD or XML Schema document, but the Sun Instance Generator, a Java program, can create many XML instances based on a DTD, RELAX NG, Relax, or XML Schema. If you want a quick look at a single instance, xmlspy may be adequate for you. However, if you need a set of files, including some with intentional errors in them, Sun's tool is probably the way to go. This hack shows you how to use both tools.

5.13.1 Generating an Instance with xmlspy

These instructions assume that you have already downloaded and installed xmlspy 2004 Professional or Enterprise Edition (this won't work with the Home Edition). Follow these steps to create an instance of the DTD time.dtd:

  1. Open xmlspy.

  2. Choose File → Open; the Open dialog box appears. Navigate to the working directory of files from the book file archive and select the file time.dtd. Then click the Open button.

  3. Choose DTD/Schema → Generate sample XML file. The Generate sample XML file dialog box appears. All the radio buttons should be selected by default, and the repeatable elements text box should contain the number 1. This means that xmlspy will generate a document that includes all non-mandatory elements and attributes, uses the first alternative of each choice (|), includes only one instance of each repeatable element, and fills the elements and attributes with sample data. Now click OK.

  4. The sample XML document is generated. Choose File → Save As and save the file as timegen.xml in the working directory.

  5. Click the Text tab. The xmlspy window should look similar to Figure 5-11.

Figure 5-11. Generated XML document in xmlspy
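The effect of the dialog settings in step 3 can be pictured with a small Python sketch. It is only an illustration, not xmlspy's implementation: the content model, the element names (hour, second, am, pm, note), and the expand function are all hypothetical.

```python
def expand(particle, repeat=1):
    """Return the element names emitted for one content-model particle.

    Mirrors the default dialog settings described above: optional items
    are included, the first alternative of a choice is used, and
    repeatable items appear `repeat` times.
    """
    kind, body = particle
    if kind == "elem":
        return [body]
    if kind == "opt":       # non-mandatory items are included
        return expand(body, repeat)
    if kind == "choice":    # first alternative of (a | b) is used
        return expand(body[0], repeat)
    if kind == "rep":       # repeatable items (* or +) appear `repeat` times
        return expand(body, repeat) * repeat
    if kind == "seq":       # a sequence emits its parts in order
        return [name for part in body for name in expand(part, repeat)]
    raise ValueError(f"unknown particle kind: {kind}")


# Hypothetical model: (hour, second?, (am | pm), note*)
MODEL = ("seq", [
    ("elem", "hour"),
    ("opt", ("elem", "second")),
    ("choice", [("elem", "am"), ("elem", "pm")]),
    ("rep", ("elem", "note")),
])

print(expand(MODEL))  # ['hour', 'second', 'am', 'note']
```

Raising repeat to 2 would emit note twice, which corresponds to changing the number in the repeatable elements text box.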

5.13.2 Generating an Instance with the Sun Instance Generator

Kawaguchi Kohsuke developed the Sun Instance Generator, which is written in Java. Download the latest version of the ZIP archive from https://msv.dev.java.net/servlets/ProjectDocumentList?folderID=101 and extract the contents into the working directory. You will need to register on the Sun site (if you have not done so already) to download the generator. The commands that follow have been tested with Version 20040601, which might be outdated by the time you read this.

While at a command prompt, enter this command:

java -jar xmlgen.jar

The generator will give you a usage summary. Table 5-1 summarizes the possible options.

Table 5-1. Sun Instance Generator options

Command-line option                Description
-dtd file.dtd                      Use a DTD file as the model schema.
-ascii                             Use US-ASCII for element and attribute content or values. Without -ascii, the generator uses a broader range of Unicode characters.
-seed n                            Set a random seed n.
-depth n                           Determine cut-back depth of document, based on occurrence constraints. Once limit n is reached, the generator limits depth.
-width n                           Tell the generator the maximum number of times that the occurrence constraints zero or more (*) and one or more (+) are repeated.
-n n                               Generate n instances.
-warning                           Show warnings.
-quiet                             Don't show progress messages.
-root {namespaceURI}elementname    Fix the root element to the given element.
-encoding str                      Java-style character encoding, such as UTF8.
-example file.xml                  Provide an example file to guide the generator. You can use this option more than once.
-error m/n                         Set the error ratio; that is, generate m errors per n elements (on average).
-nocomment                         Suppress the insertion of comments that indicate generated errors.

Let's get right to work. To generate an instance from the RELAX NG schema time.rng using US-ASCII content for elements and attributes, enter this command:

java -jar xmlgen.jar -ascii time.rng

The result will look like Example 5-18.

Example 5-18. Output of Sun Instance Generator
parsing a grammar: time.rng

generating document #1

<?xml version="1.0" encoding="UTF-8"?>

<time timezone="s{8Ty[^QpD;Wg*" xmlns:ns1="">





    <atomic signal="ka: ^p-1LazDC&gt;"/>


Try this to generate an instance from the XML Schema document, time.xsd:

java -jar xmlgen.jar -ascii time.xsd

And this to generate an instance from the DTD time.dtd (note the -dtd switch):

java -jar xmlgen.jar -ascii -dtd time.dtd
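If you drive the generator from a script, the only difference among the three commands above is the -dtd switch. The following Python sketch builds the argument list and adds the switch based on the file extension. The function name xmlgen_args and the extension test are assumptions made for illustration; the sketch only assembles the command and does not run it.

```python
from pathlib import Path


def xmlgen_args(schema, ascii_only=True):
    """Build the argument list for one xmlgen.jar run (nothing is executed)."""
    args = ["java", "-jar", "xmlgen.jar"]
    if ascii_only:
        args.append("-ascii")
    # Assumption: a .dtd extension marks a DTD, which needs the -dtd switch.
    if Path(schema).suffix.lower() == ".dtd":
        args.append("-dtd")
    args.append(schema)
    return args


for name in ("time.rng", "time.xsd", "time.dtd"):
    print(" ".join(xmlgen_args(name)))
```

With xmlgen.jar in the working directory, the returned list could be handed to subprocess.run(args, check=True) to launch the generator.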

Use the -example option to include the whole document in DTD output. The example file time.xml shows the tool the document structure you want it to produce.

java -jar xmlgen.jar -ascii -example time.xml -dtd time.dtd

You can generate a number of instances by using the -n option (this example produces 10 instances):

java -jar xmlgen.jar -ascii -n 10 time.rng

Introduce errors in your instances by including an error ratio; e.g., -error 1/5 means output one error for every five elements (a relatively high occurrence of errors):

java -jar xmlgen.jar -ascii -error 1/5 time.rng
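To get a feel for what a given ratio means, you can work out the average number of errors to expect in a document of a given size. This Python sketch does the arithmetic. The function name expected_errors and the 40-element document are hypothetical, and because the ratio is an average, the number of errors in any single generated file will vary.

```python
from fractions import Fraction


def expected_errors(ratio, element_count):
    """Average number of errors for a document with element_count elements.

    ratio is written as on the command line, e.g. "1/5" for one error
    per five elements.
    """
    return Fraction(ratio) * element_count


print(expected_errors("1/5", 40))    # 8
print(expected_errors("1/100", 40))  # 2/5
```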

Comments in the output note the errors, unless you use the -nocomment option. Finally, to generate a series of 100 files with a low occurrence of errors and save them to disk, use the $ character in the output filename:

java -jar xmlgen.jar -n 100 -error 1/100 time.rng time-$.xml

This will create 100 files (time-00.xml, time-01.xml, time-02.xml, etc.) with occasional errors. You now have a test suite!
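A script that consumes the test suite needs the list of generated filenames. The Python sketch below derives them from the same template. The zero-padding rule (as many digits as the largest index needs) is an assumption inferred from the names listed above, not documented behavior of the generator, and expected_names is a hypothetical helper.

```python
def expected_names(template, count):
    """List the filenames expected when $ in template is replaced by 0..count-1.

    Assumption: indexes are zero-padded to the width of the largest index,
    which matches time-00.xml ... time-99.xml for 100 files.
    """
    width = len(str(count - 1))
    return [template.replace("$", str(i).zfill(width)) for i in range(count)]


names = expected_names("time-$.xml", 100)
print(names[:3])   # ['time-00.xml', 'time-01.xml', 'time-02.xml']
print(names[-1])   # time-99.xml
```

If the files on disk turn out to be named differently, listing the directory with pathlib.Path(".").glob("time-*.xml") is a safer way to collect them.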

5.13.3 See Also

  • For more information on how to use the Sun Instance Generator, see the document HowToUse.html that comes with the software download.