Hack 21 Create an XML Document from a CSV File


Want to go from CSV to XML? Use Dave Pawson's CSVToXML tool to convert CSV files to XML with Java.

Dave Pawson's CSVToXML translator converts comma-separated value (CSV) files to XML. CSV is a reliable, plain-text file format for storing the output of a spreadsheet or database.

Suppose you are running Excel 2000 and you want to convert a file, inventory.xls, to XML (see Figure 2-24). Unfortunately, you haven't yet been able to talk your boss into buying Excel 2003, which can easily save a spreadsheet as XML. Luckily, there is a workaround.

Figure 2-24. inventory.xls in Excel 2000

Save the file as CSV by choosing File > Save As and selecting a CSV file format in the "Save as type" pull-down box. Navigate to the working directory that holds the other files from this book, enter the name inventory.csv in the File name text box, and then click the Save button. The CSV file will look like this:


1,Oak chairs,6,31-Dec-04

2,Dining tables,1,31-Dec-04

3,Folding chairs,4,29-Dec-04


5,Overstuffed chair,1,30-Dec-04


7,Floor lamp,1,20-Dec-04

8,Oak bookshelves,1,31-Dec-04

9,Computer desk,1,31-Dec-04

10,Folding tables,3,31-Dec-04

11,Oak writing desk,1,28-Dec-04

12,Table lamps,5,26-Dec-04

13,Pine night tables,3,26-Dec-04

14,Oak dresser,1,30-Dec-04

15,Pine dressers,1,31-Dec-04

16,Pine armoire,1,31-Dec-04
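Each row holds four comma-separated fields: an item number, a description, a quantity, and a date. To illustrate that structure (this is not code taken from CSVToXML), a minimal Java sketch that splits one such row might look like this. It assumes, as holds for this particular file, that no field contains a comma or a double quote; general CSV with quoted fields needs a real parser.

```java
import java.util.Arrays;
import java.util.List;

public class CsvRowSplit {
    // Splits one unquoted CSV row into its fields.
    // Assumption: no field contains a comma or a double quote,
    // which is true of inventory.csv but not of CSV in general.
    static List<String> split(String row) {
        // A limit of -1 keeps trailing empty fields, e.g. "1,Oak chairs,,"
        return Arrays.asList(row.split(",", -1));
    }

    public static void main(String[] args) {
        List<String> fields = split("1,Oak chairs,6,31-Dec-04");
        System.out.println(fields.size() + " fields: " + fields);
    }
}
```

Running main prints the four fields of the first inventory row, in order.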

Download the latest version of CSVToXML from http://www.dpawson.co.uk/java/index.html, extract the JAR file CSVToXML.jar from the ZIP archive, and place it in the working directory. Then enter this command:

java -jar CSVToXML.jar

If you see this output, you are ready to roll:

No property File available; Quitting
CSVToXML 1.0 from Dave Pawson
Usage: java CSVToXML [options] {param=value}...

  -p filename     Take properties from named file
  -o filename     Send output to named file
  -i filename     Take CSV input from named file
  -t              Display version and timing information
  -?              Display this message

CSVToXML relies on a properties file to determine how it will output XML. In the working directory, you will find an example properties file, props.txt, shown in Example 2-9.

Example 2-9. props.txt

comment=Generated using Dave Pawson's CSVToXML












Lines 1 through 7 provide header properties that affect the entire file. The comment property (line 2) creates an XML comment at the beginning of the output. Line 3 specifies that a comma separates the values in the source, and line 4 indicates that each row ends with a newline. Line 5 gives the name of the document element, and line 6 names the parent element that wraps each row of CSV. Line 7 specifies the number of fields the processor can expect in each row. Lines 9 through 13 define the XML element names for the fields, with lines 10 through 13 naming each of the four fields in turn.
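To make the role of each header property concrete, here is a minimal Java sketch of a properties-driven converter. It is not CSVToXML's implementation. Apart from comment, which appears in props.txt, the property keys it reads (fieldSeparator, root, record, fields, field1 through fieldN) are hypothetical stand-ins for the entries described above, and the element names in main (inventory, item, ID, Quantity) are likewise assumptions. It uses the JDK's StAX XMLStreamWriter so that markup characters such as & and < in the CSV are escaped.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class PropsDrivenCsv {
    // Converts pre-split CSV rows to XML, driven by a Properties object.
    // Key names other than "comment" are HYPOTHETICAL: fieldSeparator,
    // root, record, fields, field1..fieldN. The row-separator setting is
    // not modeled because the rows arrive already split into a List.
    static String convert(Properties p, List<String> rows) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        // Quote the separator so characters like '|' are matched literally.
        String sep = Pattern.quote(p.getProperty("fieldSeparator", ","));
        int n = Integer.parseInt(p.getProperty("fields"));

        w.writeStartDocument("1.0");
        String comment = p.getProperty("comment");
        if (comment != null) {
            w.writeComment(" " + comment + " ");
        }
        w.writeStartElement(p.getProperty("root"));          // document element
        for (String row : rows) {
            if (row.isEmpty()) {
                continue;                                    // ignore blank lines
            }
            String[] values = row.split(sep, -1);
            w.writeStartElement(p.getProperty("record"));    // one element per row
            for (int i = 0; i < n && i < values.length; i++) {
                w.writeStartElement(p.getProperty("field" + (i + 1)));
                w.writeCharacters(values[i]);                 // escapes & and <
                w.writeEndElement();
            }
            w.writeEndElement();
        }
        w.writeEndElement();
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        Properties p = new Properties();
        p.setProperty("comment", "Generated by a properties-driven sketch");
        p.setProperty("fieldSeparator", ",");
        p.setProperty("root", "inventory");   // hypothetical element names
        p.setProperty("record", "item");
        p.setProperty("fields", "4");
        p.setProperty("field1", "ID");
        p.setProperty("field2", "Description");
        p.setProperty("field3", "Quantity");
        p.setProperty("field4", "Date");
        System.out.println(convert(p, List.of("1,Oak chairs,6,31-Dec-04")));
    }
}
```

Like CSVToXML, this writes everything on one line with no indentation. To drive it from a file instead, you could fill the Properties object with Properties.load and a Reader, keeping in mind that the key names above are only placeholders.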

Now you're ready to put CSVToXML to work. Type the following line at a command prompt (the -p switch specifies a properties file, -i the input file, and -o the output file):

java -jar CSVToXML.jar -p props.txt -i inventory.csv -o inventory.xml

The XML output is not indented, so it is hard to read. You can make it more attractive by applying an identity transform to it [Hack #38]. Here is a portion of inventory.xml after applying the pretty-print hack:

<?xml version="1.0" encoding="ISO-8859-1"?>

<!-- Generated using Dave Pawson's CSVToXML -->










  <Description>Oak chairs</Description>


  <Date>31 Dec 2004</Date>




  <Description>Dining tables</Description>


  <Date>31 Dec 2004</Date>




  <Description>Folding chairs</Description>


  <Date>29 Dec 2004</Date>




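[Hack #38] does the pretty-printing with an XSLT identity transform. If you would rather stay in Java, a minimal sketch using only the JDK's javax.xml.transform API follows: a Transformer created without a stylesheet copies its input unchanged, and the INDENT output property makes it re-serialize with line breaks. The indent-amount property is an extension recognized by the JDK's bundled processor, not part of the XSLT standard, so other processors may ignore it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PrettyPrint {
    // Re-serializes an XML string with indentation via an identity transform.
    static String indent(String xml) throws TransformerException {
        // newTransformer() with no stylesheet is an identity transform.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        // Processor-specific extension honored by the JDK's built-in processor.
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(indent(
            "<inventory><item><Description>Oak chairs</Description></item></inventory>"));
    }
}
```

To indent the real file, you could pass new StreamSource(new File("inventory.xml")) instead of the StringReader.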
You can use the XSLT stylesheet inventory.xsl from the book's file archive to format the output of CSVToXML as an HTML table. See [Hack #3] and [Hack #33].
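The contents of inventory.xsl are not reproduced here. As a stand-in, the following Java sketch applies a small, hypothetical stylesheet of its own through javax.xml.transform. It assumes only that each Description element has a sibling Date element (the two element names visible in the output above) and emits one table row per Description; the inventory and item wrapper elements in main are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class InventoryTable {
    // HYPOTHETICAL stylesheet, not the book's inventory.xsl.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'>"
        + "<table><tr><th>Description</th><th>Date</th></tr>"
        + "<xsl:for-each select='//Description'>"       // one row per item
        + "<tr><td><xsl:value-of select='.'/></td>"
        + "<td><xsl:value-of select='../Date'/></td></tr>"
        + "</xsl:for-each></table>"
        + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the HTML.
    static String toHtml(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<inventory><item><Description>Oak chairs</Description>"
                   + "<Date>31 Dec 2004</Date></item></inventory>";
        System.out.println(toHtml(xml));
    }
}
```

Selecting //Description avoids depending on the name of the per-row element, which is set in props.txt; a stylesheet written for a known properties file could match that element directly.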

2.12.1 See Also

  • Danny Ayers's Java code CSVtoXML: http://www.dannyayers.com/old/code/CSVtoXML.htm