Hack 18 Create an XML Document from a Text File with xmlspy


How do you get your old stuff into XML? Legacy text files can be translated into XML with xmlspy.

Perhaps you have plain-text files that you'd like to convert to XML so that the data will interoperate with the latest applications. You can do it by hand with a text or XML editor, or you can use a tool that does it for you automatically. xmlspy (Professional or Enterprise edition) is one of those tools. xmlspy's text-to-XML interface is easy to figure out, so that's the one I'll show you here. (I used the Enterprise edition when testing this.)

First, here is a little plain-text file, time.txt, that just contains data fields separated by semicolons:



The first line defines fields that will be converted to XML markup; the second line defines the content of that markup. A semicolon (;) delimits each of the fields. The second line ends with a field containing a single space, which of course you can't see.

Now open xmlspy and select Convert → Import Text file. The Text Import dialog box is shown in Figure 2-18. Click the Choose File button and open the file time.txt. Make sure that the file encoding is Unicode UTF-8, that the field delimiter is Semicolon, and that "First row contains field names" is checked.

Click the symbol to the left of the timezone field name in the first row so that it becomes an equals sign. This specifies that the timezone field will be interpreted as an attribute in the output. Then click OK.

Figure 2-18. time.txt in xmlspy's Text Import dialog box

Click the Text label at the bottom of the document pane to see the result in Figure 2-19. The XML declaration and the import and row elements were inserted by xmlspy; the remaining elements were derived from time.txt. You could change the new document by hand to match time.xml (from Chapter 1), or you could apply an XSLT stylesheet to it. XSLT hacks begin in earnest in Chapter 3, but I'll use an XSLT stylesheet here (without going into detail about the stylesheet itself) to show you how to shape this document up.

Figure 2-19. Result of importing time.txt
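If you don't have xmlspy at hand, the import step can be approximated in a few lines of Python. This is only a sketch of the idea, not a reproduction of xmlspy's output: the function name text_to_xml and its parameters are my own, the wrapper elements are named import and row after the elements mentioned above, and the sample line is an assumption modeled on the field names in this hack, because the contents of time.txt are not reproduced here.

```python
import csv
import io
import xml.etree.ElementTree as ET


def text_to_xml(text, delimiter=";", attribute_fields=("timezone",)):
    """Convert delimited text (first row = field names) to an XML string."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    names = next(reader)             # the first row supplies the markup names
    root = ET.Element("import")      # wrapper names follow the text; exact
                                     # xmlspy output may differ (assumption)
    for values in reader:
        if not values:               # csv yields [] for blank lines; skip them
            continue
        row = ET.SubElement(root, "row")
        for name, value in zip(names, values):
            if name in attribute_fields:
                row.set(name, value)                    # field becomes an attribute
            else:
                ET.SubElement(row, name).text = value   # field becomes a child element
    return ET.tostring(root, encoding="unicode")


# Hypothetical stand-in for time.txt; note the trailing single-space field.
sample = "timezone;hour;minute;second;meridiem;atomic\nPST;11;59;59;p.m.; \n"
print(text_to_xml(sample))
```

Listing timezone in attribute_fields plays the same role as clicking the symbol next to the field name until it becomes an equals sign: that field is written as an attribute of row rather than as a child element.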

Select XSL → XSL Transformation or press F10, and the dialog box in Figure 2-20 appears. Click the Browse button and open the stylesheet time.xsl. Then click OK. The imported text is then transformed by xmlspy's XSLT engine, according to time.xsl. Again click the Text label under the document pane and select Edit → Pretty-Print XML Text. The final result is shown in Figure 2-21. You can save this document with File → Save.

Figure 2-20. XSL file dialog box in xmlspy

Figure 2-21. Transformed and beautified text-to-XML conversion

You can also convert text files whose data fields are separated by tabs, commas, or spaces, and you can select fields whose text is enclosed in single or double quotes. I chose semicolons in the first example because they are easier to see than spaces and tabs. The text file time2.txt (Example 2-6) uses tabs as delimiters.

Example 2-6. time2.txt
timezone        hour    minute  second  meridiem        atomic
PST     11      59      59      p.m.
MST     12      59      59      a.m.
CST     01      59      59      a.m.
EST     02      59      59      a.m.
AST     03      59      59      a.m.
BST     04      59      59      a.m.
FST     05      59      59      a.m.
AT      06      59      59      a.m.
UTC     07      59      59      a.m.

Run this file through the conversion steps, making sure to select Tab as the field delimiter in the Text Import dialog box, as shown in Figure 2-22.

Figure 2-22. time2.txt in the Text Import dialog box
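The same tab-delimited conversion can be sketched outside xmlspy with Python's csv.DictReader. The sketch uses the first three data rows of Example 2-6 with real tab characters between the fields. As printed, those rows show five values against six field names, so restval="" fills the missing atomic field with an empty string. The function name tabbed_to_xml is hypothetical, and the import and row wrapper names are assumptions about the shape of the result, not a copy of what xmlspy writes.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Header and first three data rows of Example 2-6, tab-delimited.
TIME2 = (
    "timezone\thour\tminute\tsecond\tmeridiem\tatomic\n"
    "PST\t11\t59\t59\tp.m.\n"
    "MST\t12\t59\t59\ta.m.\n"
    "CST\t01\t59\t59\ta.m.\n"
)


def tabbed_to_xml(text):
    """Build an element tree from tab-delimited text with a header row."""
    # restval="" supplies an empty string for fields missing from a short row.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", restval="")
    root = ET.Element("import")
    for record in reader:
        # timezone becomes an attribute; remaining fields become child elements.
        row = ET.SubElement(root, "row", timezone=record.pop("timezone"))
        for name, value in record.items():
            ET.SubElement(row, name).text = value
    return root


root = tabbed_to_xml(TIME2)
print(ET.tostring(root, encoding="unicode"))
```

Changing the delimiter argument to "," or " " is all it takes to try the other separators, which mirrors choosing a different field delimiter in the dialog box.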

You can experiment with other delimiters by replacing the semicolons in time.txt or the tabs in time2.txt with commas or spaces and stepping through the conversion again. With some experimentation you will see that xmlspy can convert many kinds of delimited text files.

2.9.1 See Also

  • Sysonyx's xmlLinguist: http://www.sysonyx.com/Products/xmlLinguist/

  • For heavy-duty text-to-XML conversions, Xlipstream offers a dedicated hardware solution: rack-mounted appliances that do the conversions: http://www.xlipstream.com/