Edit, validate, and save XML documents with Microsoft Word 2003.
Microsoft Office 2003 has the best XML support of any version of Office so far. It's not perfect, but in some places it shines. Not all Office 2003 products provide direct XML support, but three of the flagship products do: Microsoft Word 2003, Excel 2003, and Access 2003. This hack discusses how to "do XML" with Word 2003.
Sadly, not all versions of Word 2003 have full-featured XML support. In order to get the full support, you need to buy Office 2003 Professional, Office 2003 Enterprise, or Word 2003 individually. Word has its own built-in schema called WordprocessingML. If you create a regular document in Word, you can save the document as XML in WordprocessingML. All versions of Word 2003 have this capability.
In Office 2003 Professional, Office 2003 Enterprise, and the individually packaged Word 2003, you can attach your own XML Schema [Hack #69] document to an XML document. You can then export Word documents as XML structured according to your own custom schema rather than Word's obscure binary format or its own WordprocessingML. Because the result is ordinary XML, you can test and validate such documents using external XML tools; in other words, you aren't landlocked if you use the Professional, Enterprise, or individually packaged versions of Word to produce XML.
You can store XML schemas in Word's schema library, attach them to documents, and validate XML documents against them. To add a schema to Word's library, choose Tools > Templates and Add-ins and then click the XML Schema tab. Now click the Add Schema button and navigate to the working directory, where you will find the schema time.xsd. Click Open. You will be asked to associate a URI with the schema (any URI seems to work). Check "Validate document against attached schema" and "Allow saving as XML even if not valid," then click OK. The result will look like Figure 2-7.
Figure 2-8 shows how the XML document time.xml will look when edited in Word 2003. To view the XML Structure pane as shown, select an XML tag, right-click the mouse, and then choose View XML Structure. The XML Structure pane appears at the right of the window. Here you see a tree view of the elements in the document near the top of the pane, and below that, a list of elements that you can insert into the document by clicking them in the list. If you check and uncheck the box "Show XML tags in the document," it will turn the XML tags on and off.
To see attributes, click on a start tag (such as time) and then right-click and choose Attributes. You will see the assigned attributes for the tag in the Attributes dialog box as shown in Figure 2-9.
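Outside Word, the same element tree and attribute view can be approximated with a few lines of Python. This is only a rough sketch: the SAMPLE document below is a hypothetical stand-in for time.xml (the real file's contents may differ), and outline is a helper name invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for time.xml; the real file's contents may differ.
SAMPLE = """<time timezone="PST">
  <hour>11</hour>
  <minute>59</minute>
  <second>59</second>
  <atomic signal="true"/>
</time>"""


def outline(element, depth=0):
    """Return one indented line per element, listing its attributes if any."""
    attrs = " ".join(
        '%s="%s"' % (name, value) for name, value in element.attrib.items()
    )
    label = element.tag + (" [%s]" % attrs if attrs else "")
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(SAMPLE))
print("\n".join(tree_lines))
```

Each line of the output corresponds to one node in the tree view, with the attributes you would otherwise look up in the Attributes dialog box shown in brackets.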
You can also perform transformations with XSLT in Word 2003. There are several ways to do it. One way is to click Browse in the XML data views pane (at the right of the editing pane in Word 2003). Select an appropriate stylesheet for the document time.xml, such as hour.xsl. Choose an encoding such as Unicode for the output of the transformation. To return to editing time.xml, click "Data only"; to transform the file again, click hour.xsl. (See Figure 2-10.)
Another way to use XSLT is to apply it when you save a file as XML. To see this, choose File > Save As. Near the bottom of the Save As dialog box, you'll see an "Apply transform" checkbox. If it is checked, the document will be transformed when saved. Click the Transform button to select the XSLT stylesheet to apply when the file is saved. If you check the "Save data only" box, only the content of the elements is saved.
You can also apply a transform when opening an XML document. Choose File > Open and then navigate to an XML document in the Open dialog box. Select a file, then click the down-arrow on the Open button and choose Open with Transform. You can then pick an XSLT stylesheet in the Choose an XSL Transformation dialog box. Select one and click OK to perform the transformation. (Word is picky here for security reasons; if a stylesheet has scripts and is not signed, you can't use it. Also, even if a stylesheet works with another XSLT processor, it may not work with Word.)
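Before handing a stylesheet to Word, it can help to check whether it embeds script at all. The sketch below assumes that the scripts in question are msxsl:script elements (the Microsoft XSLT scripting extension, in the urn:schemas-microsoft-com:xslt namespace). Exactly what Word inspects is not documented here, so treat this as a rough pre-check rather than a reproduction of Word's test. Both stylesheets and the has_embedded_script helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Microsoft XSLT extension elements such as msxsl:script.
MSXSL_NS = "urn:schemas-microsoft-com:xslt"

# Hypothetical stylesheet without script.
PLAIN = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/"><xsl:value-of select="time/hour"/></xsl:template>
</xsl:stylesheet>"""

# Hypothetical stylesheet that embeds a JScript function.
SCRIPTED = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:user="urn:example:user">
  <msxsl:script language="JScript" implements-prefix="user">
    function one() { return 1; }
  </msxsl:script>
  <xsl:template match="/"><xsl:value-of select="user:one()"/></xsl:template>
</xsl:stylesheet>"""


def has_embedded_script(stylesheet_text):
    """Return True if the stylesheet contains any msxsl:script element."""
    root = ET.fromstring(stylesheet_text)
    script_tag = "{%s}script" % MSXSL_NS
    return any(element.tag == script_tag for element in root.iter())


print(has_embedded_script(PLAIN))
print(has_embedded_script(SCRIPTED))
```

A stylesheet that passes this check can still fail in Word for other reasons, since a stylesheet that works with one XSLT processor may not work with another.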
If you are working with a regular Word document, such as time.doc from the file archive (Figure 2-11), you can save it as XML. Choose File > Save As. In the "Save as type" pull-down menu, select XML Document. Enter a name for the file, such as Time_word.xml. Then click Save.
This saves a long file that uses the WordprocessingML vocabulary, which stores information about the file in great detail. If you examine the file, you will find that it uses a total of 244 elements with 256 attributes. Gulp. You can bring up this file not only in Word but also in Internet Explorer with similar styling. A portion of Time_word.xml is shown here (I have inserted line breaks for readability):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
 xmlns:v="urn:schemas-microsoft-com:vml"
 xmlns:w10="urn:schemas-microsoft-com:office:word"
 xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
 xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
 xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
 w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no"
 xml:space="preserve">
 <o:DocumentProperties>
  <o:Title>Time</o:Title>
  <o:Author>Mike Fitzgerald</o:Author>
  <o:LastAuthor>Mike Fitzgerald</o:LastAuthor>
  <o:Revision>2</o:Revision>
  <o:TotalTime>0</o:TotalTime>
  <o:Created>2004-02-11T23:07:00Z</o:Created>
  <o:LastSaved>2004-02-11T23:07:00Z</o:LastSaved>
  <o:Pages>1</o:Pages>
  <o:Words>9</o:Words>
  <o:Characters>52</o:Characters>
  <o:Lines>1</o:Lines>
  <o:Paragraphs>1</o:Paragraphs>
  <o:CharactersWithSpaces>60</o:CharactersWithSpaces>
  <o:Version>11.5604</o:Version>
 </o:DocumentProperties>
 <w:fonts>
  <w:defaultFonts w:ascii="Times New Roman" w:fareast="Times New Roman"
   w:h-ansi="Times New Roman" w:cs="Times New Roman"/>
  <w:font w:name="Wingdings">
   <w:panose-1 w:val="05000000000000000000"/>
   <w:charset w:val="02"/>
   <w:family w:val="Auto"/>
   <w:pitch w:val="variable"/>
   <w:sig w:usb-0="00000000" w:usb-1="10000000" w:usb-2="00000000"
    w:usb-3="00000000" w:csb-0="80000000" w:csb-1="00000000"/>
  </w:font>
 </w:fonts>
 ...
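Because WordprocessingML is ordinary XML, any XML parser can read it. The following sketch pulls the document properties out of such a file and counts the distinct element and attribute names it uses. The EXCERPT string is a trimmed copy of the listing above, closed off by hand so that it parses on its own. The helper names document_properties and distinct_names are invented for this example. The sketch counts distinct names, which may or may not be how the totals quoted earlier were tallied.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the o: prefix in the listing.
O_NS = "urn:schemas-microsoft-com:office:office"

# Trimmed, hand-closed copy of the listing so it is well-formed by itself.
EXCERPT = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no"
    xml:space="preserve">
  <o:DocumentProperties>
    <o:Title>Time</o:Title>
    <o:Author>Mike Fitzgerald</o:Author>
    <o:Words>9</o:Words>
  </o:DocumentProperties>
</w:wordDocument>"""


def document_properties(root):
    """Map each o:DocumentProperties child (local name) to its text."""
    props = root.find("{%s}DocumentProperties" % O_NS)
    if props is None:
        return {}
    return {child.tag.split("}", 1)[1]: child.text for child in props}


def distinct_names(root):
    """Return the sets of distinct element and attribute names in the tree."""
    elements, attributes = set(), set()
    for element in root.iter():
        elements.add(element.tag)
        attributes.update(element.attrib)  # adds the attribute names
    return elements, attributes


# Parse from bytes so the encoding declaration is honored by the parser.
root = ET.fromstring(EXCERPT.encode("utf-8"))
props = document_properties(root)
elements, attributes = distinct_names(root)
print(props)
print(len(elements), "distinct elements,", len(attributes), "distinct attributes")
```

To try it on a real file such as Time_word.xml, replace the ET.fromstring call with ET.parse("Time_word.xml").getroot(). Namespace declarations (the xmlns attributes) are not counted as attributes by ElementTree.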
Office 2003 XML, by Evan Lenz, Mary McRae, and Simon St.Laurent (O'Reilly)
WordprocessingML documentation and schema: http://www.microsoft.com/downloads/details.aspx?FamilyID=fe118952-3547-420a-a412-00a2662442d9&displaylang=en