Hack 95 Create Well-Formed XML with JavaScript


Use JavaScript to ensure that you write correct, well-formed XML in web pages.

Sometimes you need to create some XML from within a browser. It is easy to write bad XML without realizing it. Writing correct XML with all its bells and whistles is not easy, but in this type of scenario you usually only need to write basic XML.

There is a kind of hierarchy of XML:

  1. Basic: Elements only; no attributes, entities, character references, escaped characters, or encoding issues

  2. Plain: Basic plus attributes

  3. Plain/escaped: Plain with special XML characters escaped

  4. Plain/advanced: Plain/escaped with CDATA sections and processing instructions

The list continues with increasing levels of sophistication (and difficulty).

This hack covers the basic and plain styles (with some enhancements), and you can adapt the techniques to move several more steps up the ladder if you like.

The main issues in writing basic XML are getting the elements closed properly and keeping the code simple. Here is how.

7.6.1 The Element Function

Here is a JavaScript function for writing elements:

// Bare bones XML writer - no attributes
function element(name,content){
    var xml
    if (!content){
        xml='<' + name + '/>'
    }
    else {
        xml='<'+ name + '>' + content + '</' + name + '>'
    }
    return xml
}

This basic function even writes the empty-element form when there is no element content. What is especially nice about it is that you can nest calls, passing the result of one call as the content of another, like this:

var xml = element('p', 'This is ' + 
    element('strong','Bold Text') + 'inline')

Both inner and outer elements are guaranteed to be closed properly. You can display the result for testing like this:

alert(xml)

You can build up your entire XML document by combining bits like these, and all the elements will be properly nested and closed.
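As a minimal sketch of that, the following builds a small order document from nested calls. It prints the result with console.log, so it also runs outside a browser (for example, in Node.js). The element names are made up for illustration. The sketch also shows one caveat of the !content test: a content value of 0 is falsy, so it produces an empty element. elementStrict() is a hypothetical variant, not part of the hack, that treats only undefined, null, or the empty string as missing content.

```javascript
// Copy of element() from the text, so this sketch runs on its own.
function element(name, content) {
    if (!content) {
        return '<' + name + '/>'
    }
    return '<' + name + '>' + content + '</' + name + '>'
}

// Nest calls to build a document; each call closes its own element.
var doc = element('order',
    element('customer', 'Ann') +
    element('item', 'Widget') +
    element('note'))               // no content: empty-element form

console.log(doc)
// <order><customer>Ann</customer><item>Widget</item><note/></order>

// Caveat: !content is also true for 0, so a numeric zero is lost.
console.log(element('qty', 0))     // <qty/>

// Hypothetical variant: only undefined, null, and '' count as "no content".
function elementStrict(name, content) {
    if (content === undefined || content === null || content === '') {
        return '<' + name + '/>'
    }
    return '<' + name + '>' + content + '</' + name + '>'
}

console.log(elementStrict('qty', 0))  // <qty>0</qty>
```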

The element() function does not do any pretty-printing, because it has no way to know where line breaks should go. If that is important to you, just create a variant function:

function elementNL(name, content) {
    return element(name,content) + '\n'
}

More sophisticated variations are possible but rarely needed.
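For example, the following sketch puts each child element on its own line. It starts the parent's content with a newline and writes every child with elementNL(). The element names are invented for illustration. The newlines become whitespace-only text inside the list element. That matters only if whatever reads the XML treats such whitespace as significant.

```javascript
// element() from the text and the elementNL() variant.
function element(name, content) {
    if (!content) {
        return '<' + name + '/>'
    }
    return '<' + name + '>' + content + '</' + name + '>'
}

function elementNL(name, content) {
    return element(name, content) + '\n'
}

// The leading '\n' puts the first child on a new line; elementNL()
// ends each child (and the parent) with a newline.
var xml = elementNL('list',
    '\n' + elementNL('item', 'first') + elementNL('item', 'second'))

console.log(xml)
// <list>
// <item>first</item>
// <item>second</item>
// </list>
```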

7.6.2 Adding Attributes

At the next level up, the most pressing problems are to format the attribute string properly, to escape single and double quotes embedded in the attribute values, and to do the least amount of quote escaping so that the result will be as readable as possible.

We modify the element() function to optionally accept an associative array containing the attribute names and values. In other languages, an associative array may be called a dictionary or a hash.
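In JavaScript, that associative array is simply an object literal. Attribute names such as xml:lang or data-id are not valid JavaScript identifiers, so they have to be written as quoted keys. The sketch below walks the object with the same kind of for...in loop that formatAttributes() uses later in this hack. The attribute names and values are invented for illustration. In current engines, non-numeric keys come back in the order they were written.

```javascript
// An attribute "dictionary" written as an object literal.
// Keys that are not JavaScript identifiers must be quoted.
var atts = {
    id: 'intro',
    'xml:lang': 'en',
    'data-id': '42'
}

// Walk the names and values with for...in.
var pairs = []
for (var attName in atts) {
    pairs.push(attName + '=' + atts[attName])
}

console.log(pairs.join(' '))
// id=intro xml:lang=en data-id=42
```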

// XML writer with attributes and smart attribute quote escaping 
function element(name,content,attributes){
    var att_str = ''
    if (attributes) { // tests false if this arg is missing!
        att_str = formatAttributes(attributes)
    }
    var xml
    if (!content){
        xml='<' + name + att_str + '/>'
    }
    else {
        xml='<' + name + att_str + '>' + content + '</'+name+'>'
    }
    return xml
}

The function formatAttributes() handles formatting and escaping the attributes.

To fix up the quotes, we use the following algorithm if there are embedded quotes (single or double):

  1. Whichever type of quote occurs first in the string, use the other kind to enclose the attribute value.

  2. Only escape occurrences of the kind of quote used to enclose the attribute value. We don't need to escape the other kind.
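The two rules can be pulled out into a small helper. This sketch uses hypothetical names: chooseQuote() and quoteAttribute() are not part of the hack's code. It uses split() and join() instead of a regular expression to replace the quotes. As in the hack, a value with no quotes at all is enclosed in single quotes. A value that contains only double quotes is also enclosed in single quotes, so nothing needs escaping.

```javascript
var APOS = "'", QUOTE = '"'

// Pick the quote character to enclose an attribute value in.
function chooseQuote(value) {
    var aposPos = value.indexOf(APOS)
    var quotPos = value.indexOf(QUOTE)
    if (aposPos === -1) return APOS    // no apostrophes (or no quotes at all)
    if (quotPos === -1) return QUOTE   // apostrophes only
    // Both kinds present: use the kind that does NOT occur first
    return quotPos < aposPos ? APOS : QUOTE
}

// Enclose the value and escape only the enclosing kind of quote.
function quoteAttribute(value) {
    var q = chooseQuote(value)
    var entity = (q === APOS) ? '&apos;' : '&quot;'
    return q + value.split(q).join(entity) + q
}

console.log(quoteAttribute('plain'))          // 'plain'
console.log(quoteAttribute('say "hi"'))       // 'say "hi"'
console.log(quoteAttribute('"a" and \'b\''))  // '"a" and &apos;b&apos;'
console.log(quoteAttribute('\'a\' and "b"'))  // "'a' and &quot;b&quot;"
```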

Here is the code:

var APOS = "'", QUOTE = '"'
var ESCAPED_QUOTE = {  }
ESCAPED_QUOTE[QUOTE] = '&quot;'
ESCAPED_QUOTE[APOS] = '&apos;'

/*
   Format a dictionary of attributes into a string suitable
   for inserting into the start tag of an element.  Be smart
   about escaping embedded quotes in the attribute values.
*/
function formatAttributes(attributes) {
    var att_value
    var apos_pos, quot_pos
    var use_quote, escape
    var att_str
    var re
    var result = ''

    for (var att in attributes) {
        // String() lets numbers and other non-string values through safely
        att_value = String(attributes[att])

        // Find first quote marks if any
        apos_pos = att_value.indexOf(APOS)
        quot_pos = att_value.indexOf(QUOTE)

        // Determine which quote type to use around 
        // the attribute value
        if (apos_pos == -1 && quot_pos == -1) {
            att_str = ' ' + att + "='" + att_value + "'"
            result += att_str
            continue
        }

        // Prefer the single quote unless forced to use double:
        // use it if the value has no apostrophes, or if a double
        // quote comes before the first apostrophe
        if (apos_pos == -1 || (quot_pos != -1 && quot_pos < apos_pos)) {
            use_quote = APOS
        }
        else {
            use_quote = QUOTE
        }

        // Figure out which kind of quote to escape
        // Use nice dictionary instead of yucky if-else nests
        escape = ESCAPED_QUOTE[use_quote]

        // Escape only the right kind of quote
        re = new RegExp(use_quote,'g')
        att_str = ' ' + att + '=' + use_quote + 
            att_value.replace(re, escape) + use_quote
        result += att_str
    }
    return result
}

Here is code to test everything we've seen so far:

function test() {   
    var atts = {att1:"a1", 
        att2:"This is in \"double quotes\" and this is " +
         "in 'single quotes'",
        att3:"This is in 'single quotes' and this is in " +
         "\"double quotes\""}

    // Basic XML example
    alert(element('elem','This is a test'))

    // Nested elements
    var xml = element('p', 'This is ' + 
        element('strong','Bold Text') + 'inline')
    alert(xml)

    // Attributes with all kinds of embedded quotes
    alert(element('elem','This is a test', atts))

    // Empty element version
    alert(element('elem','', atts))    
}

Open the file jswriter.html (Example 7-18) in a browser that supports JavaScript (the script is also stored in jswriter.js so you can easily include it in any HTML or XHTML document).

Example 7-18. jswriter.html
<html xmlns="http://www.w3.org/1999/xhtml">

<head><title>Testing the Well-formed XML Hack</title>

<script type='text/javascript'>

// XML writer with attributes and smart attribute quote escaping 
function element(name,content,attributes){
    var att_str = ''
    if (attributes) { // tests false if this arg is missing!
        att_str = formatAttributes(attributes)
    }
    var xml
    if (!content){
        xml='<' + name + att_str + '/>'
    }
    else {
        xml='<' + name + att_str + '>' + content + '</'+name+'>'
    }
    return xml
}

var APOS = "'", QUOTE = '"'
var ESCAPED_QUOTE = {  }
ESCAPED_QUOTE[QUOTE] = '&quot;'
ESCAPED_QUOTE[APOS] = '&apos;'

/*
   Format a dictionary of attributes into a string suitable
   for inserting into the start tag of an element.  Be smart
   about escaping embedded quotes in the attribute values.
*/
function formatAttributes(attributes) {
    var att_value
    var apos_pos, quot_pos
    var use_quote, escape
    var att_str
    var re
    var result = ''

    for (var att in attributes) {
        // String() lets numbers and other non-string values through safely
        att_value = String(attributes[att])

        // Find first quote marks if any
        apos_pos = att_value.indexOf(APOS)
        quot_pos = att_value.indexOf(QUOTE)

        // Determine which quote type to use around 
        // the attribute value
        if (apos_pos == -1 && quot_pos == -1) {
            att_str = ' ' + att + "='" + att_value + "'"
            result += att_str
            continue
        }

        // Prefer the single quote unless forced to use double:
        // use it if the value has no apostrophes, or if a double
        // quote comes before the first apostrophe
        if (apos_pos == -1 || (quot_pos != -1 && quot_pos < apos_pos)) {
            use_quote = APOS
        }
        else {
            use_quote = QUOTE
        }

        // Figure out which kind of quote to escape
        // Use nice dictionary instead of yucky if-else nests
        escape = ESCAPED_QUOTE[use_quote]

        // Escape only the right kind of quote
        re = new RegExp(use_quote,'g')
        att_str = ' ' + att + '=' + use_quote + 
            att_value.replace(re, escape) + use_quote
        result += att_str
    }
    return result
}

function test() {   
    var atts = {att1:"a1", 
        att2:"This is in \"double quotes\" and this is " +
         "in 'single quotes'",
        att3:"This is in 'single quotes' and this is in " +
         "\"double quotes\""}

    // Basic XML example
    alert(element('elem','This is a test'))

    // Nested elements
    var xml = element('p', 'This is ' + 
        element('strong','Bold Text') + 'inline')
    alert(xml)

    // Attributes with all kinds of embedded quotes
    alert(element('elem','This is a test', atts))

    // Empty element version
    alert(element('elem','', atts))    
}   

</script>

</head>

<body onload='test()'>
</body>

</html>

When the page loads, you will see the following in four successive alert boxes, as shown in Figure 7-1. The lines have been wrapped for readability.


First alert:

<elem>This is a test</elem>


Second alert:

<p>This is <strong>Bold Text</strong>inline</p>


Third alert:

<elem att1='a1'
att2='This is in "double quotes" and this is
in &apos;single quotes&apos;'
att3="This is in 'single quotes' and this is in
&quot;double quotes&quot;">This is a test</elem>


Fourth alert:

<elem att1='a1'
att2='This is in "double quotes" and this is in
&apos;single quotes&apos;'
att3="This is in 'single quotes' and this is in
&quot;double quotes&quot;"/>

Figure 7-1. jswriter.html in Firefox


7.6.3 Extending the Hack

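You may want to escape the other special XML characters. You can do this by adding calls such as:

content = content.replace(/&/g, '&amp;')
content = content.replace(/</g, '&lt;')

Replace & first; otherwise the ampersands in the entities you have just inserted would be escaped a second time. Apply these calls to plain text before you pass it to element(), not inside element() itself, because the content of an outer element may already contain markup produced by nested element() calls. Take care not to replace the quotes in attribute values, since formatAttributes() handles this so nicely. Because the content and the attribute values are plain strings, they are easy to manipulate as you like.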
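Here is a sketch of two hypothetical helpers, escapeText() and escapeAttrValue(), that follow those rules. escapeText() also replaces >. XML only requires that in the sequence ]]>, but it is always safe to escape. escapeAttrValue() leaves the quotes to formatAttributes() and handles only & and <, the two characters that cannot appear literally in an attribute value. The example escapes the text pieces first and then passes them to element(), so the nested em markup is left intact.

```javascript
// Escape text content. '&' goes first, so the '&' in '&lt;'
// and '&gt;' is not escaped a second time.
function escapeText(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
}

// Escape an attribute value, leaving quotes for formatAttributes().
function escapeAttrValue(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
}

// element() from the text (without attributes), so this runs on its own.
function element(name, content) {
    if (!content) {
        return '<' + name + '/>'
    }
    return '<' + name + '>' + content + '</' + name + '>'
}

// Escape the plain-text pieces first, then nest; the <em> markup survives.
var xml = element('p',
    escapeText('Fish & chips < 5 euros, ') + element('em', escapeText('cheap')))

console.log(xml)
// <p>Fish &amp; chips &lt; 5 euros, <em>cheap</em></p>
console.log(escapeAttrValue('a<b & "c"'))
// a&lt;b &amp; "c"
```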

7.6.4 Creating Large Chunks of XML

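If you create long strings of XML, say with more than a few hundred string fragments, you may find the performance to be slow in some browsers. This happens because JavaScript strings, like strings in many other languages, are immutable. A straightforward implementation therefore has to allocate memory for a new string, and copy the existing text into it, every time you concatenate another fragment. (Many current JavaScript engines optimize repeated concatenation, so it is worth measuring before you restructure your code.)

The standard way around this is to accumulate the fragments in an array, then join the array into a single string at the end. This process is generally very fast, even for very large results.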

Here is how you can do it:

var results = [  ]
results.push(element("p","This is some content"))
results.push(element('p', 'This is ' + 
    element('strong','Bold Text') + 'inline'))
// ... Append more bits

// join('') adds nothing between fragments; use '\n' for one per line
var end_result = results.join('')
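Here is a self-contained sketch of the same pattern. It generates a thousand hypothetical item elements and wraps them in an items element. Whether this is measurably faster than repeated += depends on the JavaScript engine, so time both approaches if performance really matters. Either way, the array keeps the assembly code tidy.

```javascript
// element() from the text (without attributes), so this runs on its own.
function element(name, content) {
    if (!content) {
        return '<' + name + '/>'
    }
    return '<' + name + '>' + content + '</' + name + '>'
}

// Collect the fragments in an array...
var results = []
for (var i = 1; i <= 1000; i++) {
    results.push(element('item', 'Item ' + i))
}

// ...and build the final string once.
var doc = element('items', results.join(''))

console.log(results.length + ' fragments joined')  // 1000 fragments joined
```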

7.6.5 See Also

  • JavaScript: The Definitive Guide, by David Flanagan (O'Reilly)

Tom Passin