Use JavaScript to ensure that you write correct, well-formed XML in web pages.
Sometimes you need to create some XML from within a browser. It is easy to write bad XML without realizing it. Writing correct XML with all its bells and whistles is not easy, but in this type of scenario you usually only need to write basic XML.
There is a kind of hierarchy of XML:
Basic: Elements only; no attributes, entities, character references, escaped characters, or encoding issues
Plain: Basic plus attributes
Plain/escaped: Plain with special XML characters escaped
Plain/advanced: Plain/escaped with CDATA sections and processing instructions
The list continues with increasing levels of sophistication (and difficulty).
This hack covers the basic and plain styles (with some enhancements), and you can adapt the techniques to move several more steps up the ladder if you like.
The main issues with writing basic XML are to get the elements closed properly and to keep the code simple. Here is how.
Here is a JavaScript function for writing elements:
// Bare bones XML writer - no attributes
function element(name,content){
    var xml
    if (!content){
        xml='<' + name + '/>'
    }
    else {
        xml='<'+ name + '>' + content + '</' + name + '>'
    }
    return xml
}
This basic hack even writes the empty-element form when there is no element content. What is especially nice about this hack is that you can use it recursively, like this:
var xml = element('p', 'This is ' + element('strong','Bold Text') + 'inline')
Both inner and outer elements are guaranteed to be closed properly. You can display the result for testing like this:
alert(xml)
You can build up your entire XML document by combining bits like these, and all the elements will be properly nested and closed.
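As a rough illustration of building a document from smaller pieces, the sketch below assembles a parent element from a loop over an array. The `titles` array and the `catalog` and `title` element names are made up for this example and are not part of the original hack.

```javascript
// Same bare-bones writer as in the text, repeated so the sketch runs on its own.
function element(name, content) {
    var xml
    if (!content) {
        xml = '<' + name + '/>'
    } else {
        xml = '<' + name + '>' + content + '</' + name + '>'
    }
    return xml
}

// Hypothetical sample data, invented for this illustration.
var titles = ['XML Hacks', 'JavaScript: The Definitive Guide']

// Build the inner elements first, then wrap them in their parent.
var items = ''
for (var i = 0; i < titles.length; i++) {
    items += element('title', titles[i])
}
var catalog = element('catalog', items)

console.log(catalog)
```

Because each call returns a complete, closed element, the order of assembly does not matter: any string returned by `element()` can be used as the content of another call.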
The element() function does not do any pretty-printing, because it has no way to know where line breaks should go. If that is important to you, just create a variant function:
function elementNL(name, content) { return element(name,content) + '\n' }
More sophisticated variations are possible but rarely needed.
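One such variation might put each child element on its own indented line. The following is only a sketch: `elementBlock()` is a hypothetical name, it assumes the children have already been formatted as strings, and it indents by a single level only (nested blocks are not re-indented).

```javascript
// Same bare-bones writer as in the text, repeated so the sketch runs on its own.
function element(name, content) {
    var xml
    if (!content) {
        xml = '<' + name + '/>'
    } else {
        xml = '<' + name + '>' + content + '</' + name + '>'
    }
    return xml
}

// Hypothetical variant: each child goes on its own line, indented two spaces.
function elementBlock(name, children) {
    var body = '\n'
    for (var i = 0; i < children.length; i++) {
        body += '  ' + children[i] + '\n'
    }
    return element(name, body)
}

var doc = elementBlock('list', [element('item', 'one'), element('item', 'two')])
console.log(doc)
```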
At the next level up, the most pressing problems are to format the attribute string properly, to escape single and double quotes embedded in the attribute values, and to do the least amount of quote escaping so that the result will be as readable as possible.
We modify the element() function to optionally accept an associative array containing the attribute names and values. In other languages, an associative array may be called a dictionary or a hash.
// XML writer with attributes and smart attribute quote escaping
function element(name,content,attributes){
    var att_str = ''
    if (attributes) { // tests false if this arg is missing!
        att_str = formatAttributes(attributes)
    }
    var xml
    if (!content){
        xml='<' + name + att_str + '/>'
    }
    else {
        xml='<' + name + att_str + '>' + content + '</'+name+'>'
    }
    return xml
}
The function formatAttributes() handles formatting and escaping the attributes.
To fix up the quotes, we use the following algorithm if there are embedded quotes (single or double):
Whichever type of quote occurs first in the string, use the other kind to enclose the attribute value.
Only escape occurrences of the kind of quote used to enclose the attribute value. We don't need to escape the other kind.
Here is the code:
var APOS = "'", QUOTE = '"'
var ESCAPED_QUOTE = { }
ESCAPED_QUOTE[QUOTE] = '&quot;'
ESCAPED_QUOTE[APOS] = '&apos;'

/*
   Format a dictionary of attributes into a string suitable
   for inserting into the start tag of an element. Be smart
   about escaping embedded quotes in the attribute values.
*/
function formatAttributes(attributes) {
    var att_value
    var apos_pos, quot_pos
    var use_quote, escape, quote_to_escape
    var att_str
    var re
    var result = ''

    for (var att in attributes) {
        att_value = attributes[att]

        // Find first quote marks if any
        apos_pos = att_value.indexOf(APOS)
        quot_pos = att_value.indexOf(QUOTE)

        // Determine which quote type to use around
        // the attribute value
        if (apos_pos == -1 && quot_pos == -1) {
            att_str = ' ' + att + "='" + att_value + "'"
            result += att_str
            continue
        }

        // Prefer the single quote unless forced to use double
        if (quot_pos != -1 && quot_pos < apos_pos) {
            use_quote = APOS
        }
        else {
            use_quote = QUOTE
        }

        // Figure out which kind of quote to escape
        // Use nice dictionary instead of yucky if-else nests
        escape = ESCAPED_QUOTE[use_quote]

        // Escape only the right kind of quote
        re = new RegExp(use_quote,'g')
        att_str = ' ' + att + '=' + use_quote +
            att_value.replace(re, escape) + use_quote
        result += att_str
    }
    return result
}
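One case is worth a closer look. When a value contains double quotes but no apostrophes, the test `quot_pos < apos_pos` compares against -1 and fails, so the listing appears to fall through to double-quote delimiters and escape every embedded double quote. The result is still well-formed, just less readable than it could be. The sketch below is one possible variant, not the author's code: `quoteAttributeValue()` is a hypothetical helper that handles a single value and picks the delimiter that needs no escaping whenever only one kind of quote is present.

```javascript
var APOS = "'", QUOTE = '"'
var ESCAPED_QUOTE = {}
ESCAPED_QUOTE[QUOTE] = '&quot;'
ESCAPED_QUOTE[APOS] = '&apos;'

// Hypothetical helper: wrap one attribute value in quotes,
// escaping as few embedded quotes as possible.
function quoteAttributeValue(value) {
    var apos_pos = value.indexOf(APOS)
    var quot_pos = value.indexOf(QUOTE)
    var use_quote
    if (apos_pos == -1) {
        // No apostrophes: single-quote delimiters need no escaping
        use_quote = APOS
    } else if (quot_pos == -1 || apos_pos < quot_pos) {
        // Apostrophes only, or an apostrophe occurs first
        use_quote = QUOTE
    } else {
        // Both kinds present and a double quote occurs first
        use_quote = APOS
    }
    var re = new RegExp(use_quote, 'g')
    return use_quote + value.replace(re, ESCAPED_QUOTE[use_quote]) + use_quote
}

console.log(quoteAttributeValue('say "hi"'))  // 'say "hi"'
console.log(quoteAttributeValue("it's"))      // "it's"
```

When both kinds of quote are present, this variant follows the same rule as the text: whichever kind occurs first, the other kind encloses the value.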
Here is code to test everything we've seen so far:
function test() {
    var atts = {att1:"a1",
        att2:"This is in \"double quotes\" and this is " +
            "in 'single quotes'",
        att3:"This is in 'single quotes' and this is in " +
            "\"double quotes\""}

    // Basic XML example
    alert(element('elem','This is a test'))

    // Nested elements
    var xml = element('p', 'This is ' +
        element('strong','Bold Text') + 'inline')
    alert(xml)

    // Attributes with all kinds of embedded quotes
    alert(element('elem','This is a test', atts))

    // Empty element version
    alert(element('elem','', atts))
}
Open the file jswriter.html (Example 7-18) in a browser that supports JavaScript (the script is also stored in jswriter.js so you can easily include it in any HTML or XHTML document).
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Testing the Well-formed XML Hack</title>
<script type='text/javascript'>
// XML writer with attributes and smart attribute quote escaping
function element(name,content,attributes){
    var att_str = ''
    if (attributes) { // tests false if this arg is missing!
        att_str = formatAttributes(attributes)
    }
    var xml
    if (!content){
        xml='<' + name + att_str + '/>'
    }
    else {
        xml='<' + name + att_str + '>' + content + '</'+name+'>'
    }
    return xml
}

var APOS = "'", QUOTE = '"'
var ESCAPED_QUOTE = { }
ESCAPED_QUOTE[QUOTE] = '&quot;'
ESCAPED_QUOTE[APOS] = '&apos;'

/*
   Format a dictionary of attributes into a string suitable
   for inserting into the start tag of an element. Be smart
   about escaping embedded quotes in the attribute values.
*/
function formatAttributes(attributes) {
    var att_value
    var apos_pos, quot_pos
    var use_quote, escape, quote_to_escape
    var att_str
    var re
    var result = ''

    for (var att in attributes) {
        att_value = attributes[att]

        // Find first quote marks if any
        apos_pos = att_value.indexOf(APOS)
        quot_pos = att_value.indexOf(QUOTE)

        // Determine which quote type to use around
        // the attribute value
        if (apos_pos == -1 && quot_pos == -1) {
            att_str = ' ' + att + "='" + att_value + "'"
            result += att_str
            continue
        }

        // Prefer the single quote unless forced to use double
        if (quot_pos != -1 && quot_pos < apos_pos) {
            use_quote = APOS
        }
        else {
            use_quote = QUOTE
        }

        // Figure out which kind of quote to escape
        // Use nice dictionary instead of yucky if-else nests
        escape = ESCAPED_QUOTE[use_quote]

        // Escape only the right kind of quote
        re = new RegExp(use_quote,'g')
        att_str = ' ' + att + '=' + use_quote +
            att_value.replace(re, escape) + use_quote
        result += att_str
    }
    return result
}

function test() {
    var atts = {att1:"a1",
        att2:"This is in \"double quotes\" and this is " +
            "in 'single quotes'",
        att3:"This is in 'single quotes' and this is in " +
            "\"double quotes\""}

    // Basic XML example
    alert(element('elem','This is a test'))

    // Nested elements
    var xml = element('p', 'This is ' +
        element('strong','Bold Text') + 'inline')
    alert(xml)

    // Attributes with all kinds of embedded quotes
    alert(element('elem','This is a test', atts))

    // Empty element version
    alert(element('elem','', atts))
}
</script>
</head>
<body onload='test()'>
</body>
</html>
When the page loads, you will see the following in four successive alert boxes, as shown in Figure 7-1. The lines have been wrapped for readability.
<elem>This is a test</elem>
<p>This is <strong>Bold Text</strong>inline</p>
<elem att1='a1'
att2='This is in "double quotes" and this is
in &apos;single quotes&apos;'
att3="This is in 'single quotes' and this is in
&quot;double quotes&quot;">This is a test</elem>
<elem att1='a1'
att2='This is in "double quotes" and this is in
&apos;single quotes&apos;'
att3="This is in 'single quotes' and this is in
&quot;double quotes&quot;"/>
You may want to escape the other special XML characters. You can do this by adding calls such as:
content = content.replace(/</g, '&lt;')
Take care not to replace the quotes in attribute values, since formatAttributes() handles this so nicely. Because the parameters to element() and formatAttributes() are strings, they are easy to manipulate as you like.
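A minimal sketch of such escaping for text content follows. `escapeText()` is a hypothetical helper name, not part of the original hack. The ampersand is replaced first so that the entities produced by the later replacements are not escaped a second time. Apply it only to raw text, never to a string that already contains markup returned by element().

```javascript
// Same bare-bones writer as in the text, repeated so the sketch runs on its own.
function element(name, content) {
    var xml
    if (!content) {
        xml = '<' + name + '/>'
    } else {
        xml = '<' + name + '>' + content + '</' + name + '>'
    }
    return xml
}

// Hypothetical helper: escape the markup characters in plain text.
// '&' must be handled first, or '&lt;' would become '&amp;lt;'.
function escapeText(text) {
    return text.replace(/&/g, '&amp;')
               .replace(/</g, '&lt;')
               .replace(/>/g, '&gt;')
}

var escaped = element('p', escapeText('1 < 2 && 3 > 2'))
console.log(escaped)  // <p>1 &lt; 2 &amp;&amp; 3 &gt; 2</p>
```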
If you create long strings of XML, say with more than a few hundred string fragments, you may find the performance to be slow. That's normal, and happens because JavaScript, like most other languages, has to allocate memory for each new string every time you concatenate more fragments.
The standard way around this is to accumulate the fragments in a list, then join the list back to a string at the end. This process is generally very fast, even for very large results.
Here is how you can do it:
var results = [ ]
results.push(element("p","This is some content"))
results.push(element('p', 'This is ' +
    element('strong','Bold Text') + 'inline'))
// ... Append more bits
var end_result = results.join(' ')
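The same accumulate-then-join pattern works inside a loop. The sketch below is illustrative only: the `list` and `item` element names are invented, and it joins with an empty string so that no whitespace is inserted between fragments, whereas the listing above joins with a space.

```javascript
// Same bare-bones writer as in the text, repeated so the sketch runs on its own.
function element(name, content) {
    var xml
    if (!content) {
        xml = '<' + name + '/>'
    } else {
        xml = '<' + name + '>' + content + '</' + name + '>'
    }
    return xml
}

// Collect fragments in an array instead of growing one long string.
var fragments = []
for (var i = 1; i <= 3; i++) {
    fragments.push(element('item', 'Item ' + i))
}

// Join once at the end, then wrap the result in the parent element.
var list_xml = element('list', fragments.join(''))
console.log(list_xml)
```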
JavaScript: The Definitive Guide, by David Flanagan (O'Reilly)
- Tom Passin