Hack 12 Edit XML Documents with Emacs and nXML

nXML mode for GNU Emacs provides a powerful environment for creating valid XML documents.

If you've been editing XML from within GNU Emacs using PSGML, here's a tip: get rid of it. That's right, tear it out, dump it, make it disappear, because there's a much better tool available: nXML. (Grab the latest nxml-mode-200nnnnn.tar.gz file from http://www.thaiopensource.com/download/.) nXML was developed by James Clark, the man who brought us groff, expat, sgmls, SP, and Jade, and who was a driving force behind the development of XPath, XSLT (and before that, DSSSL), and, along with Murata Makoto, RELAX NG (http://www.relaxng.org/).

Which brings us back to what nXML is all about: nXML is a very clever mechanism for doing RELAX NG-driven, context-sensitive, validated editing. What's particularly clever about it is that, unlike PSGML and unlike virtually every other XML editing application available (with the exception of the Topologi Collaborative Markup Editor, http://www.topologi.com/products/tme/), it provides real-time, automatic visual identification of validity errors.

This hack assumes that you are familiar with Emacs. The README file that comes with nXML states that you must use Emacs version 21.x (preferably 21.3 or later) in order to use nXML. To get nXML to run in Emacs, you must first load the rng-auto.el file. In Emacs, type:

M-x load-file

Then load the file rng-auto.el from the location where you downloaded and extracted the latest version of nXML. This file defines the autoloads for nXML. Now open an XML document (C-x C-f) and enter:

M-x nxml-mode

You are good to go! For help, type:

C-h m
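
If you use nXML regularly, you may prefer to have Emacs set it up at startup instead of typing these commands in every session. The following is a minimal sketch for your ~/.emacs file, not a definitive configuration. The directory ~/nxml-mode-200nnnnn/ is a placeholder for wherever you extracted the distribution, and the list of file extensions is only an illustration.

```elisp
;; Assumption: nXML was extracted to ~/nxml-mode-200nnnnn/
;; (placeholder; substitute the real directory name).
(load "~/nxml-mode-200nnnnn/rng-auto.el")

;; Open files with these extensions in nxml-mode automatically.
;; The extension list is illustrative; adjust it to taste.
(setq auto-mode-alist
      (cons '("\\.\\(xml\\|xsl\\|rng\\|xhtml\\)\\'" . nxml-mode)
            auto-mode-alist))
```

With something like this in place, opening a matching file with C-x C-f should put the buffer in nXML mode without a separate M-x nxml-mode.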

2.3.1 Spotting Validity Errors in Real Time

What "automatic visual identification of validity errors" means is that if you create and edit documents using nXML, you never need to manually run a separate validation step to determine whether a document is valid. If a document contains a validity error, you will know instantly as you edit it, because the error is visually flagged. Here's how it works. As you're editing a document:

  • nXML incrementally reparses and revalidates the document in the background during idle periods between the times when you are actually typing in content. You can wait for nXML to finish validating the entire document (which usually takes only a matter of seconds), or if you're working with a large document, you don't need to wait: the moment you start typing in content, nXML will stop its background parsing and validating until you're idle once again.

  • nXML describes the current validity state in the mode line at the bottom of the Emacs interface; at any point while you're editing a document, the mode line will say either Valid, Invalid, or Validated nn%, where nn is a number indicating what percentage of the document has been validated so far.

  • nXML visually highlights all instances of invalidity it finds in the part of the document it has validated so far (by default, the value of the Emacs face it uses is a red underline, but the highlighting can be changed by customizing that face).

If you mouse over or move your cursor over one of the points that nXML has highlighted as invalid, text appears describing the validity error, either as pop-up text or in the minibuffer echo area at the bottom of the Emacs interface. Figure 2-2 shows an example [Hack #62].

Figure 2-2. nXML validation error message

2.3.2 Getting Help with nXML

To get oriented with the basics of editing within nXML:

  • Type C-h m (or M-x describe-mode) for quick help with nXML commands and key bindings.

  • For more extensive documentation, access the nXML manual (in texinfo format) by typing M-x info.

  • Make sure to read the NEWS file in the nXML distribution; it probably contains some late information that hasn't yet made its way into the nXML manual.

2.3.3 Using Context-Sensitive Completion

The nXML mechanism for doing context-sensitive insertion/completion of markup is similar to the mechanism that PSGML provides. With nXML, you:

  1. Place your cursor at some point in a document.

  2. Type a keyboard combination (in the nXML case, C-Return) to do context-sensitive checking to see what markup (elements, attributes, or enumerated attribute values) is valid at that point in the document; Emacs then opens up a completion buffer containing a list of the valid markup choices.

  3. Either use your mouse to select one of the choices from the completion buffer, or type the first few letters of one of the choices and then tab to cause Emacs to do completion on that name or value. Figure 2-3 shows context-sensitive completion using DocBook.

Figure 2-3. nXML context-sensitive completion

2.3.4 Making nXML Work Your Way

To fine-tune the behavior of nXML:

  • Explore nXML's extensive, well-documented set of customization options by typing M-x customize-group nXML.

  • Even if you change no other nXML option, try setting the value of the Nxml Sexp Element Flag option (nxml-sexp-element-flag variable) to on (non-nil). The default value (nil) means that Emacs sexp commands, such as C-M-k (kill-sexp), operate on tags. What you probably want instead is for them to operate on elements, which is what turning on the Nxml Sexp Element Flag option will do for you.

  • Spend some time experimenting with the syntax-highlighting options; nXML provides what must be by far the best and most configurable syntax-highlighting capabilities of any XML editing application currently available. Over 30 customizable Emacs faces enable you to independently control color and character formatting of everything from the level of element and attribute names down to the level of different types of markup delimiters (e.g., angle-bracket tag delimiters, the quote marks around attribute values, etc.).
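
The sexp setting described above can also be made in your init file instead of through the Customize interface. This is a minimal sketch, assuming nXML is loaded as described earlier in this hack:

```elisp
;; Make sexp commands such as C-M-k (kill-sexp) operate on whole
;; elements rather than on individual tags.
(setq nxml-sexp-element-flag t)
```

Setting the variable by hand and setting it through Customize have the same effect; pick one approach so the two don't end up disagreeing.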

2.3.5 Entering and Displaying Special Characters

Another area where nXML is very clever is the way in which it enables you to enter and display special characters. To enter a special character, such as a copyright sign:

  1. Type C-c C-u. nXML then prompts you for the name of the character to enter.

  2. Type the first few letters of the character name and then press Tab. nXML then does completion, presenting you with a list of all character names that start with the letters you typed. For example, if you enter cop, nXML will present you with a list of several character names that start with COPTIC, along with the name of the character that's probably the one you're looking for: COPYRIGHT SIGN.

  3. Either use your mouse to select one of the choices from the completion buffer, or type more letters then tab again to narrow down the choices to the character you need. Or, if you just type copy to begin with, you'll get straight to the copyright sign (because it's the only character name that begins with COPY).

Note that, by default, nXML inserts the hexadecimal character reference, not the actual character; e.g., for the copyright sign, nXML inserts the character reference &#xA9;. This ensures that you will still be able to tell what the character is if the file is displayed by software that does not understand Unicode.

But this is where things get interesting: even though nXML writes only the numeric character reference to the file, it displays the glyph for the character (along with the character reference itself). And if you mouse over the character reference, nXML displays the full name of the character, either as pop-up text or in the minibuffer echo area at the bottom of the Emacs interface (Figure 2-4).

Figure 2-4. nXML display of special characters

As far as special characters go, nXML lets you have your cake and eat it too. You get:

  • An easy way to enter special characters as character references, without needing to memorize or look up their numeric values or ISO entity names.

  • The ability to see glyphs and full names for all the character references in your documents, while still being able to distribute them to others as ASCII-encoded files (so you're not depending on others having editors that support Unicode or some other encoding).

To enter special characters in other ways:

  • Instead of typing C-c C-u to get prompted for a character name, type C-u C-c C-u. You'll go through the same completion process to enter the name, but when you're done, nXML will insert the character directly, instead of inserting the character reference. GNU Emacs 21.x or later supports display of Unicode and many other encodings (as long as you have the fonts), so you don't have to avoid inserting characters directly unless you need to share your source documents with others who might not have Unicode-enabled editors.

  • Try Norm Walsh's XML Unicode Lisp package (http://nwalsh.com/emacs/xmlchars/). Among other things, it automatically inserts "smart" quotes in just the same way that most word-processing applications do, along with a smart em-dash/en-dash feature. It also provides a menu-driven mechanism for entering special characters, so you don't need to type and do completion; instead, you just select a character name from a menu. Compatibility with nXML's native character-insertion mechanism isn't a problem: the two coexist with one another quite happily.

2.3.6 See Also

  • Learning GNU Emacs by Debra Cameron, Bill Rosenblatt, and Eric S. Raymond (O'Reilly)

  • The nXML mailing list is the first place to go if you have questions or run into problems: http://groups.yahoo.com/group/emacs-nxml-mode/

-Michael Smith