XML Basics

Although the name Extensible Markup Language (XML) sounds a bit cryptic, don't worry: the format itself is actually quite easy to understand. In a nutshell, XML provides a way of formatting and structuring information so that receiving applications can easily interpret and use that data when it's moved from place to place. Although you may not realize it, you already have plenty of experience structuring and organizing information. Consider the following example.

Suppose you want to write a letter to a friend. You structure your thoughts (information) in a format you know your friend will recognize. You begin by writing words on a piece of paper, starting in the upper-left corner, and breaking your thoughts into paragraphs, sentences, and words. You could use images to convey your thoughts, or write your words in a circle, but that probably would confuse your friend. By writing your letter in a format familiar to your friend, you can be confident that your message will be conveyed; that is, you will have transferred your thoughts (data/information) to the letter's recipient.


You can use XML in much the same way: as a format for conveying information. For example, if you want to send data out of Flash for processing by a Web server, you format that data as XML. The server then interprets the XML-formatted data and uses it in the manner intended. Without XML, you could send chunks of data to a server, but the server probably wouldn't know what to do with the first chunk or the second, or even how the first chunk related to the second. XML gives meaning to these disparate bits of data so the server can work with them in an organized and intelligent manner.

XML's simple syntax resembles HTML in that it employs tags, attributes, and values, but the similarity ends there. Where HTML uses predefined tags (for example, <body>, <head>, and <html>), in XML you create your own tags; that is, you don't pull them from an existing library of tag names. Look at the following simple XML document:


<MyFriends>
  <Name Gender="female">Kelly Makar</Name>
  <Name Gender="male">Mike Grundvig</Name>
  <Name Gender="male">Free Makar</Name>
</MyFriends>


Each complete tag (such as <Name></Name>) in XML is called a node, and any XML-formatted data is called an XML document. Each XML document can contain only one root node; the document just shown has a root node called MyFriends, which in turn has three child nodes. The first child node has a node name of Name and a node value of Kelly Makar. The word Gender in each child node is an attribute. Attributes are optional, and each node can have an unlimited number of attributes. You'll typically use attributes to store small bits of information that are not necessarily displayed onscreen (for example, a user identification number).
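To make the vocabulary concrete, here is a minimal sketch that reads the MyFriends document and reports each node's name, value, and attributes. It uses Python's standard xml.etree.ElementTree module rather than Flash, purely because it is easy to run; the function name describe_children is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The MyFriends document described in the text, held in a string.
MY_FRIENDS = """<MyFriends>
  <Name Gender="female">Kelly Makar</Name>
  <Name Gender="male">Mike Grundvig</Name>
  <Name Gender="male">Free Makar</Name>
</MyFriends>"""


def describe_children(document):
    """Return the root node name and a (name, value, attributes) tuple per child node."""
    root = ET.fromstring(document)
    children = [(child.tag, child.text, dict(child.attrib)) for child in root]
    return root.tag, children


root_name, children = describe_children(MY_FRIENDS)
print(root_name)       # node name of the single root node
for node_name, node_value, attributes in children:
    print(node_name, node_value, attributes)
```

The root node is what the parser hands back first; its three child nodes each carry a node name (Name), a node value (the text between the tags), and one Gender attribute.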


The tags in this example (which we made up and defined) give meaning to the bits of information shown (Kelly Makar, Mike Grundvig, and Free Makar).

The next XML document shows a more extended use of XML:



<AddressBook>
  <Person>
    <Name>Kelly Makar</Name>
    <Street>121 Baker Street</Street>
    <City>Some City</City>
    <State>North Carolina</State>
  </Person>
  <Person>
    <Name>Tripp Carter</Name>
    <Street>777 Another Street</Street>
    <City>Elizabeth City</City>
    <State>North Carolina</State>
  </Person>
</AddressBook>



This example shows how the data in an address book would be formatted as XML. If there were 600 people listed in the address book, the Person node would appear 600 times with the same structure.
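Because every Person node shares the same structure, a receiving program can read all of them with one loop. The sketch below, again in Python with the standard library, turns each Person node into a dictionary. The root node name AddressBook and the function name read_people are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Address book data; the root node name AddressBook is assumed.
ADDRESS_BOOK = """<AddressBook>
  <Person>
    <Name>Kelly Makar</Name>
    <Street>121 Baker Street</Street>
    <City>Some City</City>
    <State>North Carolina</State>
  </Person>
  <Person>
    <Name>Tripp Carter</Name>
    <Street>777 Another Street</Street>
    <City>Elizabeth City</City>
    <State>North Carolina</State>
  </Person>
</AddressBook>"""


def read_people(document):
    """Return one dictionary per Person node, keyed by child node name."""
    root = ET.fromstring(document)
    return [{field.tag: field.text for field in person}
            for person in root.findall("Person")]


people = read_people(ADDRESS_BOOK)
# Every Person node exposes the same set of child node names.
same_structure = all(set(person) == set(people[0]) for person in people)
```

The same loop would handle 600 Person nodes without any change, which is the practical benefit of a repeated structure.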

So how do you create your own nodes and structure? How does the destination (ASP page, socket, and so on) know how the document is formatted? And how does it know what to do with each piece of information? The simple answer is that this intelligence has to be built into your destination. Thus, if you were planning to build an address book in Flash and wanted the information it contained to be saved in a database, you would send an XML-formatted version of that data to an ASP page (or another scripted page of choice), which would then parse that information and insert it into the appropriate fields in a database. The important thing to remember is that the ASP page must be designed to deal with data in this way. Because XML is typically used to transfer rather than store information, the address book data would be stored as disparate information in database fields, rather than stored as XML. When needed again, that information could be extracted from the database, formatted as XML by a scripted page, and sent along to Flash or any other application that requested it.
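The round trip just described (parse the incoming XML, store the values in database fields, and later rebuild XML from those fields) can be sketched as follows. This is not an ASP page; it is a small Python stand-in that uses an in-memory SQLite database. The table name people, the root node name AddressBook, and the function names are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

FIELDS = ("Name", "Street", "City", "State")

INCOMING = """<AddressBook>
  <Person>
    <Name>Kelly Makar</Name>
    <Street>121 Baker Street</Street>
    <City>Some City</City>
    <State>North Carolina</State>
  </Person>
  <Person>
    <Name>Tripp Carter</Name>
    <Street>777 Another Street</Street>
    <City>Elizabeth City</City>
    <State>North Carolina</State>
  </Person>
</AddressBook>"""


def store_people(connection, document):
    """Parse the XML and insert each Person into plain database fields."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS people "
        "(name TEXT, street TEXT, city TEXT, state TEXT)"
    )
    for person in ET.fromstring(document).findall("Person"):
        values = tuple(person.findtext(tag) for tag in FIELDS)
        connection.execute("INSERT INTO people VALUES (?, ?, ?, ?)", values)
    connection.commit()


def export_people(connection):
    """Read the database fields back and format them as an XML string."""
    root = ET.Element("AddressBook")
    query = "SELECT name, street, city, state FROM people ORDER BY rowid"
    for row in connection.execute(query):
        person = ET.SubElement(root, "Person")
        for tag, value in zip(FIELDS, row):
            ET.SubElement(person, tag).text = value
    return ET.tostring(root, encoding="unicode")


connection = sqlite3.connect(":memory:")
store_people(connection, INCOMING)
outgoing = export_people(connection)
```

Note that the database never holds XML: the knowledge of which node maps to which field lives entirely in store_people and export_people, which mirrors the point that the destination must be designed for the document's structure.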


Web pages often use text files that contain XML-formatted information: for example, a static XML file for storing information about which ASP pages to call, or what port and IP to connect to when attempting to connect with a socket server.
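As a rough illustration of such a settings file, the sketch below parses a small XML document that names a page to call and a socket server to connect to. Every name in it (Settings, SavePage, SocketServer, the IP and Port attributes, the page name, and the address values) is hypothetical, and Python stands in for whatever application would actually read the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of a static settings file.
SETTINGS_FILE_CONTENTS = """<Settings>
  <SavePage>saveAddress.asp</SavePage>
  <SocketServer IP="127.0.0.1" Port="9999"></SocketServer>
</Settings>"""


def load_settings(document):
    """Pull the page name from a node value and the server details from attributes."""
    root = ET.fromstring(document)
    server = root.find("SocketServer")
    return {
        "save_page": root.findtext("SavePage"),
        "ip": server.get("IP"),
        "port": int(server.get("Port")),  # attribute values arrive as strings
    }


settings = load_settings(SETTINGS_FILE_CONTENTS)
```

Keeping these values in a file means they can change without touching the application itself.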

Now that you know the basics of the XML format, here are some rules you need to follow when you begin using it:

  • You cannot begin node names with the letters XML, in any combination of uppercase and lowercase; the XML specification reserves such names, and many XML parsers break when they see XML at the beginning of a node name.

  • You must properly terminate every node; for example, you would terminate <Name> with </Name>. The slash (/) inside the final tag indicates that a node is completed (terminated).

  • You must encode all special characters. In Flash you can do this by URL-encoding the text with the escape() function; XML itself also defines entity references for this purpose, such as &amp; for an ampersand and &lt; for a left angle bracket. Many parsers interpret certain unencoded characters as the start of a new node that is not terminated properly (because it wasn't a node in the first place). An XML document with non-terminated nodes won't pass through an XML parser completely. Attributes are less forgiving than text nodes because they can fail to pass through the parser on characters such as a carriage return or an ampersand. If you URL-encode the text, you won't experience this trouble.

  • XML is case sensitive, and most XML parsers enforce this, which means that all tags of the same type must have the same case. If you start a node with <Name> and terminate it with </name>, you're asking for trouble.

  • You can have only one root node.
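Several of these rules can be checked directly with a parser. The sketch below uses Python's standard library as a stand-in for whatever parser you work with; other parsers may be more or less strict. The helper names is_well_formed and build_node are made up for illustration. Note that it encodes special characters with XML entity references (through xml.sax.saxutils) instead of the URL-encoding that escape() performs in Flash; that is a swapped-in technique, not the approach described above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def is_well_formed(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


def build_node(tag, text, **attributes):
    """Build a node as a string, encoding special characters in text and attributes."""
    attribute_text = "".join(
        f" {key}={quoteattr(value)}" for key, value in attributes.items()
    )
    return f"<{tag}{attribute_text}>{escape(text)}</{tag}>"


checks = {
    "properly terminated": is_well_formed("<Name>Kelly Makar</Name>"),
    "never terminated": is_well_formed("<Name>Kelly Makar"),
    "mismatched case": is_well_formed("<Name>Kelly Makar</name>"),
    "unencoded ampersand": is_well_formed("<Name>Tom & Jerry</Name>"),
    "two root nodes": is_well_formed("<A></A><B></B>"),
}

# Encoding the ampersand and the carriage-return-style character fixes the problem.
safe_node = build_node("Name", "Tom & Jerry", Note="line one\nline two")
```

Only the first entry in checks comes back True; the encoded safe_node parses cleanly and yields the original text and attribute value when read back.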

One more thing to note before you begin working with XML is that the clean layout shown in these examples is not required. The carriage returns and tabs are there to make it easier for us to read. These tabs and carriage returns are called white space, and you can add or delete white space between nodes without affecting the overall structure.
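A quick way to convince yourself of this is to parse the same document with and without the layout white space and compare the results. The sketch below does that in Python; the function name structure is made up for illustration. One caveat the code has to handle: this particular parser keeps the white space between nodes as text, so the comparison strips it before comparing.

```python
import xml.etree.ElementTree as ET

PRETTY = """<MyFriends>
    <Name Gender="female">Kelly Makar</Name>
    <Name Gender="male">Mike Grundvig</Name>
</MyFriends>"""

COMPACT = (
    '<MyFriends><Name Gender="female">Kelly Makar</Name>'
    '<Name Gender="male">Mike Grundvig</Name></MyFriends>'
)


def structure(element):
    """Summarize a node as (name, attributes, trimmed value, child summaries)."""
    text = (element.text or "").strip()  # drop layout-only white space
    return (
        element.tag,
        dict(element.attrib),
        text,
        [structure(child) for child in element],
    )


same = structure(ET.fromstring(PRETTY)) == structure(ET.fromstring(COMPACT))
```

Here same is True: both strings describe one root node with two Name child nodes, and only the formatting differs.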