2.6 Calculated Import of XML

You can create scripts to parse XML to read the data into FileMaker Pro. The process is very similar to the calculated export of XML, only in reverse order. The first priority is getting the text of the XML document into a field. Remembering the field size limit, you may want to read smaller portions of the document with a file plug-in. The second priority is analyzing the structure of the XML document to see where there might be related data. See the example in Listing 2.14 and decide if you will be parsing the entire document into a flat database or into multiple related databases. If the elements in the XML document repeat, they probably should become separate records whether related or not. The next two sections show you some options for parsing (reading) the XML elements and getting the contents of an XML document.

2.61 Troi-Text Plug-in

An easy way to look at the structure of XML documents is to use the Troi-Text plug-in. There are specific external functions that will help you parse the element paths (or nodes) of the XML document. You can read more about the Troi-Text plug-in at http://www.troi.com/. One of the functions of this plug-in, External("TrText-XML"), has two parameters that can be used to get the contents of a node (elements path) and attributes of the element.

External("TrText-XML", "-getnode|node|XMLsource")
External("TrText-XML", "-getattributes|node|XMLsource")

The XMLsource is the XML document or fragment of an XML document. The plug-in reads the XML source in a field or as a literal or a calculated value. The node can be entered as an XPath expression starting from the root element, such as root/parent/child[3]/child. The expression can be read from a field, or as a literal or calculated value. Each path element is separated by a "/" and multiple occurrences of an element can be extracted by using the XPath predicate "[n]".

<!-- XPath Expression for node -->
FMPXMLRESULT/RESULTSET/ROW[3]

Using the -getnode parameter for the above node would return the entire set of <COL> and <DATA> elements for the third ROW element. This would be the data for the third record. The attributes returned, using the -getattributes parameter, would be RECORDID="nnn" and MODID="nnn" for the third ROW.

2.62 Calculated Parsing of XML

The elements in an XML document have a pattern of "<" and elementName at the beginning of a node. The end of the node is always "/>" for an empty element and "</", the elementName, and ">" for elements with or without content (data or other elements). We can use these patterns and native FileMaker Pro functions to parse XML documents.

First determine how many occurrences of a starting element are in the document. The function PatternCount(text, string) will return the number of times a string pattern occurs in some text. The string parameter in PatternCount() will be counted regardless of case or where the pattern occurs within a word. Use the XML in Listing 2.14 and search for "customer"; the results of the PatternCount() function are shown here:

PatternCount(XMLdoc, "customer") -> 6
PatternCount(XMLdoc, "CUSTOMER") -> 6
PatternCount(XMLdoc, "customers") -> 2
PatternCount(XMLdoc, "item") -> 10

PatternCount is just looking for a pattern. The element names will appear in the start tag and end tag or empty tag. You must work with a full word and a space, "/", or ">" to count the number of times a starting element occurs in the XML document. Valid starting elements can be <elementName>, <elementName attribute="">, <element-Name/>, <elementName attribute=""/>, <elementName />, and <elementName attribute="" />. PatternCount() will still not distinguish between <ELEMENT> and <element>, but for our needs, the calculation is sufficient:

elementCount = PatternCount(XMLdoc, "<" & elementName &"")+
  PatternCount(XMLdoc, "<" & elementName & ">") + PatternCount(XMLdoc,
  "<" & elementname & "/")

This would be the same as the XPath expression:

count(//elementName)

Next determine the starting position of the element. The FileMaker Pro function Position() uses the parameter for the text to search (XMLdoc), the pattern of the search string ("<" & elementName), the character to start the search (1), and the occurrence of the search string from the start (Predicate). Position() also does not give a different result for the case of the search string; for example, Position(text, "string", 1, 1) is the same as Position(text, "STRING", 1, 1). We will be using the starting position regardless of attributes in an element or whether it is empty. Search for the pattern "<" and elementName, based upon the occurrence found in the Predicate number field.

elementStart = Position(XMLdoc, "<" & elementName, 1, Predicate)

We can revise the above calculation to account for the space (""), slash ("/"), or greater-than (">") characters that will appear after an element. Calculate each of these possibilities and add them together, as only one will match the element:

elementSpace = Position(XMLdoc, "<" & elementName & "",1, Predicate)
elementSlash = Position(XMLdoc, "<" & elementName & "/", 1, Predicate)
elementGreaterThan = Position(XMLdoc, "<" & elementName & ">", 1,
  Predicate)
elementStart = elementSpace + elementSlash + elementGreaterThan

Determine the ending position of the element and whether it is an empty element or not, based upon the starting position of the element. The Case() function is used to test for an element end tag ("</" & elementName & ">") or the default of the first occurrence of "/>" after the element name (as in an empty element). If the element has an end tag, the end position for the node becomes the start of the end tag plus the length of the end tag. If the element is empty, the end position is after the "/>" for that element.

elementEnd = Case(PatternCount(XMLdoc, "</" & elementName & ">"),
Position(XMLdoc, "</" & elementName & ">", 1, Predicate) +
Length("</" & elementName & ">"),
Position(XMLdoc, "/>", cElementStart, 1) + 2)

Finally, we use the Middle() function to extract the element. Test first for an empty Predicate field. Verify that the number in the Predicate field is really greater than or equal to the elementCount. If you ask for element[3] and there are only two elements, you will get no results. If both tests fail, the default text result is empty ("").

Case(IsEmpty(Predicate), "", elementCount >= Predicate, Middle(XMLdoc,
  ElementStart, ElementEnd - ElementStart),"")

The attributes can be extracted with the calculation below. You can further refine and parse the names of the attributes and each of the values. (Hint: The attributes always are spaced and have "=" between the name and value pairs with the values in double or single quotes.)

Trim(
Substitute(Substitute(
Middle(cElementNode, Position(cElementNode, "<" & elementName, 1, 1) +
  Length("<" & elementName), Position(cElementNode, ">", 1, 1) -
  Length("<" & elementName)),
"/>", ""), ">", "")
)

Using calculated parsing, the XPath expression "//item[2]" would return the attribute (id="2") and the following:

<item id="2">
<qty>2</qty>
<description>Trucks</description>
<color>Red</color>
<price>5.15</price>
</item>

This section only has a small sampling of the possibilities with parsing in FileMaker Pro. Using plug-ins and/or built-in functions, you can manipulate text formatted as XML. The FileMaker Pro functions are used to transform the XML into other formats. Some of these examples also may help you understand XSL and how it transforms the XML into other text formats. XSL transformation uses the Xerces processor in FileMaker Pro. You can read more about XSL in Chapter 7.