Location paths are a subset of a more general concept called XPath expressions. These are statements that can extract useful information from the tree. Instead of just finding nodes, you can count them, add up numeric values, compare strings, and more. They are much like statements in a functional programming language. There are five types, listed here:
Boolean: An expression type with two possible values, true and false.
Node set: A collection of nodes that match an expression's criteria, usually derived with a location path.
Number: A numeric value, useful for counting nodes and performing simple arithmetic.
String: A fragment of text that may be from the input tree, processed or augmented with generated text.
Result tree fragment: A temporary node tree that has its own root node but cannot be indexed into using location paths.
In XPath, types are determined by context. An operator or function can transform one expression type into another as needed. For this reason, there are well-defined rules to determine what values map to when transformed to another type.
XPath has a rich set of operators and functions for working with each expression type. In the following sections, I will describe these and the rules for switching between types.
Boolean expressions have two values: true or false. As you saw with location step predicates, anything inside the brackets that does not result in a numerical value is forced into a Boolean context. There are other ways to coerce an expression to behave as Boolean. The function boolean( ) derives a true or false value from its argument. There are also various operators that combine and compare expressions with a Boolean result.
The Boolean value derived from an expression depends on the expression's type, following the rules listed in Table 6-6.
Expression type | Rule |
---|---|
Node set | True if the set contains at least one node, false if it is empty. |
String | True unless the string is zero-length. |
Number | True unless the value is zero or NaN (not a number). |
Result tree fragment | Always true, because every fragment contains at least one node, its root node. |
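The rules in Table 6-6 can be sketched as a small Python function. This is only a model of the conversion, not an XPath engine: the name xpath_boolean is hypothetical, and I am assuming that node sets are represented as Python lists, strings as str, and numbers as int or float. Result tree fragments are left out.

```python
import math


def xpath_boolean(value):
    """Model of the boolean() conversion rules in Table 6-6 (assumed types)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, list):            # node set: true if it has any node
        return len(value) > 0
    if isinstance(value, str):             # string: true unless zero-length
        return len(value) > 0
    if isinstance(value, (int, float)):    # number: true unless zero or NaN
        return not (value == 0 or math.isnan(value))
    raise TypeError("unsupported value type")
```

Note that under these rules the strings "false" and "0" both convert to true, since only the length of a string matters.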
Certain operators (listed in Table 6-7) compare string or numerical values to arrive at a Boolean value. When an operand is a node set, these are existential comparisons, meaning that they test all the nodes in the set to determine whether any of them satisfies the comparison.
Operator | Returns |
---|---|
expr = expr | True if both expressions (string or numeric) have the same value, otherwise false. |
expr != expr | True if the expressions do not have the same value (string or numeric), otherwise false. |
expr < expr | True if the value of the first numeric expression is less than the value of the second, otherwise false. [2] |
expr > expr | True if the value of the first numeric expression is greater than the value of the second, otherwise false. [2] |
expr <= expr | True if the value of the first numeric expression is less than or equal to the value of the second, otherwise false. [2] |
expr >= expr | True if the value of the first numeric expression is greater than or equal to the value of the second, otherwise false. [2] |

[2] If you use these operators inside an XML document such as an XSLT stylesheet or a Schematron schema, you must write < as the entity reference &lt; and, by convention, > as &gt; instead of using the literal characters.
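One consequence of existential comparison is easy to miss: for a node set with more than one node, = and != can both be true at once. The following Python sketch models this with plain lists of string values. The function names and the prices data are hypothetical, chosen only for illustration.

```python
def any_equal(node_values, target):
    """Model of `node-set = string`: true if ANY node's string value matches."""
    return any(value == target for value in node_values)


def any_not_equal(node_values, target):
    """Model of `node-set != string`: true if ANY node's string value differs."""
    return any(value != target for value in node_values)


# Hypothetical string values of the nodes selected by some location path.
prices = ["10", "20"]

both_true = any_equal(prices, "10") and any_not_equal(prices, "10")
negated = not any_equal(prices, "10")      # models not(path = "10")
empty_case = any_equal([], "10") or any_not_equal([], "10")
```

Here both_true is True while negated is False, so a test written with != is not the same as one written with not( ) around =. For an empty node set, neither comparison can succeed.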
Table 6-8 lists the operators and functions that return Boolean values.
Operator or function | Returns |
---|---|
expr and expr | True if both Boolean expressions are true, otherwise false. |
expr or expr | True if at least one Boolean expression is true, otherwise false. |
true( ) | True. |
false( ) | False. |
not( expr ) | Negates the value of the Boolean expression: true if the expression is false, otherwise false. |
A node set expression is really the same thing as a location path. The expression evaluates to a set of nodes. This is a set in the strict mathematical sense, meaning that it contains no duplicates. The same node can be added many times, but the set will always contain only one copy of it.
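The no-duplicates property can be illustrated with Python's xml.etree.ElementTree module, which supports only a small subset of XPath. The sample document is made up for this sketch, and the union is built by hand, because ElementTree has no union operator of its own.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring('<list><item id="a"/><item id="b"/><item/></list>')

all_items = doc.findall("item")       # three item elements
with_id = doc.findall("item[@id]")    # two of those same elements

# Merge the two selections the way a union (item | item[@id]) would:
# walk the tree in document order and keep each selected node once.
selected = all_items + with_id        # a plain list, so it holds 5 entries
union = [node for node in doc.iter()
         if any(node is other for other in selected)]
```

The list selected has five entries, but the union contains only the three distinct item elements.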
XPath defines a number of functions that operate on node sets, listed in Table 6-9.
Function | Returns |
---|---|
count( node set ) | The number of items in node set. For example, count(parent::*) will return the value 0 if the context node is the document element. Otherwise, it will return 1, since a node can only have one parent. |
generate-id( node set ) | A string containing a unique identifier for the first node in node set, or for the context node if the argument is left out. This string is generated by the processor and guaranteed to be unique for each node. |
last( ) | The number of the last node in the context node set. last( ) is similar to count( ) except that it operates only on the context node set, not on an arbitrary set. |
local-name( node set ) | The name of the first node in node set, without the namespace prefix. Without an argument, it returns the local name of the context node. |
name( node set ) | The name of the first node in node set including the namespace prefix. |
namespace-uri( node set ) | The URI of the namespace for the first node in node set. Without an argument, it returns the namespace URI for the context node. |
position( ) | The number of the context node in the context node set. |
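To make count( ), position( ), and last( ) concrete, here is a sketch using Python's xml.etree.ElementTree, which understands numeric and last() predicates but not the full function library. The toc document is invented for the example, and count( ) and position( ) are imitated with len() and enumerate().

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<toc><chapter>One</chapter><chapter>Two</chapter>"
    "<chapter>Three</chapter></toc>"
)

chapters = doc.findall("chapter")

# count(chapter): the size of the node set.
count = len(chapters)

# position(): nodes are numbered from 1, not 0.
positions = [(pos, ch.text) for pos, ch in enumerate(chapters, start=1)]

# chapter[2] and chapter[last()] select by position within the set.
second = doc.find("chapter[2]")
last = doc.find("chapter[last()]")
```

In this sample, the position of the last chapter equals the count, which is the relationship between last( ) and count( ) described in the table.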
There are also functions that create node sets, pulling together nodes from all over the document. For example, the function id( string ) returns the set of elements that have an ID attribute equal to the value of string, or an empty set if no node matches. In a valid document, only one node should be returned, because the ID type attribute must have a unique value. XPath does not require documents to be valid, however, so it is possible that more than one element will be returned.
XPath allows an expression to be evaluated numerically, which is useful for comparing positions in a set, adding the values of numeric elements, incrementing counters, and so forth. A number in XPath is defined to be a 64-bit floating-point number (whether it has a decimal point or not). The special value NaN (not a number) is used when a conversion to a number fails.
The rules for converting any expression into a numeric value are listed in Table 6-10.
Expression type | Rule |
---|---|
Node set | The first node is converted into a string, then the string conversion rule is used. |
Boolean | The value true is converted to the number 1, and false to the number 0. |
String | If the string is the literal serialization of a number (e.g., -123.5), it is converted into that number. Otherwise, the value NaN is used. |
Result tree fragment | Like node sets, a result tree fragment is converted into a string, which is then converted with the string rule. |
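The string rule is stricter than it may look. In XPath 1.0, a numeric string is optional whitespace, an optional minus sign, and decimal digits with an optional fraction; exponents and a leading plus sign are not accepted. The Python sketch below models the rules in Table 6-10 under the assumption that node sets are lists of string values. The name xpath_number is hypothetical.

```python
import math
import re

# Optional whitespace, optional minus, digits with an optional fraction.
_NUMBER = re.compile(r"^[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*$")


def xpath_number(value):
    """Model of the number() conversion rules in Table 6-10."""
    if isinstance(value, bool):            # true -> 1, false -> 0
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, list):            # node set: use the first node
        return xpath_number(value[0]) if value else float("nan")
    if isinstance(value, str):
        return float(value) if _NUMBER.match(value) else float("nan")
    raise TypeError("unsupported value type")
```

The regular expression is there because Python's own float() is more permissive than XPath: it would accept strings such as "1e3" or "+5", which this model maps to NaN.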
To manipulate numeric values, there are a variety of operators and functions. These are cataloged in Table 6-11.
Operator or function | Returns |
---|---|
expr + expr | The sum of two numeric expressions. |
expr - expr | The difference of the first numeric expression minus the second. |
expr * expr | The product of two numeric expressions. |
expr div expr | The first numeric expression divided by the second expression. |
expr mod expr | The first numeric expression modulo the second expression. |
round( expr ) | The value of the expression rounded to the nearest integer. |
floor( expr ) | The value of the expression rounded down to an integer value. |
ceiling( expr ) | The value of the expression rounded up to an integer value. |
sum( node-set ) | The sum of the values of the nodes in node-set. Unlike the other functions in this table, this function operates over a node set instead of expressions. |
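Two of these operations behave differently from their nearest equivalents in many programming languages. In XPath 1.0, mod takes the sign of its first operand, and round( ) resolves ties toward positive infinity. The Python sketch below models both, along with sum( ) over a list of string values. The function names are hypothetical, and the sum model assumes every node value is a plain decimal string.

```python
import math


def xpath_mod(a, b):
    """Remainder with the sign of the first operand (truncating division)."""
    if b == 0 or math.isinf(a):
        return float("nan")
    return math.fmod(a, b)


def xpath_round(x):
    """Nearest integer; a tie goes to the integer closer to +infinity."""
    if math.isnan(x) or math.isinf(x):
        return x
    lower = math.floor(x)
    return float(lower + 1) if x - lower >= 0.5 else float(lower)


def xpath_sum(node_values):
    """Sum of node values, each converted from its string form."""
    return sum(float(value) for value in node_values)
```

For comparison, Python's own -5 % 2 gives 1 and round(2.5) gives 2, whereas this model gives -1 and 3 for the same inputs.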
A string is a segment of character data, such as "How are you?", "990", or "z". Any expression can be converted into a string using the string( ) function following the rules in Table 6-12.
Expression type | Rule |
---|---|
Node set | The text value of the first node is used as the string. |
Boolean | The string "true" if the expression is true, otherwise the string "false". |
Number | The string value is the number as it would be printed. For example, string(1 + 5 - 9) evaluates to the string -3. |
Result tree fragment | The string value is the concatenation of the text values of all the nodes in the fragment. |
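The rules in Table 6-12 can be modeled in Python as follows. The name xpath_string is hypothetical, node sets are assumed to be lists of string values, and result tree fragments are not modeled. One detail worth showing is that a number with no fractional part prints without a decimal point.

```python
import math


def xpath_string(value):
    """Model of the string() conversion rules in Table 6-12."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, list):            # node set: value of the first node
        return value[0] if value else ""
    if isinstance(value, (int, float)):
        x = float(value)
        if math.isnan(x):
            return "NaN"
        if math.isinf(x):
            return "Infinity" if x > 0 else "-Infinity"
        if x == int(x):                    # whole numbers: no decimal point
            return str(int(x))
        # Assumption: repr() is adequate for moderate magnitudes. XPath never
        # uses exponent notation, which repr() may for very small values.
        return repr(x)
    return value
```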
Functions that return string values are listed in Table 6-13.
Function | Returns |
---|---|
concat( string, string, ... ) | A string that is the concatenation of the string arguments. |
format-number( number, pattern, decimal-format ) | A string containing the number, formatted according to pattern. The optional decimal-format argument points to a format declaration which assigns special characters like the grouping character, which separates groups of digits in large numbers for readability. In XSLT, this format declaration would be the value of the name attribute in a decimal-format element. |
normalize-space( string ) | The string with leading and trailing whitespace removed, and all other strings of whitespace characters replaced with single spaces. The value of the context node is used if the argument is left out. |
substring( string, offset, range ) | A substring of the string argument, starting at character position offset (the first character is position 1) and containing range characters. If range is left out, the substring runs to the end of the string. |
substring-after( string, to-match ) | A substring of the string argument, starting at the end of the first occurrence of the string to-match and ending at the end of string. |
substring-before( string, to-match ) | A substring of the string argument, starting at the beginning of string and ending at the beginning of the first occurrence of the string to-match. |
translate( string, characters-to-match, characters-replace-with ) | The string with all characters in the string characters-to-match replaced with their counterpart characters in the string characters-replace-with. Suppose the first argument is the string "happy days are here again.", the second argument is "ah." and the third string is "oH!". The returned result will be the string "Hoppy doys ore Here ogoin!". translate( ) only works on a per-character basis, so you cannot replace arbitrary strings with it. |
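Several of these functions have details that are easier to see in code: substring( ) counts positions from 1, and translate( ) deletes any character in its second argument that has no counterpart in the third. The Python sketch below models a few of them. All function names are hypothetical, and the substring model assumes integer arguments.

```python
import re


def xpath_substring(s, start, length=None):
    """Characters at 1-based positions start .. start + length - 1."""
    end = len(s) + 1 if length is None else start + length   # exclusive
    lo, hi = max(start, 1), min(end, len(s) + 1)
    return s[lo - 1:hi - 1] if hi > lo else ""


def xpath_substring_before(s, sub):
    i = s.find(sub)
    return s[:i] if i >= 0 else ""


def xpath_substring_after(s, sub):
    i = s.find(sub)
    return s[i + len(sub):] if i >= 0 else ""


def xpath_normalize_space(s):
    """Trim, then collapse runs of space, tab, CR, and LF to one space."""
    return " ".join(re.split(r"[ \t\r\n]+", s.strip(" \t\r\n")))


def xpath_translate(s, match, replace):
    """Per-character replacement; unmatched counterparts mean deletion."""
    table = {}
    for i, ch in enumerate(match):
        if ord(ch) in table:               # first occurrence wins
            continue
        table[ord(ch)] = replace[i] if i < len(replace) else None
    return s.translate(table)


result = xpath_translate("happy days are here again.", "ah.", "oH!")
```

The value of result matches the translate( ) example in the table. Calling xpath_translate("--aaa--", "abc-", "ABC") shows the deletion behavior: the hyphen has no counterpart, so it is removed.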
Some functions operate on strings and return numeric or Boolean values. These are listed in Table 6-14.
Function | Returns |
---|---|
contains( string, sub ) | True if the substring sub occurs within the string, otherwise false. |
starts-with( string, sub ) | True if the string begins with the substring sub, otherwise false. |
string-length( string ) | The number of characters inside string. |