Chapter: 8.7 Querying the Repository

In this section, we will describe how to query the XML repository.
8.7.1 Ad Hoc SQL Queries
Once we have uploaded some documents into the database, we can start experimenting with queries. For example, the query in Listing 8.68 will return a list of the element names that appear, with their x and y coordinates and namespace prefixes (if any).
Listing 8.68 SQL Query
SELECT n.x, n.y, e.ns_prefix, e.local_name FROM element_name e, node n WHERE n.doc_id = 1 AND n.node_id = e.node_id;
The results for wml-example.xml (document ID = 1, if we loaded this document first) are shown in Listing 8.69.
Listing 8.69 Results for wml-example.xml
x y ns_prefix local_name ---- ---- --------- ---------- 2 49 NULL wml 5 34 NULL card 8 19 NULL p 11 12 NULL img 15 16 NULL br 22 31 NULL p 25 28 NULL a 37 46 NULL card 40 43 NULL p (9 row(s) affected)
A variation on this theme is shown in Listing 8.70, where we return the element names (qualified with a namespace prefix, if any) with a count of their occurrences in the specified document (note the use of a group by expression).
Listing 8.70 Modified Query
SELECT ISNULL(e.ns_prefix + ':', '') + e.local_name AS qName,
COUNT(1) AS occurrences
FROM element_name e, node n
WHERE n.doc_id = 1 AND n.node_id = e.node_id
GROUP BY ISNULL(e.ns_prefix + ':', '') + e.local_name;
The results are shown in Listing 8.71.
Listing 8.71 Results for Modified Query
qName occurrences -------- ----------- a 1 br 1 card 2 img 1 p 3 wml 1 (6 row(s) affected)
For our simple example, the results are pretty obvious; there are two cards, one link, three paragraphs, and so on. However, for larger documents, the query can be very informative; for example, in an XHTML document, we can quickly determine how many paragraph elements there are, or how many images.
The query in Listing 8.72 will return a list of the attribute names that appear, with their x and y coordinates and namespace prefixes (if any).
Listing 8.72 More Informative Query
SELECT n.x, n.y, a.attribute_id, a.ns_prefix, a.local_name FROM attribute_name a, node n WHERE n.doc_id = 1 AND n.node_id = a.node_id;
The results for our example are shown in Listing 8.73.
Listing 8.73 Results for More Informative Query
x y attribute_id ns_prefix local_name ---- ---- ------------ --------- ---------- 5 34 1 NULL id 5 34 2 NULL title 5 34 3 NULL newcontext 8 19 1 NULL align 11 12 1 NULL src 11 12 2 NULL alt 11 12 3 NULL align 25 28 1 NULL href 37 46 1 NULL id 37 46 2 NULL title 40 43 1 NULL align (11 row(s) affected)
To retrieve the text leaves for this document, run the query shown in Listing 8.74.
Listing 8.74 Text Leaves Query
SELECT n.x, t.leaf_id, t.leaf_text FROM text_leaf t, node n WHERE n.doc_id = 1 AND n.node_id = t.node_id
The results for our example are shown in Listing 8.75 (note that I've ignored whitespace nodes for brevity?there are 14 whitespace entries in addition to these three rows).
Listing 8.75 Results for Text Leaves Query
x leaf_id leaf_text ---- ------- --------------------------- 17 1 Content of the first card. 26 1 Next card. 41 1 Content of the second card.
To return the text content of a specific element node (e.g., the second card element), see Listing 8.76.
Listing 8.76 Text Content Query
SELECT n.x, t.leaf_text FROM text_leaf t, node n WHERE n.doc_id = 1 AND n.node_id = t.node_id AND n.x BETWEEN 37 AND 46;
where 37 and 46 are the x and y coordinates of the element node. The results are shown in Listing 8.77 (I've ignored whitespace again).
Listing 8.77 Results for Text Content Query
x leaf_id leaf_text ---- ------- --------------------------- 41 1 Content of the second card.
We can generalize the last query to return, for a given document, all text nodes of elements with a specified name (in this case, all <a> elements), as shown in Listing 8.78.
Listing 8.78 Generalized Query
SELECT n2.x, n2.y, e.local_name, t.leaf_id, t.leaf_text FROM node n1, node n2, element_name e, text_leaf t WHERE n1.doc_id = 1 AND n1.node_id = e.node_id AND e.local_name = 'a' AND n2.doc_id = 1 AND n2.node_id = t.node_id AND n2.x BETWEEN n1.x AND n1.y;
In our example, there is only one <a> element, as shown in Listing 8.79.
Listing 8.79 Generalized Query Result
x y local_name leaf_id leaf_text --- --- ---------- ------- --------- 26 27 a 1 Next card (1 row(s) affected)
The queries in this section represent just a tiny sample of the queries we can dream up now that we have parsed XML into our relational database.
8.7.2 Searching for Text
The reader may have noticed a potential pitfall for applications that need to provide full text searching capabilities for document-centric XML. If we intend to run queries on large text nodes?for example, to find all text nodes that contain a certain word or words?we need to be mindful that the content may well be split over leaves. One solution would be to have the leaf_text column in our text_leaf table of type text rather than a varchar (this would also require some changes to the stored procedures that access this table, since the syntax for handling this type of content will differ). Another solution would be to build an indexing mechanism that creates a full text index at parse time, with references to the nodes that contain the text. Either approach will allow searches to weight hits according to information about the containing nodes (e.g., a word found in text content of a title element might be worth more than the same text found in a paragraph element).
8.7.3 Some More Stored Procedures
Time to introduce some more stored procedures that may prove useful in a number of applications. The first, rep_s_parent, selects the parent node of a specified node (identified by a document ID and x coordinate). This procedure makes use of the Nested Sets property that the parent of a given node (x, y) has the largest x' in the set defined by x' , x . y' (see Listing 8.80).
Listing 8.80 Create rep_s_parent Procedure
CREATE PROCEDURE dbo.rep_s_parent
@doc_id t_doc_id,
@x_child t_xy_index AS
SELECT n1.node_id,
n1.x,
n1.y,
n1.node_type_id
FROM node n1
WHERE n1.doc_id = @doc_id
AND n1.x = (SELECT MAX(n2.x)
FROM node n2
WHERE n2.doc_id = @doc_id
AND n2.x < @x_child
AND n2.y > @x_child)
GO
The next procedure, rep_s_children, shown in Listing 8.81 returns the child nodes of a specified node (again identified by a document ID and x coordinate). Descendants are easy (they are identified by x, x', y; we have already exploited this property in rep_serialise_nodes), but immediate children are a bit harder to determine; the trick here is a correlated subquery that limits the results of the first section of the WHERE clause (the set of all descendants) to just those for which the specified node is the parent.
Listing 8.81 Create rep_s_children Procedure
CREATE PROCEDURE dbo.rep_s_children
@doc_id t_doc_id,
@x_source t_xy_index AS
SELECT n1.node_id,
n1.x,
n1.y,
n1.node_type_id
FROM node n1
WHERE n1.doc_id = @doc_id
AND n1.x > @x_source
AND n1.y < (SELECT y
FROM node
WHERE doc_id = @doc_id
AND x = @x_source)
AND @x_source = (SELECT MAX(x)
FROM node n2
WHERE n2.doc_id = @doc_id
AND n2.x < n1.x
AND n2.y > n1.y)
ORDER BY n1.x
GO
For our example document, we can (for example) ask for the parent node of the <p> element with x = 40, which produces the result shown in Listing 8.82.
Listing 8.82 Result for rep_s_parent with Two Parameters
rep_s_parent 2, 40 GO node_id x y node_type_id ------- --- --- ------------ 52 37 46 1 (1 row(s) affected)
As expected, this is the x coordinate of the second <card> element.
8.7.4 Generating XPath Expressions
The W3C's XPath recommendation provides a very useful syntax for querying XML documents. While a full implementation of XPath is beyond the scope of this chapter, we can at least make a start. We will create a stored procedure rep_gen_xpath (see Listing 8.83) that returns an XPath expression for the path between two nodes (identified by their x values: x_source and x_target) in a specified document. There is no error checking, and we need to specify valid x coordinates for two element nodes, or for the document node (x = 1) and an element node.
First we need to select (into a temporary table) the data needed to construct the path expression, and for each element in the path, we may also need a sequence number (if there is more than one element with the same name at the same level in the fragment).
Listing 8.83 Create rep_gen_xpath Procedure
CREATE PROCEDURE dbo.rep_gen_xpath
@doc_id t_doc_id,
@x_source t_xy_index,
@x_target t_xy_index
AS
BEGIN
-- Returns an XPath expression for a given target element node (starting at the source
element node)
-- Local variables
DECLARE @seq_no INTEGER
DECLARE @this_element t_leaf_text
DECLARE @this_ns_prefix t_leaf_text
DECLARE @xpath_expr t_xpath_maxlen
-- Build a temporary table
SELECT n1.x,
n1.y,
(SELECT COUNT(1) + 1
FROM node n3,
element_name en2
WHERE n3.doc_id = @doc_id
AND n3.node_type_id = 1
AND (SELECT MAX(n4.x)
FROM node n4
WHERE n4.doc_id = @doc_id
AND n4.node_type_id IN (1, 9)
AND n4.x < n3.x
AND n4.y > n3.x) = (SELECT MAX(n5.x)
FROM node n5
WHERE n5.doc_id = @doc_id
AND n5.node_type_id IN (1, 9)
AND n5.x < n1.x
AND n5.y > n1.x)
AND n3.node_id = en2.node_id
AND en2.local_name = en1.local_name
AND n3.x < n1.x) AS seq_no,
(SELECT COUNT(1)
FROM node n6,
element_name en3
WHERE n6.doc_id = @doc_id
AND n6.node_type_id = 1
AND (SELECT MAX(n7.x)
FROM node n7
WHERE n7.doc_id = @doc_id
AND n7.node_type_id IN (1, 9)
AND n7.x < n6.x
AND n7.y > n6.x) = (SELECT MAX(n8.x)
FROM node n8
WHERE n8.doc_id = @doc_id
AND n8.node_type_id IN (1, 9)
AND n8.x < n1.x
AND n8.y > n1.x)
AND n6.node_id = en3.node_id
AND en3.local_name = en1.local_name) AS total_count,
en1.ns_prefix,
en1.local_name
INTO #results
FROM node n1,
element_name en1
WHERE n1.doc_id = @doc_id
AND n1.x <= @x_target
AND n1.y > @x_target
AND n1.x >= @x_source
AND n1.node_type_id = 1
AND n1.node_id = en1.node_id
ORDER BY n1.x
-- Null sequence numbers if element is the only one of its type
UPDATE #results
SET seq_no = NULL
WHERE seq_no = 1
AND total_count = 1
Next we create a cursor to walk over the results set, building the XPath expression as we step through the rows, as shown in Listing 8.84.
Listing 8.84 Cursor to Walk the Results Set
-- Initialize XPath expr (as a slash if we are starting from
-- the root element or the document node)
IF @x_source <= (SELECT MIN(x) FROM node
WHERE doc_id = @doc_id AND node_type_id = 1)
SELECT @xpath_expr = '/'
ELSE
SELECT @xpath_expr = ''
-- Create a cursor to walk through #results in depth order
DECLARE xpath_generator CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY FOR
SELECT seq_no, ns_prefix, local_name FROM #results ORDER BY x
OPEN xpath_generator
-- Loop over the entries
FETCH NEXT FROM xpath_generator
INTO @seq_no, @this_ns_prefix, @this_element
WHILE @@FETCH_STATUS = 0
BEGIN
-- Append the element name, seq no (if required) and a slash
SELECT @xpath_expr = @xpath_expr
+ ISNULL((@this_ns_prefix + ':'), '')
+ @this_element + ISNULL('[' + CONVERT(VARCHAR(10), @seq_no)
+ ']', '') + '/'
-- Continue
FETCH NEXT FROM xpath_generator
INTO @seq_no, @this_ns_prefix, @this_element
END
-- Remove the final slash
SELECT @xpath_expr = LEFT(@xpath_expr, DATALENGTH(@xpath_expr) - 1)
-- Tidy up
CLOSE xpath_generator
DEALLOCATE xpath_generator
DROP TABLE #results
-- Return the XPath expression
SELECT 'XPath(' + CONVERT(VARCHAR(4), @doc_id) + ', '
+ CONVERT(VARCHAR(4), @x_source) + ', '
+ CONVERT(VARCHAR(4), @x_target) + ')' AS parameter,
RTRIM(@xpath_expr) AS value
END
GO
We can execute this procedure for our example document. For example, to obtain the path between the document node and the <p> element of the second <card> element would produce the result shown in Listing 8.85.
Listing 8.85 Result for rep_gen_xpath with Three Parameters
rep_gen_xpath 2, 1, 40 GO parameter value --------------- -------------- XPath(2, 1, 40) /wml/card[2]/p (1 row(s) affected)
Note the sequence number 2 for the <card> element ("/wml/card/p" would match more than one node; the sequence number allows us to be specific).
Top
:: анонимно отправить смс
|








