The RDQL language is based on the earlier work of Guha's RDFDB QL and SquishQL, with some relatively minor differences. Its popularity is ensured because of its use within Jena, probably the most widely used RDF API.
RDQL supports the same select, from, where, and using clauses as SquishQL (with some exceptions). Additionally, RDQL syntax can vary with the implementation, depending on whether you're using a Java API such as Jena, a PHP class such as the PHP XML classes, or a Perl module such as RDFStore. However, though the syntax varies within the clauses, the concepts remain the same.
Variables are in the format of a question mark, followed by other characters, just as in SquishQL:
?<identifier>
However, one difference between SquishQL and RDQL occurs in the select clause, which requires commas rather than spaces to separate all variables.
The from, or source, clause can be omitted with RDQL depending on the implementation. For instance, in Jena, the source of the RDF/XML can be specified and loaded through a separate class method or can be given directly in the query. However, in the PHP RDF/XML classes, the from clause must be provided within the query. The same applies to RDFStore, which also requires that the URL be surrounded by angle brackets.
The where clause (or triple pattern clause) differs in that the pattern follows the more traditional subject-predicate-object ordering, and URIs are differentiated from literals by being surrounded by angle brackets. However, the way that triple patterns are combined to form more complex queries is the same in RDQL and SquishQL.
RDQL has greater sophistication in incorporating comparison semantics with the triple pattern within the constraint clause. The use of AND is the same, but other operators, such as the OR operator (||), bitwise operators (& and |), and negation (!), are supported.
Within Jena, there is no using clause because the namespaces for the resources are included with the resource rather than being listed as a separate namespace. However, the PHP XML classes support using, as does RDFStore.
In addition to the rich set of Java classes that allow access to individual triples as well as the ability to build complex RDF/XML documents (as described in Chapter 8), Jena also provides specialized classes for use with RDQL:
Query: Manages the actual query, enabling the building of a query through an API or passed as a string.
QueryExecution: Query engine interface.
QueryEngine: The actual execution of the query (the intelligence behind the query process).
QueryResults: The iterator that manages the results.
ResultBinding: Mapping from variables to values.
In addition to these standard classes, newer releases of Jena also include additional classes, such as a QueryEngineSesame class, which works against the Sesame RDF repository (discussed at the end of the chapter).
The use of the classes is very straightforward. Use Query to build or parse the query, which is then passed to QueryEngine for processing. The engine is accessed through the QueryExecution interface, whose exec method runs the query and returns the results as QueryResults. To access individual items in the results, each row is retrieved as a ResultBinding, from which values are fetched by variable name.
To demonstrate how Jena works with RDQL, I created a dynamic query application, which I call the Query-O-Matic, building it in Java as a Tomcat JSP application.
The Query-O-Matic is a two-page application, with the first HTML page containing a form and the second JSP page processing the form contents. It's built using Jena 1.6, and managed with Tomcat. The source code is included as part of the example code for the book.
To create the application, the Jena .jar files must be copied to the common library or to the application-specific WEB-INF lib directory. I copied them to the common library location because I use Jena for several applications.
The first page is nothing special, an HTML form with three fields:
The first field is a text input field to hold the URL of the RDF/XML document.
The second field is a textarea to hold the actual query.
The third field is another text input field to hold the variable that's printed out.
Figure 10-2 shows the page containing the form, as well as links to sample RDF/XML documents.
In the JSP page, the form values are pulled from the HTTP request. The URL is used to load the document; once it is loaded, the query is run against the document using the Jena QueryEngine class. To iterate through the results, another class, QueryResults, is created, and each record returned from the query is then bound to a ResultBinding object in order to access a specific value. The value for the result variable named in the form is pulled from this object and printed out, as shown in Example 10-3. Once all values are processed, the result set is closed.
<html>
<%@ page import="com.hp.hpl.mesa.rdf.jena.mem.*, java.io.File, java.util.*, com.hp.hpl.mesa.rdf.jena.model.*, com.hp.hpl.mesa.rdf.jena.common.*, com.hp.hpl.jena.util.*, com.hp.hpl.jena.rdf.query.*, com.hp.hpl.jena.rdf.query.parser.*" %>
<body>
<%
ModelMem model;

try {
   model = new ModelMem( );

   // form values from the HTTP request
   String sUri = request.getParameter("uri");
   String sQuery = request.getParameter("query");
   String sResult = request.getParameter("result");

   // load the RDF/XML document into the in-memory model
   model.read(sUri);

   // query string
   Query query = new Query(sQuery);
   query.setSource(model);
   QueryExecution qe = new QueryEngine(query) ;
   QueryResults results = qe.exec( );

   out.print("<h1>test</h1>");

   // print the requested variable for each row of the result set
   for ( Iterator iter2 = results ; iter2.hasNext( ) ; ) {
      ResultBinding env = (ResultBinding)iter2.next( ) ;
      Object obj = env.get(sResult);
      out.print(obj.toString( ));
      out.print("<br>");
   }

   // close results
   results.close( ) ;

} catch (Exception e) {
   out.print(e.toString( ));
}
%>
<br>
</body>
</html>
Once the two pages and supporting Jena .jar files are installed into Tomcat, we're ready to try out some RDQL in the Query-O-Matic.
The simplest test of the Query-O-Matic is to run an RDQL variation of the first query made with Inkling/SquishQL, which is to find all the dc:subject predicates in the RDF/XML document and print out the associated object values. The contents of the form are given in Example 10-4.
uri: http://burningbird.net/articles/monsters1.rdf
query: SELECT ?subject
WHERE (?x, <dc:subject>, ?subject)
USING dc FOR <http://purl.org/dc/elements/1.1/>
result: subject
Comparing this with the SquishQL example shows that both are basically the same with minor syntactic differences. When the form is submitted and the query processed, the results returned are exactly the same, too.
Another slightly more complicated query is shown in Example 10-5, which demonstrates traversing two arcs in order to find a specific value.
SELECT ?value
WHERE (?resource, <rdf:type>, <pstcn:Movement>),
      (?resource, <pstcn:reason>, ?value)
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
Notice that object values that are resources are treated the same as the subject and predicate values, with angle brackets around the URI (or the QName). The only type of value that doesn't have angle brackets is literals.
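The three kinds of term described above can be told apart by their surface syntax alone. The following is a minimal sketch of that rule in plain Java; the RdqlTerm class and its Kind enum are hypothetical names used for illustration and are not part of Jena or any other RDQL implementation, which will have its own parser.

```java
// Illustrative only: RdqlTerm and Kind are hypothetical names, not Jena classes.
class RdqlTerm {
    enum Kind { VARIABLE, URI, LITERAL }

    // Classify one term of a triple pattern by its surface syntax.
    static Kind classify(String term) {
        String t = term.trim();
        if (t.length() > 1 && t.startsWith("?")) {
            return Kind.VARIABLE;   // e.g. ?value
        }
        if (t.length() > 2 && t.startsWith("<") && t.endsWith(">")) {
            return Kind.URI;        // e.g. <pstcn:Movement> or a full URI
        }
        return Kind.LITERAL;        // anything else, e.g. "stylesheet"
    }

    public static void main(String[] args) {
        String[] terms = {"?resource", "<rdf:type>", "<pstcn:Movement>", "\"stylesheet\""};
        for (String term : terms) {
            System.out.println(term + " -> " + classify(term));
        }
    }
}
```

The sketch treats everything that is neither a variable nor bracketed as a literal, which is a simplification of what a real RDQL grammar accepts.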
A slightly more complicated query more fully demonstrates the filtering capability of the triple pattern. To better understand how this query works, take a look at the N-Triples of the statements of the subgraph from the monsters1.rdf example:
<http://burningbird.net/articles/monsters1.htm> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://burningbird.net/postcon/elements/1.0/Resource> .
<http://burningbird.net/articles/monsters1.htm> <http://burningbird.net/postcon/elements/1.0/presentation> _:jARP10030 .
_:jARP10030 <http://burningbird.net/postcon/elements/1.0/requires> _:jARP10032 .
_:jARP10032 <http://burningbird.net/postcon/elements/1.0/type> "logo" .
_:jARP10032 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "http://burningbird.net/mm/dynamicearth.jpg" .
_:jARP10030 <http://burningbird.net/postcon/elements/1.0/requires> _:jARP10031 .
_:jARP10031 <http://burningbird.net/postcon/elements/1.0/type> "stylesheet" .
_:jARP10031 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "http://burningbird.net/de.css" .
These are the statements we'll be querying with the code shown in Example 10-6. Within the query, the pstcn:presentation arc is followed from the main resource (monsters1.htm) to get the object/resource for it (a blank node). Then, the pstcn:requires predicate arc is followed to get the two required presentation bnodes. However, we're interested only in the one whose pstcn:type is "stylesheet". Once we have that, then we'll access the value of the stylesheet. The path I just highlighted in the text is also highlighted in the example.
SELECT ?value
WHERE (?x, <pstcn:presentation>, ?resource),
      (?resource, <pstcn:requires>, ?resource2),
      (?resource2, <pstcn:type>, "stylesheet"),
      (?resource2, <rdf:value>, ?value)
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
The result from running this query is:
http://burningbird.net/de.css
Exactly what we wanted to get.
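To see why only one value comes back, it can help to walk the four triple patterns by hand. The sketch below is a toy matcher in plain Java, not Jena's query engine: the class and method names are hypothetical, URIs are shortened to their QName form, and literals are not distinguished from resources. Each pattern is applied in turn, and a variable that is already bound must match the same value in later patterns, which is how the "logo" bnode is dropped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of joining triple patterns; not how Jena is implemented.
class TriplePatternJoin {

    // Match one pattern term against one triple value, extending the bindings.
    // Returns false if the term is a constant that differs, or a variable
    // already bound to a different value.
    static boolean unify(String term, String value, Map<String, String> bound) {
        if (term.startsWith("?")) {
            String existing = bound.putIfAbsent(term, value);
            return existing == null || existing.equals(value);
        }
        return term.equals(value);
    }

    // Apply the patterns in order, keeping only bindings consistent with all of them.
    static List<Map<String, String>> match(List<String[]> triples, List<String[]> patterns) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>());
        for (String[] p : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> solution : solutions) {
                for (String[] t : triples) {
                    Map<String, String> candidate = new HashMap<>(solution);
                    if (unify(p[0], t[0], candidate)
                            && unify(p[1], t[1], candidate)
                            && unify(p[2], t[2], candidate)) {
                        next.add(candidate);
                    }
                }
            }
            solutions = next;
        }
        return solutions;
    }

    // The subgraph from monsters1.rdf shown in the N-Triples listing, with shortened names.
    static List<String[]> sampleTriples() {
        return List.of(
            new String[] {"monsters1.htm", "rdf:type", "pstcn:Resource"},
            new String[] {"monsters1.htm", "pstcn:presentation", "_:jARP10030"},
            new String[] {"_:jARP10030", "pstcn:requires", "_:jARP10032"},
            new String[] {"_:jARP10032", "pstcn:type", "logo"},
            new String[] {"_:jARP10032", "rdf:value", "http://burningbird.net/mm/dynamicearth.jpg"},
            new String[] {"_:jARP10030", "pstcn:requires", "_:jARP10031"},
            new String[] {"_:jARP10031", "pstcn:type", "stylesheet"},
            new String[] {"_:jARP10031", "rdf:value", "http://burningbird.net/de.css"});
    }

    // The four triple patterns of the stylesheet query.
    static List<String[]> stylesheetQuery() {
        return List.of(
            new String[] {"?x", "pstcn:presentation", "?resource"},
            new String[] {"?resource", "pstcn:requires", "?resource2"},
            new String[] {"?resource2", "pstcn:type", "stylesheet"},
            new String[] {"?resource2", "rdf:value", "?value"});
    }

    public static void main(String[] args) {
        for (Map<String, String> row : match(sampleTriples(), stylesheetQuery())) {
            System.out.println(row.get("?value"));   // prints http://burningbird.net/de.css
        }
    }
}
```

After the second pattern there are two candidate bindings for ?resource2, one per required bnode; the third pattern removes the one whose pstcn:type is "logo", and the fourth reads the value of the survivor.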
I used a triple pattern to find the specific required presentation resource, rather than a conditional filter, because I wasn't going to be querying among the end values; I'm actually modifying the query within the path to the end statement. If I wanted to find specific values using a conditional filter, I would list triple patterns up until I returned all of the statements of interest and then use the filter on these statements to find specific values.
A demonstration of this is shown in Example 10-7, where a date is returned for a movement with movement type of "Add". Notice that equality is denoted by the eq operator rather than using nonalphabetic characters such as ==, common in several programming languages.
SELECT ?date
WHERE (?resource, <rdf:type>, <pstcn:Movement>),
      (?resource, <pstcn:movementType>, ?value),
      (?resource, <dc:date>, ?date)
AND (?value eq "Add")
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
      dc for <http://purl.org/dc/elements/1.1/>
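The pattern-then-filter approach of this query can be pictured as two steps: the triple patterns produce rows of variable bindings, and the AND clause then discards rows that fail the condition. The Java sketch below shows only that second step. The class and method names are hypothetical, and the rows in main are made-up placeholder values, not the contents of monsters1.rdf.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustration of filtering bound rows; names and sample rows are hypothetical.
class FilterAfterPatterns {

    // Keep rows whose ?value equals the wanted movement type, then project ?date.
    static List<String> datesForMovementType(List<Map<String, String>> rows, String wanted) {
        return rows.stream()
                .filter(row -> wanted.equals(row.get("?value")))   // AND (?value eq "Add")
                .map(row -> row.get("?date"))                      // SELECT ?date
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Rows as the three triple patterns might bind them (placeholder data).
        List<Map<String, String>> rows = List.of(
            Map.of("?resource", "_:m1", "?value", "Add", "?date", "date-of-add"),
            Map.of("?resource", "_:m2", "?value", "Move", "?date", "date-of-move"));
        System.out.println(datesForMovementType(rows, "Add"));
    }
}
```

The difference from the stylesheet query is where the restriction sits: there, the literal was part of a triple pattern and shaped the path; here, every Movement is matched first and the comparison is applied to the end values.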
Regardless of the complexity of the query, the Query-O-Matic should be able to process the results. Best of all, you can then take the query and add it to your own code and know that it's been pretested.
However, if you're not a big fan of Java, then you may be interested in the PHP version of Query-O-Matic, Query-O-Matic Lite.
If you've worked with PHP and with XML, then you're familiar with the PHP XML classes. These classes provide functionality to process virtually all popular uses of XML, including RDF/XML. The two packages of interest in this chapter are RDQL and RDQL_DB.
As you can imagine from the package names, RDQL provides RDQL query capability within the PHP environment, and RDQL_DB provides persistent support for it. They're both so complete that the PHP version of Query-O-Matic (Lite) took less than 10 lines of code, hence the Lite designation. But before we look at that, let's take a close look at the classes themselves.
There are four classes within the RDQL package, but the one of interest to us is RDQL_query_document. This class has one method, rdql_query_url, which takes a query string and returns an array of associative arrays with the results of the query. The RDQL_DB package provides two classes of particular importance to this chapter: RDQL_db, which controls all database actions, and RDQL_query_db, which acts the same as RDQL_query_document, taking a query string and returning the results of the query as an array. RDQL_DB makes use of RDQL for query parsing and other shared functionality.
To use RDQL_DB, you'll need to preload the database structure required by the package. This is found in a file called rdql_db.sql in the installation. At this time, only MySQL is supported, and the file is loaded at the command line:
mysql databasename < rdql_db.sql
The RDQL table structure is quite simple. Two tables are created: rdf_data contains columns for each member of an RDF triple as well as information about each, and rdf_documents keeps track of the different RDF/XML documents that are loaded into the database. Unlike the PHP classes discussed in Chapter 9, the PHP RDQL and RDQL_DB packages provide functionality to parse, load, and persist existing RDF/XML documents and to use RDQL to query them, but neither provides functionality to modify or create an RDF/XML document.
At the time of this writing, the PHP XML classes had not been updated to include the new RDF/XML constructs. Because of this, the example RDF/XML document used for most of the book, monsters1.rdf, can't be parsed cleanly. Instead, another RDF/XML document was used. This document is reproduced in Example 10-8 so that you can follow the demonstration more easily.
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:bbd="http://burningbird.net/resume/elements/1.0/"
  xml:base="http://burningbird.net/shelley_powers/resume/"
>
  <rdf:Description rdf:about="http://burningbird.net/shelley_powers/">
    <bbd:bio rdf:resource="bio"/>
    <bbd:job rdf:resource="job" />
    <bbd:education rdf:resource="education" />
    <bbd:experience rdf:resource="experience" />
    <bbd:skills rdf:resource="skills" />
    <bbd:references rdf:resource="references" />
  </rdf:Description>

  <rdf:Description rdf:about="bio">
    <bbd:firstname>Shelley</bbd:firstname>
    <bbd:lastname>Powers</bbd:lastname>
    <bbd:city>St. Louis</bbd:city>
    <bbd:state>Missouri</bbd:state>
    <bbd:country>US</bbd:country>
    <bbd:homephone> - </bbd:homephone>
    <bbd:mobile> - </bbd:mobile>
    <bbd:workphone> - </bbd:workphone>
    <bbd:email>shelleyp@burningbird.net</bbd:email>
  </rdf:Description>

  <rdf:Description rdf:about="job">
    <bbd:position>Software Engineer</bbd:position>
    <bbd:position>Technical Architect</bbd:position>
    <bbd:experience>16+ years</bbd:experience>
    <bbd:permorcontract>Contract</bbd:permorcontract>
    <bbd:start>2002-09-29</bbd:start>
    <bbd:relocate>No</bbd:relocate>
    <bbd:travel>yes</bbd:travel>
    <bbd:location>St. Louis, Missouri</bbd:location>
    <bbd:status>full</bbd:status>
    <bbd:rateusdollars>100</bbd:rateusdollars>
    <bbd:unit>hour</bbd:unit>
    <bbd:worklocation>both</bbd:worklocation>
    <bbd:idealjob>I'm primarily interested in contract positions with a
fairly aggressive schedule; I like to be in an energetic environment. My
preferred work is technology architecture, but I'm also a hands-on senior
software developer.
    </bbd:idealjob>
  </rdf:Description>

  <rdf:Description rdf:about="education">
    <rdf:_1>
      <rdf:Description rdf:about="degree1">
        <bbd:degree>AA</bbd:degree>
        <bbd:discipline>Liberal Arts</bbd:discipline>
        <bbd:date>1981-06-01</bbd:date>
        <bbd:gpa>3.98</bbd:gpa>
        <bbd:honors>High Honors</bbd:honors>
        <bbd:college>Yakima Valley Community College</bbd:college>
        <bbd:location>Yakima, Washington</bbd:location>
      </rdf:Description>
    </rdf:_1>
    <rdf:_2>
      <rdf:Description rdf:about="degree2">
        <bbd:degree>BA</bbd:degree>
        <bbd:discipline>Psychology</bbd:discipline>
        <bbd:date>1986-06-01</bbd:date>
        <bbd:gpa>3.65</bbd:gpa>
        <bbd:honors>Magna cum laude</bbd:honors>
        <bbd:honors>Dean's Scholar</bbd:honors>
        <bbd:college>Central Washington University</bbd:college>
        <bbd:location>Ellensburg, Washington</bbd:location>
      </rdf:Description>
    </rdf:_2>
    <rdf:_3>
      <rdf:Description rdf:about="degree3">
        <bbd:degree>BS</bbd:degree>
        <bbd:discipline>Computer Science</bbd:discipline>
        <bbd:date>1987-06-01</bbd:date>
        <bbd:gpa>3.65</bbd:gpa>
        <bbd:college>Central Washington University</bbd:college>
        <bbd:location>Ellensburg, Washington</bbd:location>
      </rdf:Description>
    </rdf:_3>
  </rdf:Description>

  <rdf:Description rdf:about="experience">
    <rdf:_1>
      <rdf:Description rdf:about="job1">
        <bbd:company>Boeing</bbd:company>
        <bbd:title>Data Architect</bbd:title>
        <bbd:title>Information Repository Modeler</bbd:title>
        <bbd:title>Software Engineer</bbd:title>
        <bbd:title>Database Architect</bbd:title>
        <bbd:start>1987</bbd:start>
        <bbd:end>1992</bbd:end>
        <bbd:description>
At Boeing I worked as a developer for the Peace Shield Project
(FORTRAN/Ingres on VAX/VMS). Peace Shield is Saudi Arabia's air defense
system. At the end of the project, I moved into a position of Oracle DBA and
provided support for various organizations. I worked with Oracle versions 5.0
and 6.0, and with SQL Forms, Pro*C, and OCI. I was also interim information
modeler for Boeing Commercial's Repository, providing data modeling and
design for this effort.

From the data group, I moved into my last position at Boeing, which was for
the Acoustical and Linguistics group, developing applications for Windows
using Microsoft C, C++, the Windows SDK, and using Smalltalk as a prototype
tool. The object-based applications we created utilized new speech technology
as a solution to business needs including a speech driven robotic work order
system.
        </bbd:description>
      </rdf:Description>
    </rdf:_1>
  </rdf:Description>

  <rdf:Description rdf:about="skills">
    <rdf:_1>
      <rdf:Description rdf:about="java">
        <bbd:level>Expert</bbd:level>
        <bbd:years>6</bbd:years>
        <bbd:lastused>now</bbd:lastused>
      </rdf:Description>
    </rdf:_1>
    <rdf:_2>
      <rdf:Description rdf:about="C++">
        <bbd:level>Expert</bbd:level>
        <bbd:years>8</bbd:years>
        <bbd:lastused>2 years ago</bbd:lastused>
      </rdf:Description>
    </rdf:_2>
  </rdf:Description>
</rdf:RDF>
To demonstrate both the persistence capability and the query functionality of the PHP XML classes, Example 10-9 shows a complete PHP page that opens a connection to the database, loads in a document, queries the data, and then removes the document from persistent storage.
<?
mysql_connect("localhost","username","password");
mysql_select_db("databasename");
?>
<html>
<head>
<title>RDQL PHP Example</title>
</head>
<body>
<?php
include_once("C:\class_rdql_db\class_rdql_db.php");

# read in, store document
$rdqldb = new RDQL_db( );
$rdqldb->set_warning_mode(true);
$rdqldb->store_rdf_document("http://weblog.burningbird.net/resume.rdf","resume");

# build and execute query
$query='SELECT ?b FROM <resume> WHERE (?a, <bbd:title>, ?b) USING bbd for <http://www.burningbird.net/resume_schema#>';

# parse and print results
$rows = RDQL_query_db::rdql_query_db($query);
if (!empty($rows)) {
   foreach($rows as $row) {
      foreach($row as $key=>$val) {
         print("$val<p>");
      }
   }
}
else {
   print("No data found");
}

# data dump and delete document from db
$data = $rdqldb->get_rdf_document("resume");
print("<h3>General dump of the data</h3>");
print($data);
$rdqldb->remove_rdf_document("resume");
?>
</body>
</html>
This example is running in a Windows environment, and the path to the PHP class is set accordingly. The method get_rdf_document returns the RDF/XML of the document contained within the database. To print out the elements as well as the data, modify the string before printing:
$data=str_replace("<","&lt;",$data);
$data=str_replace(">","&gt;",$data);
print ($data);
As the example demonstrates, parsing and querying an RDF/XML document with the PHP XML classes is quite simple, one of the advantages of a consistent metadata storage and query language.
The code for Query-O-Matic Lite is even simpler. The first page with the HTML form has just one field, querystr, a textarea input field. When the form is submitted, the second page accesses this string, strips out any slashes, and then passes the string directly to the PHP class to process the query, as is shown in Example 10-10. In this example, the RDQL class is used and the document is opened directly via URL, rather than being persisted to a database first. In addition, unlike Query-O-Matic, Lite allows multiple variables in the select clause: each is printed out with spaces in between, and each row is printed on a separate line.
<html>
<head>
<title>RDFQL Query-O-Matic Light</title>
</head>
<body>
<?php
include_once("class_rdql.php");

$querystr=stripslashes($_GET['querystr']);
$rows = RDQL_query_document::rdql_query_url($querystr);
if (empty($rows)) die("No data found for your query");

foreach($rows as $row) {
   foreach($row as $key=>$val) {
      print("$val ");
   }
   print ("<br /><br />");
}
?>
</body>
</html>
Even accounting for the HTML in the example, Query-O-Matic Lite is one of the smallest PHP applications I've created. However, as long as the underlying RDF/XML parser (class_rdf_parser) can parse the RDF/XML, you can run queries against the data.
Figure 10-3 shows the first page of Query-O-Matic Lite, with an RDQL query typed into the query input text box.
The query, shown in Example 10-11, accesses all degrees and disciplines within the document and prints them out.
SELECT ?degree, ?discipline
FROM <http://weblog.burningbird.net/resume.rdf>
WHERE (?a, <bbd:discipline>, ?discipline),
      (?a, <bbd:degree>, ?degree)
USING bbd for <http://burningbird.net/resume/elements/1.0/>
The results of running this query are:
AA Liberal Arts
BA Psychology
BS Computer Science
The PHP XML classes also support conditional and Boolean operators for filtering data once a subset has been found with the triple patterns. It's just that the set of operators differs from those for Jena, as there has been no standardization of RDQL across implementations...yet. In addition, you can list more than one document in the from/source clause, and the data from both is then available for the query.
I loaded several RDF/RSS files (for more on RSS, see Chapter 13) from my web sites and then created a query that searched for all entries after a certain time (the start of 2003) and printed out the date/timestamp, title, and link to the article. Example 10-12 contains the RDQL for this query.
SELECT ?date, ?title, ?link
FROM <http://weblog.burningbird.net/index.rdf>
     <http://articles.burningbird.net/index.rdf>
     <http://rdf.burningbird.net/index.rdf>
WHERE (?a, <rdf:type>, <rss:item>),
      (?a, <rss:title>, ?title),
      (?a, <rss:link>, ?link),
      (?a, <dc:date>, ?date)
AND ?date > '2002-12-31'
USING rss for <http://purl.org/rss/1.0/>,
      dc for <http://purl.org/dc/elements/1.1/>
The data from all RDF/XML files was joined, the query made and filtered, and the resulting output met my expectations. Not only that, but the process was quite quick, as well as incredibly easy, a very effective demonstration of the power of RDF, RDF/XML, and RDQL.
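The date filter in this query works on a string value. How the PHP classes compare the two operands internally isn't covered here; assuming a plain string comparison, the Java sketch below shows why that is enough for dates written year first, as in the dc:date values of an RSS 1.0 feed. The class name and the sample timestamps are hypothetical.

```java
// Illustration of comparing year-first date strings; names and data are hypothetical.
class IsoDateFilter {

    // True when the date string sorts after the threshold in plain string order.
    // This matches chronological order only when both use the same
    // year-month-day layout with zero-padded fields.
    static boolean isAfter(String date, String threshold) {
        return date.compareTo(threshold) > 0;
    }

    public static void main(String[] args) {
        String threshold = "2002-12-31";
        String[] dates = {
            "2002-12-30T21:05:00-06:00",
            "2002-12-31T08:00:00-06:00",
            "2003-01-04T10:15:00-06:00"
        };
        for (String date : dates) {
            System.out.println(date + " passes filter: " + isAfter(date, threshold));
        }
    }
}
```

Note one consequence of comparing strings: a full timestamp on 2002-12-31 is longer than the bare threshold and so sorts after it, meaning entries from that day also pass. Timestamps carrying different time zone offsets are likewise compared as written, not as instants.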