10.4 RDQL

The RDQL language is based on the earlier work of Guha's RDFDB QL and SquishQL, with some relatively minor differences. Its popularity stems largely from its use within Jena, probably the most widely used RDF API.

RDQL supports the same select, from, where, and using clauses as SquishQL, with some exceptions. The details also vary by implementation, depending on whether you're using a Java API such as Jena, a PHP class such as the PHP XML classes, or a Perl module such as RDFStore. However, though the syntax within the clauses varies, the concepts remain the same.

Variables are in the format of a question mark, followed by other characters, just as in SquishQL:

?<identifier>

However, one difference between SquishQL and RDQL occurs in the select clause, which requires commas rather than spaces to separate all variables.

The from (or source) clause can be omitted in RDQL, depending on the implementation. In Jena, for instance, the source RDF/XML can either be loaded separately through a class method or be given directly in the query. In the PHP XML classes, however, the from clause must be provided within the query. The same applies to RDFStore, which also requires that the URL be surrounded by angle brackets.

The where clause (or triple pattern clause) differs from SquishQL's in that the pattern follows the more traditional subject-predicate-object ordering, and URIs are distinguished from literals by being surrounded by angle brackets. However, the way triple patterns are combined to form more complex queries is the same in RDQL and SquishQL.
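To make these conventions concrete, here is a minimal Java sketch that classifies a single term from a triple pattern as a variable, a URI (or QName) in angle brackets, or a quoted literal. The class and method names are hypothetical, and this is not how Jena or the PHP classes actually parse RDQL; it also ignores details such as unquoted numeric constants.

```java
import java.util.List;

// Hypothetical helper (not part of any RDQL implementation): classifies one
// term of a triple pattern using the where-clause conventions described above.
class RdqlTerm {

    enum Kind { VARIABLE, URI, LITERAL }

    static Kind classify(String term) {
        String t = term.trim();
        if (t.startsWith("?")) {
            return Kind.VARIABLE;                 // ?identifier
        }
        if (t.startsWith("<") && t.endsWith(">")) {
            return Kind.URI;                      // <http://...> or <prefix:name>
        }
        if (t.length() >= 2
                && ((t.startsWith("\"") && t.endsWith("\""))
                 || (t.startsWith("'") && t.endsWith("'")))) {
            return Kind.LITERAL;                  // "quoted" or 'quoted'
        }
        throw new IllegalArgumentException("Unrecognized term: " + term);
    }

    public static void main(String[] args) {
        // Sample terms in the style of (?resource2, <pstcn:type>, "stylesheet")
        for (String term : List.of("?resource2", "<pstcn:type>", "\"stylesheet\"")) {
            System.out.println(term + " -> " + classify(term));
        }
    }
}
```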

RDQL is more sophisticated in the comparison semantics it can combine with the triple patterns in the constraint clause. AND is used the same way, but other operators, such as the OR operator (||), the bitwise operators (& and |), and negation (!), are also supported.

Within Jena, the using clause is optional: a resource can be given with its full URI rather than abbreviated with a separately declared namespace prefix, though Jena also accepts using, as the examples later in this section show. The PHP XML classes support using as well, as does RDFStore.
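When a using clause is present, a QName such as <dc:subject> is expanded against its declared namespace before matching, while a full URI passes through unchanged. A minimal Java sketch of that expansion, assuming a simple prefix-to-namespace map (the class and method names are hypothetical; real implementations differ in detail):

```java
import java.util.Map;

// Hypothetical sketch: expand <prefix:name> terms using the prefixes declared
// in an RDQL using clause. Variables, literals, and full URIs are left as-is.
class QNameExpander {

    static String expand(String term, Map<String, String> prefixes) {
        if (!(term.startsWith("<") && term.endsWith(">"))) {
            return term;                              // variable or literal
        }
        String inner = term.substring(1, term.length() - 1);
        int colon = inner.indexOf(':');
        if (colon > 0) {
            String ns = prefixes.get(inner.substring(0, colon));
            if (ns != null) {
                return "<" + ns + inner.substring(colon + 1) + ">";
            }
        }
        return term;                                  // e.g. <http://...>, no declared prefix
    }

    public static void main(String[] args) {
        // USING dc FOR <http://purl.org/dc/elements/1.1/>
        Map<String, String> using = Map.of("dc", "http://purl.org/dc/elements/1.1/");
        System.out.println(expand("<dc:subject>", using));
        System.out.println(expand("<http://purl.org/dc/elements/1.1/date>", using));
        System.out.println(expand("?subject", using));
    }
}
```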

10.4.1 Jena's RDQL and the Query-O-Matic

In addition to the rich set of Java classes that allow access to individual triples as well as the ability to build complex RDF/XML documents (as described in Chapter 8), Jena also provides specialized classes for use with RDQL:

Query

The Query class manages the query itself, allowing it to be built through the API or passed in as a string.

QueryExecution

Query engine interface.

QueryEngine

The actual execution of the query (the intelligence behind the query process).

QueryResults

The iterator that manages the results.

ResultBinding

Mapping from variables to values.

In addition to these standard classes, newer releases of Jena provide further classes, such as QueryEngineSesame, which works against the Sesame RDF repository (discussed at the end of the chapter).

The use of the classes is very straightforward. Use Query to build or parse the query, then pass it to QueryEngine for processing. QueryEngine implements the QueryExecution interface, whose exec method returns the results as a QueryResults iterator. Each item the iterator returns is a ResultBinding, which maps the query variables to their values for that result, so individual values can be assigned to program variables.

To demonstrate how Jena works with RDQL, I created a dynamic query application, which I call the Query-O-Matic, building it in Java as a Tomcat JSP application.

10.4.1.1 The Query-O-Matic

The Query-O-Matic is a two-page application: the first page is an HTML form, and the second is a JSP page that processes the form contents. It's built with Jena 1.6 and runs under Tomcat. The source code is included with the example code for the book.

The Query-O-Matic assumes you're familiar with Tomcat and JSP-based applications. If you aren't, you can still work with the code, but you'll need to provide a different interface for it. You can get more details about Jena's RDQL support in the RDQL tutorial at http://www.hpl.hp.com/semweb/doc/tutorial/RDQL/index.html.

To create the application, the Jena .jar files must be copied either to Tomcat's common library directory or to the application's WEB-INF/lib directory. I copied them to the common library location because I use Jena for several applications.

The first page is nothing special: an HTML form with three fields:

  • The first field is a text input field to hold the URL of the RDF/XML document.

  • The second field is a textarea to hold the actual query.

  • The third field is another text input field to hold the name of the variable whose values are printed out.

Figure 10-2 shows the page containing the form, as well as links to sample RDF/XML documents.

Figure 10-2. Form to capture RDQL parameters

In the JSP page, the form values are pulled from the HTTP request. The URL is used to load the document; once it is loaded, the query is run against it using the Jena QueryEngine class. The results come back as a QueryResults object, which is iterated to retrieve each result as a ResultBinding, an object that maps the query's variables to their values for that result. The variable name passed from the form is used to look up the corresponding value in each binding, and that value is printed out, as shown in Example 10-3. Once all values are processed, the result set is closed.

Example 10-3. Java/JSP code to dynamically process RDQL query using Jena
<html>
<%@ page import="com.hp.hpl.mesa.rdf.jena.mem.*,
                 java.io.File,
                 java.util.*,
                 com.hp.hpl.mesa.rdf.jena.model.*,
                 com.hp.hpl.mesa.rdf.jena.common.*,
                 com.hp.hpl.jena.util.*,
                 com.hp.hpl.jena.rdf.query.*,
                 com.hp.hpl.jena.rdf.query.parser.*" %>

<body>

<%
   ModelMem model;

   try {
      // pull the form values from the HTTP request
      model = new ModelMem(  );
      String sUri = request.getParameter("uri");
      String sQuery = request.getParameter("query");
      String sResult = request.getParameter("result");

      // load the RDF/XML document into an in-memory model
      model.read(sUri);

      // parse the query string and run it against the model
      Query query = new Query(sQuery);
      query.setSource(model);

      QueryExecution qe = new QueryEngine(query) ;
      QueryResults results = qe.exec(  );
      out.print("<h1>Results</h1>");

      // each result is a ResultBinding mapping variable names to values
      for ( Iterator iter = results ; iter.hasNext(  ) ; ) {
         ResultBinding env = (ResultBinding)iter.next(  ) ;
         Object obj = env.get(sResult);
         // skip results where the requested variable is unbound
         if (obj != null) {
            out.print(obj.toString(  ));
            out.print("<br>");
         }
      }

      // close results
      results.close(  ) ;
   }
   catch (Exception e) {
      out.print(e.toString(  ));
   }
%>
<br>
</body>
</html>

Once the two pages and supporting Jena .jar files are installed into Tomcat, we're ready to try out some RDQL in the Query-O-Matic.

10.4.1.2 Trying out the Query-O-Matic

The simplest test of the Query-O-Matic is to run an RDQL variation of the first query made with Inkling/SquishQL, which is to find all the dc:subject predicates in the RDF/XML document and print out the associated object values. The contents of the form are given in Example 10-4.

Example 10-4. RDQL query to find dc:subject in RDF/XML document
uri: http://burningbird.net/articles/monsters1.rdf
query: SELECT ?subject
            WHERE (?x, <dc:subject>, ?subject)
            USING dc FOR <http://purl.org/dc/elements/1.1/>
result: subject

Comparing this with the SquishQL example shows that both are basically the same with minor syntactic differences. When the form is submitted and the query processed, the results returned are exactly the same, too.

Another slightly more complicated query is shown in Example 10-5, which demonstrates traversing two arcs in order to find a specific value.

Example 10-5. More complex query traversing two arcs
SELECT ?value
WHERE (?resource, <rdf:type>, <pstcn:Movement>),
(?resource, <pstcn:reason>, ?value)
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

Notice that object values that are resources are treated the same as subject and predicate values, with angle brackets around the URI (or QName). Literals are the only values not enclosed in angle brackets.
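Traversing two arcs works because both patterns in Example 10-5 share the ?resource variable: a binding produced by the first pattern is carried into the second, and only bindings that satisfy both survive. The following Java sketch shows that join over a tiny in-memory list of triples. The data is hypothetical (short names stand in for full URIs), and the naive nested-loop matching is for illustration only; it is not how Jena's QueryEngine is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of RDQL-style conjunctive matching: each triple
// pattern extends the variable bindings produced by the earlier patterns.
class PatternJoin {

    record Triple(String s, String p, String o) {}

    // Bind one pattern term against one data term; false on a conflict.
    static boolean unify(String patternTerm, String value, Map<String, String> b) {
        if (patternTerm.startsWith("?")) {
            String bound = b.get(patternTerm);
            if (bound == null) {
                b.put(patternTerm, value);
                return true;
            }
            return bound.equals(value);   // shared variable must agree
        }
        return patternTerm.equals(value); // constants must match exactly
    }

    static List<Map<String, String>> match(List<Triple> data, List<Triple> patterns) {
        List<Map<String, String>> results = new ArrayList<>();
        results.add(new HashMap<>());     // start with one empty binding
        for (Triple pat : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> binding : results) {
                for (Triple t : data) {
                    Map<String, String> b = new HashMap<>(binding);
                    if (unify(pat.s(), t.s(), b) && unify(pat.p(), t.p(), b)
                            && unify(pat.o(), t.o(), b)) {
                        next.add(b);
                    }
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical data: only m1 is typed as a Movement.
        List<Triple> data = List.of(
            new Triple("m1", "rdf:type", "pstcn:Movement"),
            new Triple("m1", "pstcn:reason", "reason A"),
            new Triple("r1", "pstcn:reason", "reason B"));
        List<Triple> query = List.of(
            new Triple("?resource", "rdf:type", "pstcn:Movement"),
            new Triple("?resource", "pstcn:reason", "?value"));
        for (Map<String, String> row : match(data, query)) {
            System.out.println(row.get("?value"));
        }
    }
}
```

Only "reason A" is returned: r1 has a reason but no rdf:type statement, so it never satisfies the first pattern.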

A still more complex query demonstrates the filtering capability of the triple pattern more fully. To understand how this query works, take a look at the N-Triples for the statements in the relevant subgraph of the monsters1.rdf example:

<http://burningbird.net/articles/monsters1.htm> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://burningbird.net/postcon/elements/1.0/Resource> .
<http://burningbird.net/articles/monsters1.htm> <http://burningbird.net/postcon/elements/1.0/presentation> _:jARP10030 .
_:jARP10030 <http://burningbird.net/postcon/elements/1.0/requires> _:jARP10032 .
_:jARP10032 <http://burningbird.net/postcon/elements/1.0/type> "logo" .
_:jARP10032 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "http://burningbird.net/mm/dynamicearth.jpg" .
_:jARP10030 <http://burningbird.net/postcon/elements/1.0/requires> _:jARP10031 .
_:jARP10031 <http://burningbird.net/postcon/elements/1.0/type> "stylesheet" .
_:jARP10031 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "http://burningbird.net/de.css" .

These are the statements we'll query with the code shown in Example 10-6. Within the query, the pstcn:presentation arc is followed from the main resource (monsters1.htm) to its object, a blank node. Then the pstcn:requires arc is followed to the two required presentation blank nodes. However, we're interested only in the one whose pstcn:type is "stylesheet". Once we have that, we access the stylesheet's value through rdf:value.

Example 10-6. Using triple pattern as a filter
SELECT ?value
WHERE (?x, <pstcn:presentation>, ?resource),
(?resource, <pstcn:requires>, ?resource2),
(?resource2, <pstcn:type>, "stylesheet"),
(?resource2, <rdf:value>, ?value)
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

The result from running this query is:

http://burningbird.net/de.css

Exactly what we wanted to get.

I used a triple pattern rather than a conditional filter to find the specific required presentation resource, because I wasn't filtering on the end values; the restriction applies partway along the path to the final statement. If I wanted to find specific end values with a conditional filter, I would list triple patterns until all of the statements of interest were returned, and then apply the filter to those statements.
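For the stylesheet query, both approaches yield the same answer; they differ in where the restriction is applied. The Java sketch below models the N-Triples subgraph shown earlier as records (predicates abbreviated to QNames, the subject shortened to monsters1.htm) and computes the stylesheet value two ways: once with the literal "stylesheet" fixed inside the path, as Example 10-6 does, and once by collecting every (type, value) row and filtering afterward. It illustrates the idea only and is not a query engine's algorithm; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: restricting inside the path vs. filtering finished rows.
// Triples mirror the monsters1.rdf subgraph, with QNames for brevity.
class PatternVsFilter {

    record Triple(String s, String p, String o) {}

    static final List<Triple> DATA = List.of(
        new Triple("monsters1.htm", "pstcn:presentation", "_:jARP10030"),
        new Triple("_:jARP10030", "pstcn:requires", "_:jARP10032"),
        new Triple("_:jARP10032", "pstcn:type", "logo"),
        new Triple("_:jARP10032", "rdf:value", "http://burningbird.net/mm/dynamicearth.jpg"),
        new Triple("_:jARP10030", "pstcn:requires", "_:jARP10031"),
        new Triple("_:jARP10031", "pstcn:type", "stylesheet"),
        new Triple("_:jARP10031", "rdf:value", "http://burningbird.net/de.css"));

    // Objects of all triples with the given subject and predicate.
    static List<String> objects(String s, String p) {
        return DATA.stream()
                   .filter(t -> t.s().equals(s) && t.p().equals(p))
                   .map(Triple::o)
                   .collect(Collectors.toList());
    }

    // Approach 1: the constant "stylesheet" is part of the path itself,
    // like (?resource2, <pstcn:type>, "stylesheet").
    static List<String> viaPattern() {
        return objects("monsters1.htm", "pstcn:presentation").stream()
            .flatMap(r -> objects(r, "pstcn:requires").stream())
            .filter(r2 -> objects(r2, "pstcn:type").contains("stylesheet"))
            .flatMap(r2 -> objects(r2, "rdf:value").stream())
            .collect(Collectors.toList());
    }

    // Approach 2: bind the type to a variable, build every (type, value)
    // row, then apply an equality filter to the finished rows.
    static List<String> viaFilter() {
        return objects("monsters1.htm", "pstcn:presentation").stream()
            .flatMap(r -> objects(r, "pstcn:requires").stream())
            .flatMap(r2 -> objects(r2, "pstcn:type").stream()
                .flatMap(type -> objects(r2, "rdf:value").stream()
                    .map(value -> new String[] {type, value})))
            .filter(row -> row[0].equals("stylesheet"))
            .map(row -> row[1])
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(viaPattern()); // [http://burningbird.net/de.css]
        System.out.println(viaFilter());  // [http://burningbird.net/de.css]
    }
}
```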

A demonstration of this is shown in Example 10-7, where a date is returned for a movement whose type is "Add". Notice that string equality is denoted by the eq operator rather than by nonalphabetic characters such as ==, which RDQL reserves for numeric comparison.
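The distinction matters because two values can be equal as numbers while differing as strings. A small Java sketch of the two kinds of equality; the methods are hypothetical stand-ins for RDQL's eq and == operators, not code taken from Jena or the PHP classes.

```java
// Hypothetical stand-ins for RDQL's two kinds of equality:
// eq compares values as strings, == compares them as numbers.
class RdqlEquality {

    static boolean stringEq(String a, String b) {
        return a.equals(b);
    }

    static boolean numericEq(String a, String b) {
        // Throws NumberFormatException if either value is not numeric.
        return Double.parseDouble(a) == Double.parseDouble(b);
    }

    public static void main(String[] args) {
        System.out.println(stringEq("Add", "Add"));     // true
        System.out.println(stringEq("100", "100.0"));   // false: different strings
        System.out.println(numericEq("100", "100.0"));  // true: same number
    }
}
```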

Example 10-7. Returning date for movement of type "Add"
SELECT ?date
WHERE 
(?resource, <rdf:type>, <pstcn:Movement>),
(?resource, <pstcn:movementType>, ?value),
(?resource, <dc:date>, ?date)
AND (?value eq "Add")
USING pstcn FOR <http://burningbird.net/postcon/elements/1.0/>,
      rdf FOR <http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
      dc FOR <http://purl.org/dc/elements/1.1/>

Regardless of the complexity of the query, the Query-O-Matic should be able to process it. Best of all, you can then take the query, add it to your own code, and know that it's been pretested.

However, if you're not a big fan of Java, then you may be interested in the PHP version of Query-O-Matic, Query-O-Matic Lite.

10.4.2 PHP Query-O-Matic Lite

If you've worked with PHP and with XML, then you're familiar with the PHP XML classes. These classes provide functionality to process virtually all popular uses of XML, including RDF/XML. The two packages of interest in this chapter are RDQL and RDQL_DB.

The main web page for the PHP XML classes is at http://phpxmlclasses.sourceforge.net/. This section assumes you are familiar with working with PHP.

As you can imagine from the package names, RDQL provides RDQL query capability within the PHP environment, and RDQL_DB adds persistence support. Both are so complete that the PHP version of Query-O-Matic took fewer than 10 lines of code, hence the Lite designation. But before we look at that, let's take a closer look at the classes themselves.

There are four classes within the RDQL package, but the one of interest to us is RDQL_query_document. This class has one method, rdql_query_url, which takes an RDQL query string (whose from clause identifies the document to query) and returns an array of associative arrays with the results of the query. The RDQL_DB package provides two classes of particular importance to this chapter: RDQL_db, which controls all database actions, and RDQL_query_db, which works the same way as RDQL_query_document, taking a query string and returning the results of the query as an array. RDQL_DB makes use of RDQL for query parsing and other shared functionality.

To use RDQL_DB, you'll need to preload the database structure required by the package. This is found in a file called rdql_db.sql in the installation. At this time, only MySQL is supported, and the file is loaded at the command line:

mysql databasename < rdql_db.sql

You must, of course, have the ability to modify the database in order to create tables in it. Follow the MySQL documentation if you have problems loading the RDQL tables.

The RDQL table structure is quite simple. Two tables are created: rdf_data contains columns for each member of an RDF triple as well as information about each, and rdf_documents keeps track of the different RDF/XML documents that are loaded into the database. Unlike the PHP classes discussed in Chapter 9, the PHP RDQL and RDQL_DB packages provide functionality to parse, load, and persist existing RDF/XML documents and to use RDQL to query them, but neither provides functionality to modify or create an RDF/XML document.

At the time of this writing, the PHP XML classes had not been updated to include the new RDF/XML constructs. Because of this, the example RDF/XML document used for most of the book, monsters1.rdf, can't be parsed cleanly. Instead, another RDF/XML document was used. This document is reproduced in Example 10-8 so that you can follow the demonstration more easily.

Example 10-8. Resume RDF/XML document
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:bbd="http://burningbird.net/resume/elements/1.0/"
  xml:base="http://burningbird.net/shelley_powers/resume/" >

  <rdf:Description rdf:about="http://burningbird.net/shelley_powers/">
     <bbd:bio rdf:resource="bio"/>
     <bbd:job rdf:resource="job" />
     <bbd:education rdf:resource="education" />
     <bbd:experience rdf:resource="experience" />
     <bbd:skills rdf:resource="skills" />
     <bbd:references rdf:resource="references" />
     
  </rdf:Description>

  <rdf:Description rdf:about="bio">

     <bbd:firstname>Shelley</bbd:firstname>
     <bbd:lastname>Powers</bbd:lastname>
     <bbd:city>St. Louis</bbd:city>
     <bbd:state>Missouri</bbd:state>
     <bbd:country>US</bbd:country>
     <bbd:homephone> - </bbd:homephone>
     <bbd:mobile> - </bbd:mobile>
     <bbd:workphone> - </bbd:workphone>
     <bbd:email>shelleyp@burningbird.net</bbd:email>
   </rdf:Description>

  <rdf:Description rdf:about="job">
     <bbd:position>Software Engineer</bbd:position>
     <bbd:position>Technical Architect</bbd:position>
     <bbd:experience>16+ years</bbd:experience>
     <bbd:permorcontract>Contract</bbd:permorcontract>
     <bbd:start>2002-09-29</bbd:start>
     <bbd:relocate>No</bbd:relocate>
     <bbd:travel>yes</bbd:travel>
     <bbd:location>St. Louis, Missouri</bbd:location>
     <bbd:status>full</bbd:status>
     <bbd:rateusdollars>100</bbd:rateusdollars>
     <bbd:unit>hour</bbd:unit>
     <bbd:worklocation>both</bbd:worklocation>
     <bbd:idealjob>I'm primarily interested in contract positions with a 
                   fairly aggressive schedule; I like to be in an energetic 
                   environment. My preferred work is technology architecture, 
                   but I'm also a hands-on senior software developer.
     </bbd:idealjob>
     
   </rdf:Description>

  <rdf:Description rdf:about="education">
      <rdf:_1>
        <rdf:Description rdf:about="degree1">
          <bbd:degree>AA</bbd:degree>
          <bbd:discipline>Liberal Arts</bbd:discipline>
          <bbd:date>1981-06-01</bbd:date>
          <bbd:gpa>3.98</bbd:gpa>
          <bbd:honors>High Honors</bbd:honors>
          <bbd:college>Yakima Valley Community College</bbd:college>
          <bbd:location>Yakima, Washington</bbd:location>
        </rdf:Description>
      </rdf:_1>
      <rdf:_2>
        <rdf:Description rdf:about="degree2">
          <bbd:degree>BA</bbd:degree>
          <bbd:discipline>Psychology</bbd:discipline>
          <bbd:date>1986-06-01</bbd:date>
          <bbd:gpa>3.65</bbd:gpa>
          <bbd:honors>Magna cum laude</bbd:honors>
          <bbd:honors>Dean's Scholar</bbd:honors>
          <bbd:college>Central Washington University</bbd:college>
          <bbd:location>Ellensburg, Washington</bbd:location>
        </rdf:Description>
      </rdf:_2>
      <rdf:_3>
        <rdf:Description rdf:about="degree3">
          <bbd:degree>BS</bbd:degree>
          <bbd:discipline>Computer Science</bbd:discipline>
          <bbd:date>1987-06-01</bbd:date>
          <bbd:gpa>3.65</bbd:gpa>
          <bbd:college>Central Washington University</bbd:college>
          <bbd:location>Ellensburg, Washington</bbd:location>
        </rdf:Description>
      </rdf:_3>
  </rdf:Description>


  <rdf:Description rdf:about="experience">
     <rdf:_1>
        <rdf:Description rdf:about="job1">
           <bbd:company>Boeing</bbd:company>
           <bbd:title>Data Architect</bbd:title>
           <bbd:title>Information Repository Modeler</bbd:title>
           <bbd:title>Software Engineer</bbd:title>
           <bbd:title>Database Architect</bbd:title>
           <bbd:start>1987</bbd:start>
           <bbd:end>1992</bbd:end>
           <bbd:description>
At Boeing I worked as a developer for the Peace Shield Project (FORTRAN/Ingres on VAX/
VMS).  Peace Shield is Saudi Arabia's air defense system. At the end of the project, I 
moved into a position of Oracle DBA and provided support for various organizations.  I 
worked with Oracle versions 5.0 and 6.0, and with SQL Forms, Pro*C, and OCI. I was also 
interim information modeler for Boeing Commercial's Repository, providing data modeling 
and design for this effort.
From the data group, I moved into my last position at Boeing, which was for the Acoustical
and Linguistics group, developing applications for Windows using Microsoft C, C++, the 
Windows SDK, and using Smalltalk as a prototype tool. The object-based applications we 
created utilized new speech technology as a solution to business needs including a speech 
driven robotic work order system.
           </bbd:description>
        </rdf:Description>
     </rdf:_1>
  </rdf:Description>

  <rdf:Description rdf:about="skills">
    <rdf:_1>
      <rdf:Description rdf:about="java">
       <bbd:level>Expert</bbd:level>
       <bbd:years>6</bbd:years>
       <bbd:lastused>now</bbd:lastused>
      </rdf:Description>
    </rdf:_1>
    <rdf:_2>
      <rdf:Description rdf:about="C++">
       <bbd:level>Expert</bbd:level>
       <bbd:years>8</bbd:years>
       <bbd:lastused>2 years ago</bbd:lastused>
      </rdf:Description>
    </rdf:_2>
  </rdf:Description>

</rdf:RDF>

The PHP XML classes may have been updated to reflect the most recent RDF specifications by the time this book is published.

To demonstrate both the persistence capability and the query functionality of the PHP XML classes, Example 10-9 shows a complete PHP page that opens a connection to the database, loads in a document, queries the data, and then removes the document from persistent storage.

Example 10-9. Application to read in resume RDF/XML document and run query against it
<?php
mysql_connect("localhost","username","password");
mysql_select_db("databasename");
?>
<html>
<head>
  <title>RDQL PHP Example</title>
</head>
<body>
<?php
include_once("C:\class_rdql_db\class_rdql_db.php");

# read in, store document
$rdqldb = new RDQL_db(  );
$rdqldb->set_warning_mode(true);
$rdqldb->store_rdf_document("http://weblog.burningbird.net/resume.rdf","resume");
# build and execute query
$query='SELECT ?b
FROM <resume>
WHERE (?a, <bbd:title>, ?b)
USING bbd for <http://burningbird.net/resume/elements/1.0/>';

#parse and print results
$rows = RDQL_query_db::rdql_query_db($query);
if (!empty($rows)) {
   foreach($rows as $row) {
      foreach($row as $key=>$val) {
         print("$val<p>");
      }
   }
}
else {
   print("No data found");
}

# data dump and delete document from db
$data = $rdqldb->get_rdf_document("resume");
print("<h3>General dump of the data</h3>");
print($data);

$rdqldb->remove_rdf_document("resume");
?>
</body>
</html>

This example is running in a Windows environment, and the path to the PHP class is set accordingly. The method get_rdf_document returns the RDF/XML of the document stored in the database. To display the markup as well as the data, rather than having the browser interpret the tags, escape the string before printing:

$data=htmlspecialchars($data);
print($data);
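When markup is escaped by hand for display, the ampersand needs escaping too, and it has to be replaced first; otherwise the & in an already produced &lt; is escaped a second time. A Java sketch of the same idea (the class and method names are hypothetical):

```java
// Sketch: make RDF/XML text safe to show as literal text in an HTML page.
class EscapeForDisplay {

    static String escape(String xml) {
        return xml.replace("&", "&amp;")   // must come first
                  .replace("<", "&lt;")
                  .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(escape("<bbd:degree>BA</bbd:degree>"));
    }
}
```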

As the example demonstrates, parsing and querying an RDF/XML document with the PHP XML classes is quite simple, which is one of the advantages of a consistent metadata storage and query language.

The code for Query-O-Matic Lite is even simpler. The first page, containing the HTML form, has just one field, querystr, a textarea. When the form is submitted, the second page retrieves this string, strips out any escaping backslashes (such as those added by PHP's magic quotes), and then passes the string directly to the PHP class to process the query, as shown in Example 10-10. In this example, the RDQL class is used, and the document is opened directly via its URL rather than being persisted to a database first. In addition, unlike Query-O-Matic, Lite allows multiple variables in the select clause: the values in each row are printed with spaces between them, and each row is printed on a separate line.
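The PHP classes return results as an array of associative arrays, one per row, keyed by variable. The printing loop in Example 10-10 can be mirrored in Java with a list of maps. This is a sketch only: the key names are hypothetical, and it uses LinkedHashMap so values come out in select-clause order, much as PHP arrays preserve insertion order (a plain HashMap would not).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Sketch of the Query-O-Matic Lite output loop: one line per result row,
// with each variable's value separated by a space.
class PrintRows {

    static String format(List<Map<String, String>> rows) {
        StringBuilder out = new StringBuilder();
        for (Map<String, String> row : rows) {
            StringJoiner line = new StringJoiner(" ");
            for (String value : row.values()) {
                line.add(value);
            }
            out.append(line).append('\n');
        }
        return out.toString();
    }

    static Map<String, String> row(String degree, String discipline) {
        // LinkedHashMap keeps insertion (select-clause) order.
        Map<String, String> m = new LinkedHashMap<>();
        m.put("degree", degree);            // key names are hypothetical
        m.put("discipline", discipline);
        return m;
    }

    public static void main(String[] args) {
        System.out.print(format(List.of(
            row("AA", "Liberal Arts"),
            row("BA", "Psychology"),
            row("BS", "Computer Science"))));
    }
}
```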

Example 10-10. Code for PHP RDF/XML Query-O-Matic Lite
<html>
<head>
  <title>RDQL Query-O-Matic Lite</title>
</head>
<body>
<?php

include_once("class_rdql.php");
$querystr=stripslashes($_GET['querystr']);
$rows = RDQL_query_document::rdql_query_url($querystr);
if (empty($rows)) die("No data found for your query");

foreach($rows as $row) {
   foreach($row as $key=>$val) {
      print("$val ");
   }
   print("<br /><br />");
}
?>
</body>
</html>

Even counting the HTML, Query-O-Matic Lite is one of the smallest PHP applications I've created. Yet as long as the underlying RDF/XML parser (class_rdf_parser) can parse the RDF/XML, you can run queries against the data.

Figure 10-3 shows the first page of Query-O-Matic Lite, with an RDQL query typed into the query input text box.

Figure 10-3. Entering an RDQL query into the Query-O-Matic

The query, shown in Example 10-11, accesses all degrees and disciplines within the document and prints them out.

Example 10-11. RDQL query accessing disciplines and degrees from resume RDF/XML document
SELECT ?degree, ?discipline
FROM <http://weblog.burningbird.net/resume.rdf>
WHERE (?a, <bbd:discipline>, ?discipline),
      (?a, <bbd:degree>, ?degree)
USING bbd for <http://burningbird.net/resume/elements/1.0/>

The results of running this query are:

AA Liberal Arts 
BA Psychology 
BS Computer Science

The PHP XML classes also support conditional and Boolean operators for filtering data once a subset has been found with the triple patterns, though the set of operators differs from Jena's, because RDQL has not been standardized across implementations...yet. In addition, you can list more than one document in the from (source) clause, and the data from all of them is then available to the query.
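Listing several documents in the from clause amounts to querying the union of their statements. A minimal Java sketch of that merge, assuming each document has already been parsed into triples (the parsing step is omitted, and the class and sample data are hypothetical). Identical statements collapse because the merged graph is a set. Note that this simple union is only correct for URI-labeled resources; a proper RDF merge keeps blank nodes from different documents distinct.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch: treat several parsed documents as one graph (set union).
class MergeSources {

    record Triple(String s, String p, String o) {}

    @SafeVarargs
    static Set<Triple> merge(List<Triple>... documents) {
        Set<Triple> graph = new LinkedHashSet<>();
        for (List<Triple> doc : documents) {
            graph.addAll(doc);   // identical statements are kept once
        }
        return graph;
    }

    public static void main(String[] args) {
        // Hypothetical statements from two RSS files.
        List<Triple> weblog = List.of(
            new Triple("item1", "rdf:type", "rss:item"),
            new Triple("item1", "rss:title", "First post"));
        List<Triple> articles = List.of(
            new Triple("item1", "rdf:type", "rss:item"),   // duplicate
            new Triple("item2", "rss:title", "An article"));
        System.out.println(merge(weblog, articles).size()); // 3
    }
}
```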

I loaded several RDF/RSS files (for more on RSS, see Chapter 13) from my web sites and then created a query that searched for all entries after a certain time (the start of 2003) and printed out the date/timestamp, title, and link to the article. Example 10-12 contains the RDQL for this query.

Example 10-12. Complex RDQL query
SELECT ?date, ?title, ?link
FROM <http://weblog.burningbird.net/index.rdf>
     <http://articles.burningbird.net/index.rdf>
     <http://rdf.burningbird.net/index.rdf>
WHERE (?a, <rdf:type>, <rss:item>),
      (?a, <rss:title>, ?title),
      (?a, <rss:link>, ?link),
      (?a, <dc:date>, ?date)
AND ?date > '2002-12-31'
USING rss for <http://purl.org/rss/1.0/>,
      dc for <http://purl.org/dc/elements/1.1/>

The data from all the RDF/XML files was joined, the query made and filtered, and the resulting output met my expectations. Not only that, but the process was quite quick as well as incredibly easy: a very effective demonstration of the power of RDF, RDF/XML, and RDQL.
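The constraint ?date > '2002-12-31' in Example 10-12 appears to compare dates as strings, given the quoted value. That works for W3CDTF (ISO 8601) dates, the format dc:date values typically use in RSS 1.0, because their fields run from most to least significant at fixed widths, so string order matches chronological order (time-zone offsets aside). A Java sketch with hypothetical values; the class and method names are also hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: lexical comparison of W3CDTF/ISO 8601 date strings.
class DateFilter {

    static List<String> after(List<String> dates, String cutoff) {
        return dates.stream()
                    .filter(d -> d.compareTo(cutoff) > 0)  // string order = date order
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> dates = List.of(
            "2002-12-30T09:15:00-06:00",
            "2002-12-31T22:00:00-06:00",
            "2003-01-04T18:30:00-06:00");
        System.out.println(after(dates, "2002-12-31"));
    }
}
```

One subtlety: a full timestamp on 2002-12-31 itself also passes this filter, because a longer string sorts after its own prefix. Comparing against the start of 2003 with >= avoids that edge case.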