10.3 Inkling and SquishQL

Unlike rdfDB, which was written in C in a Linux environment, the Inkling database was written in Java, using several JDBC classes. It was developed originally on Linux and Solaris and has most recently been hosted and tested on Mac OS X. Though I've tried it only on Mac OS X myself, it should work in any environment that has Java installed. Inkling also requires an installation of PostgreSQL, which it uses for persistent storage (unlike rdfDB, which manages its own storage).

You can view documentation and test the Inkling database online at http://swordfish.rdfweb.org/rdfquery/, and you can download the source code for Inkling from the same site. If you don't want to install Inkling on your own system, you can use the online test application instead, running it against your own RDF/XML documents available on the Web.

Once you've downloaded the Inkling installation file, first make sure that you've created a database called test and that you've run the SQL commands contained in the inklingsqlschema.psql file. You'll also need to set JAVA_HOME. In the Mac OS X environment, JAVA_HOME is set to /Library/Java/Home if you're using the Java installations that are designed specifically for Mac OS X.

The data structure loaded into the PostgreSQL database is relatively simple: one table contains pointers (hashed values) to the actual values, which are stored in a second table. A flag specifies whether a value is a resource or an actual object. If I disagree with anything in this design, it's the combination of resources and objects in one table. Resource URIs are typically Unicode character strings, most likely not more than a few hundred characters in length. Objects (literals), though, can be large. My test file used in many of the other examples in this book (http://burningbird.net/articles/monsters1.rdf) has objects that can be several thousand characters in length. A better design might have been to separate the known resources into a table of their own, or even into two tables: one for predicates and one for subjects. However, that's a personal preference.
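To make the layout just described more concrete, here is a minimal in-memory sketch in Python. It is not Inkling's actual schema: the class name, the use of SHA-1 as the hash, and the example.org subject URI are all assumptions made for illustration. It shows only the general idea of one table of hashed keys pointing into a second table of values, each flagged as a resource or a literal.

```python
import hashlib


def key_for(text: str, is_resource: bool) -> str:
    """Hash a value to a fixed-size key. SHA-1 is an assumption for illustration."""
    tag = "R" if is_resource else "L"
    return hashlib.sha1(f"{tag}:{text}".encode("utf-8")).hexdigest()


class TripleStore:
    """Hypothetical in-memory stand-in for the two PostgreSQL tables."""

    def __init__(self):
        self.triples = set()  # rows of (subject_key, predicate_key, object_key)
        self.values = {}      # key -> (text, is_resource flag)

    def _intern(self, text, is_resource):
        # Store the value once and hand back the key that points to it.
        key = key_for(text, is_resource)
        self.values[key] = (text, is_resource)
        return key

    def add(self, subject, predicate, obj, obj_is_resource=False):
        self.triples.add((
            self._intern(subject, True),
            self._intern(predicate, True),
            self._intern(obj, obj_is_resource),
        ))

    def dump(self):
        """Resolve every row of keys back to its stored text."""
        return {tuple(self.values[k][0] for k in row) for row in self.triples}


store = TripleStore()
doc = "http://example.org/articles/monsters"  # hypothetical subject URI
dc_subject = "http://purl.org/dc/elements/1.1/subject"
store.add(doc, dc_subject, "Nessie")
store.add(doc, dc_subject, "legends")
```

In this sketch, short URIs and potentially very long literals share the single values table, which is the aspect of the design questioned above.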

You can access several demonstration applications installed with Inkling, or use the online application. You can also work directly with the set of Java classes that support the application. Of particular interest among these is a JDBC driver created specifically for Inkling-formatted data, which allows you to run a SquishQL-formatted query against data stored in the PostgreSQL database. However, we're more interested at this point in the queries themselves, which we'll focus on in the rest of this section.

The example file used throughout this chapter is from Example 6-6: monsters1.rdf.

The SquishQL supported in Inkling has strong ties to SQL. A simple query is similar to the following:

SELECT ?subject
FROM http://burningbird.net/articles/monsters1.rdf
WHERE (dc::subject ?x ?subject)
USING dc FOR http://purl.org/dc/elements/1.1/

In this query, the WHERE clause is made up of triple patterns, each written with the predicate first, followed by the subject and then the object. Wherever a pattern uses a variable as a placeholder, any value in that position matches and is bound to the variable. In this example, every triple with a dc:subject predicate matches, regardless of its specific subject or object value, and the object values bound to ?subject are returned.

The FROM clause shows that the query is being made against a file, accessed remotely via its URL, rather than against the default database. The SELECT clause lists the value or values returned, and the USING clause gives a mapping between the predicate URI and the abbreviation for the URI. It's important to note that the USING clause doesn't declare a namespace prefix; it is a way of providing abbreviations for longer URIs. An abbreviation can stand for a specific namespace, but it isn't limited to namespaces formally identified within the RDF/XML document.
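The abbreviation mechanism can be sketched as plain string substitution. The following Python fragment is an illustration only, not Inkling's parser: the function names parse_using and expand are hypothetical, and the sketch assumes that an abbreviated term such as dc::subject is expanded by appending the part after the double colon to the mapped URI.

```python
def parse_using(clause: str) -> dict:
    """Turn 'dc FOR <uri> rdf FOR <uri>' into {'dc': <uri>, 'rdf': <uri>}."""
    tokens = clause.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected repeated 'abbreviation FOR uri' groups")
    mapping = {}
    for i in range(0, len(tokens), 3):
        abbrev, keyword, uri = tokens[i:i + 3]
        if keyword.upper() != "FOR":
            raise ValueError(f"expected FOR, found {keyword!r}")
        mapping[abbrev] = uri
    return mapping


def expand(term: str, mapping: dict) -> str:
    """Expand 'abbrev::local' to a full URI; leave every other term untouched."""
    if "::" in term:
        abbrev, local = term.split("::", 1)
        if abbrev in mapping:
            return mapping[abbrev] + local
    return term


using = parse_using(
    "dc FOR http://purl.org/dc/elements/1.1/ "
    "rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#"
)
full_predicate = expand("dc::subject", using)
```

Because the expansion is simple concatenation, nothing in this sketch requires the mapped URI to be a namespace declared in the RDF/XML document, which matches the point made above.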

The variables begin with a question mark and consist of characters, with no spaces. Figure 10-1 shows both this query and the output format as given in the Inkling online query application.

Figure 10-1. Preparing to run a query against the test RDF document

After you submit the form, a second page opens displaying the results:

The subject is Loch Ness Monster 
The subject is giant squid 
The subject is legends 
The subject is Architeuthis Dux 
The subject is Nessie

You can also make more complex queries by joining triple patterns, so that one pattern identifies a set of resources and another returns a specific predicate for those resources. For instance, to find all uses of pstcn:reason associated with movements, rather than with related resources, you can restrict the resources to those with an rdf:type of http://burningbird.net/postcon/elements/1.0/Movement, as shown in Example 10-1.

Example 10-1. Finding all reasons for movements within test RDF/XML document
SELECT ?resource ?value
FROM http://burningbird.net/articles/monsters1.rdf
WHERE (rdf::type ?resource "http://burningbird.net/postcon/elements/1.0/Movement")
      (pstcn::reason ?resource ?value)
USING pstcn FOR http://burningbird.net/postcon/elements/1.0/
      rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#

In this example, the first triple pattern finds all resources with an rdf:type of http://burningbird.net/postcon/elements/1.0/Movement. These resources are then used as the subject in the second triple pattern, which narrows the reasons returned to those associated with movement resources. Predicates from two namespaces are used, as shown in the USING clause. In addition, two values are returned in the SELECT clause and printed out:

The reason for the movement to http://www.dynamicearth.com/articles/monsters1.htm is Moved to separate dynamicearth.com domain 
The reason for the movement to http:/burningbird.net/articles/monsters1.htm is Collapsed into Burningbird 
The reason for the movement to http://www.yasd.com/dynaearth/monsters1.htm is New Article

Combining triple patterns in this way is known as following one specific path within an RDF model: node-arc-node-arc-node, and so on. You can add additional triple patterns to travel further down the path until you reach the data you're after, no matter how deeply nested within the model. The key is to use a variable that is assigned data in one triple pattern (such as a subject or object value) as one of the constraints in the next triple pattern, and so on.
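The way a variable bound in one pattern constrains the next can be sketched in a few lines of Python. This is a simplified illustration, not Inkling's query engine: the function names are hypothetical, triples and patterns are written in the predicate-subject-object order used in the queries in this section, and the example.org resources, the Resource type, and the reason text attached to the second resource are made-up data.

```python
PSTCN = "http://burningbird.net/postcon/elements/1.0/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern, triple, bindings):
    """Extend bindings so that pattern fits triple; return None on a mismatch."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable bound by an earlier pattern must keep the same value.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns, triples):
    """Apply each pattern in turn, carrying the variable bindings forward."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data in (predicate, subject, object) order.
triples = [
    (RDF_TYPE, "http://example.org/m1", PSTCN + "Movement"),
    (PSTCN + "reason", "http://example.org/m1", "New Article"),
    (RDF_TYPE, "http://example.org/r1", PSTCN + "Resource"),
    (PSTCN + "reason", "http://example.org/r1", "Background reading"),
]
patterns = [
    (RDF_TYPE, "?resource", PSTCN + "Movement"),
    (PSTCN + "reason", "?resource", "?value"),
]
results = query(patterns, triples)
```

Only the reason attached to the resource typed as a Movement survives the second pattern, because ?resource is already bound when that pattern is evaluated. Adding a third pattern that reuses ?value would extend the path by one more arc.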

In addition to filtering based on triple pattern matching, you can also use more traditional query constraints such as the less-than (<) and greater-than (>) operators and equality (= and ~). All of the comparison operators work with integers except for the string equality operator (~).

In Example 10-2, the string equality operator is used to return the movement resource whose dc:date value matches a specific date.

Example 10-2. Find movement resource where movement occurred on a specific date
SELECT ?resource 
FROM http://burningbird.net/articles/monsters1.rdf
WHERE (rdf::type ?resource "http://burningbird.net/postcon/elements/1.0/Movement")
      (dc::date ?resource ?date)
AND ?date ~ "1999-10-31:T00:00:00-05:00"
USING pstcn FOR http://burningbird.net/postcon/elements/1.0/
      rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
      dc FOR http://purl.org/dc/elements/1.1/
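The effect of an AND constraint can be pictured as a filter applied to the variable bindings produced by the triple patterns. The Python sketch below is an assumption-laden illustration rather than Inkling's implementation: the function name apply_constraint, the example.org resources, the second date, and the ?count variable are all hypothetical. It treats ~ as string equality and the remaining operators as integer comparisons, as described earlier.

```python
# Comparison operators: '~' compares strings, the others compare integers.
OPS = {
    "~": lambda a, b: str(a) == str(b),
    "=": lambda a, b: int(a) == int(b),
    "<": lambda a, b: int(a) < int(b),
    ">": lambda a, b: int(a) > int(b),
}


def apply_constraint(solutions, variable, op, constant):
    """Keep only the solutions whose bound value passes the comparison."""
    test = OPS[op]
    return [s for s in solutions
            if variable in s and test(s[variable], constant)]


# Hypothetical bindings, as the triple patterns might have produced them.
solutions = [
    {"?resource": "http://example.org/m1", "?date": "1999-10-31:T00:00:00-05:00"},
    {"?resource": "http://example.org/m2", "?date": "2001-01-01:T00:00:00-05:00"},
]
on_date = apply_constraint(solutions, "?date", "~", "1999-10-31:T00:00:00-05:00")

# Integer comparison: "12" > "5" numerically, although not as strings.
counts = [{"?count": "3"}, {"?count": "12"}]
large = apply_constraint(counts, "?count", ">", "5")
```

The second call shows why the distinction between the operators matters: compared as strings, "12" would sort before "5", while the integer comparison keeps it.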

The example just shown is close to the most complex query you'll see with RDF, regardless of the specific query language. Variations simply add more constraints, namespaces, sources (such as multiple documents), and so on, but the basic structure remains the same:

SELECT variables
FROM source
WHERE (triple clause)
USING namespace mapping

The type of query language demonstrated here, beginning with rdfDB and continuing with SquishQL, formed the basis of one of the more popular RDF/XML query languages, RDQL, which is demonstrated in the next section.