There would seem to be a natural fit between Perl, a language known for its parsing and pattern-matching capabilities, and RDF. When I went searching for Perl APIs, though, I was surprised to find that several of those I discovered hadn't been updated for months (sometimes years) or were seriously incomplete. However, I was able to find a couple of Perl APIs that are active, are supported, and provide much of the functionality necessary for working with the RDF data model through RDF/XML.
I found the Ginger Alliance Perl APIs by searching for RDF on CPAN, the Comprehensive Perl Archive Network (accessible at http://perl.com). The organization provides Perl modules that can parse, store, and query Notation3 (RDF::Notation3) as well as RDF/XML (RDF::Core), but we'll cover only the RDF/XML module in this book.
Updates to PerlRDF
Be aware that at the time of this writing, PerlRDF has not been updated to reflect all of the new constructs released with the newest RDF Working Group documents. However, the author assured me that PerlRDF is still fully supported and that the group fully intends to bring it in line with the new RDF specifications as soon as they are released. The version used in this chapter was released in October 2002. You can download the Ginger Alliance Perl modules from CPAN or access them directly at the organization's web site (at http://www.gingerall.cz/charlie/ga/xml/p_rdf.xml). The examples in this chapter were created with RDF::Core 2.0, and installation instructions are contained within the source files. Both APIs work with Perl 5 and should be platform independent. The source is licensed under both the Mozilla Public License 1.1 and the GNU General Public License.
The RDF::Core modules for RDF/XML allow you to parse and store an existing RDF/XML document, add to it, query it using function calls, and serialize a new or modified model. You can store the model in memory, within a PostgreSQL database, or in Berkeley DB.
RDF models can be built within the code or parsed in from an external file. First, though, you have to create a storage mechanism to hold the data. PerlRDF gives you a choice of storing a model in memory or in a Berkeley DB or PostgreSQL database. The RDF::Core::Storage object manages access to the stored data, and it has three implementations, one for each storage mechanism.
RDF::Core::Storage::Memory manages in-memory storage. Data stored this way won't persist after the storage object goes out of scope or the Perl application terminates, and the object's only unique method is new, which takes no parameters:
require RDF::Core::Storage::Memory;
my $storage = RDF::Core::Storage::Memory->new;
The RDF::Core Berkeley DB object, RDF::Core::Storage::DB_File, uses the existing Berkeley DB DB_File Perl module for much of its functionality. DB_File uses the tie function to bind the DB object functions to the database file on disk, hiding much of the detail of database management. Unlike the memory object, the DB_File object's new method takes several parameters:
Name: The prefix used to name the several files that hold the structures needed to store the RDF model.
Flags, Mode: Equivalent to the flags and mode used with the Berkeley DB dbopen method. Examples of flags are O_RDONLY, O_RDWR, and O_CREAT. By default, O_CREAT | O_RDWR is used, and the default mode is 0666.
MemLimit: If nonzero, controls the number of statements returned within an enumerator (discussed later).
Sync: The number of write operations to complete before synchronizing in-memory data with storage, or zero to not force synchronization.
In the following code, a storage object is instantiated with the name rdfdata in the current directory and a MemLimit of 5000 statements; all other options keep their default values:
require RDF::Core::Storage::DB_File;
my $storage = RDF::Core::Storage::DB_File->new(
    Name     => './rdfdata',
    MemLimit => 5000,
);
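DB_File depends on the Berkeley DB libraries being installed, so the following sketch swaps in SDBM_File, a DBM module bundled with Perl, purely to show the tie mechanism and the dbopen-style flags and mode arguments described above. It is an illustration of what happens underneath, not RDF::Core code; the file name and the stored value are made up for the example.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_RDWR O_RDONLY);
use SDBM_File;                      # stand-in for DB_File in this sketch
use File::Temp qw(tempdir);

# Work in a throwaway directory; SDBM_File creates rdfdata.pag and rdfdata.dir.
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/rdfdata";

# tie binds the hash to files on disk: create if needed, open read/write, mode 0666.
tie(my %db, 'SDBM_File', $base, O_CREAT | O_RDWR, 0666)
    or die "Cannot tie $base: $!";
$db{'http://burningbird.net/articles/monsters1.htm'} = 'PostCon Resource';
untie %db;                          # flush and close the files

# Reopen read-only: the value written above persisted on disk.
tie(my %ro, 'SDBM_File', $base, O_RDONLY, 0666)
    or die "Cannot reopen $base: $!";
my $value = $ro{'http://burningbird.net/articles/monsters1.htm'};
untie %ro;

print "$value\n";
```

The hash operations never mention files; tie hides the storage details, which is the same convenience RDF::Core::Storage::DB_File gets from DB_File.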
The last storage mechanism supported in RDF::Core, RDF::Core::Storage::Postgres, uses the PostgreSQL data store to persist the RDF model. Its new method takes the following options:
PostgreSQL connection string
Database user and password
Distinguish between models (more than one model can be stored in a PostgreSQL database)
After a storage object is instantiated, the methods to manipulate its data are the same regardless of the underlying physical storage mechanism.
A basic procedure is used with PerlRDF to create a new RDF model. First, create the storage mechanism; next, create the model and each of the components of an RDF statement, assigning them to a new statement. Finally, add the statement to the model. That's all you need to add a new triple to an RDF model. The power of this Perl module is in its simplicity of use.
To demonstrate this, Example 9-1 shows a simple application that creates a new model using in-memory storage, adds a couple of statements for the same resource, and then serializes the model to RDF/XML. The first statement adds an rdf:type of PostCon Resource to the main resource; the second adds a movement type predicate. Note that predicate objects are created directly from the subject object, though the two aren't associated within the model until the statement containing them is added to the model. Also note that literals are instances of their own Perl class, in this case RDF::Core::Literal.
use strict;
require RDF::Core::Storage::Memory;
require RDF::Core::Model;
require RDF::Core::Resource;
require RDF::Core::Statement;
require RDF::Core::Literal;
require RDF::Core::Model::Serializer;

# create in-memory storage and attach a model to it
my $storage = RDF::Core::Storage::Memory->new;
my $model   = RDF::Core::Model->new(Storage => $storage);

# first statement: the article's rdf:type is a PostCon Resource
my $subject   = RDF::Core::Resource->new('http://burningbird.net/articles/monsters1.rdf');
my $predicate = $subject->new('http://www.w3.org/1999/02/22-rdf-syntax-ns#type');
my $object    = RDF::Core::Resource->new('http://burningbird.net/postcon/elements/1.0/Resource');
my $statement = RDF::Core::Statement->new($subject, $predicate, $object);
$model->addStmt($statement);

# second statement: a movementType predicate with a literal value
$model->addStmt(RDF::Core::Statement->new($subject,
    $subject->new('http://burningbird.net/postcon/elements/1.0/movementType'),
    RDF::Core::Literal->new('Move')));

# serialize the model as RDF/XML into $xml and print it
my $xml = '';
my $serializer = RDF::Core::Model::Serializer->new(Model  => $model,
                                                   Output => \$xml);
$serializer->serialize;
print "$xml\n";
Running this application results in the following RDF/XML:
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:a="http://burningbird.net/postcon/elements/1.0/"
>
<rdf:Description about="http://burningbird.net/articles/monsters1.rdf">
  <rdf:type rdf:resource="http://burningbird.net/postcon/elements/1.0/Resource"/>
  <a:movementType>Move</a:movementType>
</rdf:Description>
</rdf:RDF>
PerlRDF hasn't been updated to reflect the W3C's recommendation to qualify all attributes; in this case, about should become rdf:about. However, this isn't an error: the most that happens when you test this output in the RDF Validator is a warning:
Warning: {W101} Unqualified use of rdf:about has been deprecated.[Line = 5, Column = 72]
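If you'd rather avoid the warning, one option is to post-process the serialized string before writing it out. The sketch below is a hypothetical workaround, not an RDF::Core feature: it assumes the RDF/XML is already in a scalar (such as $xml from Example 9-1) and that the rdf prefix is bound to the RDF namespace, as it is in that output.

```perl
use strict;
use warnings;

# Hypothetical workaround: rewrite the unqualified about attribute on
# rdf:Description elements as rdf:about. Attributes that are already
# qualified are left alone, so running it twice changes nothing.
sub qualify_about {
    my ($xml) = @_;
    $xml =~ s/(<rdf:Description\b[^>]*?\s)about=/${1}rdf:about=/g;
    return $xml;
}

my $xml   = '<rdf:Description about="http://burningbird.net/articles/monsters1.rdf">';
my $fixed = qualify_about($xml);
print "$fixed\n";
```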
Additional statements can be built and added using the same approach. Any statement that can be expressed as an N-Triple, including one that involves blank nodes, can be added to the model using RDF::Core.
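Because every statement corresponds to exactly one N-Triple, the storage, model, and statement procedure can be mimicked in plain Perl to show what ends up in the model. This is a minimal sketch, not RDF::Core: the helper names (resource, literal, add_stmt, to_ntriple) are hypothetical, the model is just an array of [subject, predicate, object] triples, and literal escaping is ignored for brevity.

```perl
use strict;
use warnings;

# Hypothetical helpers: resources and literals are tagged hashes, loosely
# mirroring the RDF::Core::Resource / RDF::Core::Literal distinction.
sub resource { return { uri => $_[0] } }
sub literal  { return { literal => $_[0] } }

# Format one node in N-Triples notation (no escaping of quotes here).
sub node_label {
    my ($node) = @_;
    return exists $node->{uri} ? "<$node->{uri}>" : qq("$node->{literal}");
}

# Format a [subject, predicate, object] statement as one N-Triples line.
sub to_ntriple {
    my ($stmt) = @_;
    return join(' ', map { node_label($_) } @$stmt) . ' .';
}

# Add one statement to the model.
sub add_stmt {
    my ($model, $s, $p, $o) = @_;
    push @$model, [ $s, $p, $o ];
}

# Step 1: the storage, here simply an in-memory array acting as the model.
my @model;

# Step 2: build the components of each statement.
my $subject = resource('http://burningbird.net/articles/monsters1.rdf');
my $type    = resource('http://www.w3.org/1999/02/22-rdf-syntax-ns#type');
my $object  = resource('http://burningbird.net/postcon/elements/1.0/Resource');

# Step 3: add the statements (the same two as in Example 9-1).
add_stmt(\@model, $subject, $type, $object);
add_stmt(\@model, $subject,
    resource('http://burningbird.net/postcon/elements/1.0/movementType'),
    literal('Move'));

my @lines = map { to_ntriple($_) } @model;
print "$_\n" for @lines;
```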
In Example 9-2, the code adds the N-Triples statements equivalent to the newer RDF construct, rdf:value. In monsters1.rdf, this construct looks like the following when written in the more formalized syntax:
<pstcn:presentation rdf:parseType="Resource">
  <pstcn:requires rdf:parseType="Resource">
    <pstcn:type>stylesheet</pstcn:type>
    <rdf:value>http://burningbird.net/de.css</rdf:value>
  </pstcn:requires>
</pstcn:presentation>
Technically, no specific method is included in RDF::Core for creating the formalized rdf:value syntax, but one's not needed as long as you can add statements for each N-Triple that results when the syntax is broken down into triples. In the case of rdf:value, the N-Triples for the rdf:value construct associated with the stylesheet in monsters1.rdf are (from the RDF Validator):
_:jARP24590 <http://burningbird.net/postcon/elements/1.0/type> "stylesheet" .
_:jARP24590 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "http://burningbird.net/de.css" .
_:jARP24589 <http://burningbird.net/postcon/elements/1.0/requires> _:jARP24590 .
<http://burningbird.net/articles/monsters1.htm> <http://burningbird.net/postcon/elements/1.0/presentation> _:jARP24589 .
Breaking this down into actions, first a blank node must be created and used as the object of a statement whose subject is the resource monsters1.htm and whose predicate is http://burningbird.net/postcon/elements/1.0/presentation. This blank node then becomes the subject of the next statement, whose object is a second blank node and whose predicate is http://burningbird.net/postcon/elements/1.0/requires. In this example, the RDF::Core NodeFactory object creates both blank nodes.
Next, the second blank node is used as the subject of the next statement, with a predicate of http://www.w3.org/1999/02/22-rdf-syntax-ns#value and a value of http://burningbird.net/de.css. The last statement has the same subject, a predicate of http://burningbird.net/postcon/elements/1.0/type, and a value of stylesheet. Since blank nodes created by NodeFactory are RDF::Core::Resource objects, they can also be used to create the predicates for each of the statements.
use strict;
require RDF::Core::Storage::Memory;
require RDF::Core::Model;
require RDF::Core::Resource;
require RDF::Core::Statement;
require RDF::Core::Literal;
require RDF::Core::NodeFactory;
require RDF::Core::Model::Serializer;

# create storage object and model
my $storage = RDF::Core::Storage::Memory->new;
my $model   = RDF::Core::Model->new(Storage => $storage);

# new subject and new resource factory
my $subject = RDF::Core::Resource->new('http://burningbird.net/articles/monsters1.htm');
my $factory = RDF::Core::NodeFactory->new(BaseURI => 'http://burningbird.net/articles/');

# create bnode for presentation
my $bPresentation = $factory->newResource;

# create bnode for requires
my $bRequires = $factory->newResource;

# add presentation
my $predicate = $subject->new('http://burningbird.net/postcon/elements/1.0/presentation');
my $statement = RDF::Core::Statement->new($subject, $predicate, $bPresentation);
$model->addStmt($statement);

# add requires
$model->addStmt(RDF::Core::Statement->new($bPresentation,
    $bPresentation->new('http://burningbird.net/postcon/elements/1.0/requires'),
    $bRequires));

# add rdf:value
$model->addStmt(RDF::Core::Statement->new($bRequires,
    $bRequires->new('http://www.w3.org/1999/02/22-rdf-syntax-ns#value'),
    RDF::Core::Literal->new('http://burningbird.net/de.css')));

# add value type
$model->addStmt(RDF::Core::Statement->new($bRequires,
    $bRequires->new('http://burningbird.net/postcon/elements/1.0/type'),
    RDF::Core::Literal->new('stylesheet')));

# serialize the model as RDF/XML and print it
my $xml = '';
my $serializer = RDF::Core::Model::Serializer->new(Model  => $model,
                                                   Output => \$xml);
$serializer->serialize;
print "$xml\n";
Running the application results in the following RDF/XML output:
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:a="http://burningbird.net/postcon/elements/1.0/"
>
<rdf:Description rdf:about="http://burningbird.net/articles/monsters1.htm">
  <a:presentation>
    <rdf:Description>
      <a:requires>
        <rdf:Description>
          <rdf:value>http://burningbird.net/de.css</rdf:value>
          <a:type>stylesheet</a:type>
        </rdf:Description>
      </a:requires>
    </rdf:Description>
  </a:presentation>
</rdf:Description>
</rdf:RDF>
Plugging this into the RDF Validator and asking for N-Triples output returns the following N-Triples:
_:jARP24933 <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "http://burningbird.net/de.css" .
_:jARP24933 <http://burningbird.net/postcon/elements/1.0/type> "stylesheet" .
_:jARP24932 <http://burningbird.net/postcon/elements/1.0/requires> _:jARP24933 .
<http://burningbird.net/articles/monsters1.htm> <http://burningbird.net/postcon/elements/1.0/presentation> _:jARP24932 .
This maps back to the original N-Triples used to build the statements in the first place. As the generated N-Triples demonstrate, the subgraph of the monsters1.rdf directed graph that's specific to the use of rdf:value is the same as the one produced by the more formalized syntax for this construct; only the blank node identifiers and the order of the statements differ. Regardless of the complexity of the model, the same procedure can be used to add all statements.
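The same decomposition can be sketched generically in plain Perl, independent of RDF::Core: each nested structure becomes a fresh blank node, and each property becomes one triple. The flatten routine, the nested-hash input format, and the _:bN labels are hypothetical choices for this illustration; predicates are visited in sorted order only so the output is deterministic.

```perl
use strict;
use warnings;

my $PC  = 'http://burningbird.net/postcon/elements/1.0/';
my $RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

my $bnode_count = 0;
my @triples;

# Emit triples for $node (already written as <uri> or _:label) and its
# properties; a hash value becomes a new blank node that is flattened in turn.
sub flatten {
    my ($node, $props) = @_;
    for my $pred (sort keys %$props) {
        my $value = $props->{$pred};
        if (ref $value eq 'HASH') {
            my $bnode = '_:b' . ++$bnode_count;    # mint a new blank node
            push @triples, "$node <$pred> $bnode .";
            flatten($bnode, $value);               # recurse into the nested resource
        }
        else {
            push @triples, qq($node <$pred> "$value" .);
        }
    }
}

# The presentation/requires/rdf:value structure from monsters1.rdf.
flatten('<http://burningbird.net/articles/monsters1.htm>', {
    "${PC}presentation" => {
        "${PC}requires" => {
            "${PC}type"   => 'stylesheet',
            "${RDF}value" => 'http://burningbird.net/de.css',
        },
    },
});

print "$_\n" for @triples;
```

The four lines printed correspond to the four N-Triples listed earlier, with different blank node labels.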
In addition to building a model from scratch, you can also read RDF models in from external resources such as an RDF/XML document, demonstrated in the next section.
Using RDF::Core to parse and query an RDF file is much simpler than creating an RDF model within code, something true of all APIs and parsers used in this book.
Whether you build the RDF model directly in the code or read it in, you still have to create a storage object and attach it to a model before you can start adding statements. However, when you read in a model from an external source, you can use the RDF::Core::Model::Parser object to read in the RDF/XML directly and generate the appropriate statements.
To demonstrate how simple it is to read in an RDF/XML document, the code in Example 9-3 reads in the monsters1.rdf file, storing it in a Berkeley DB datastore. The application then calls getStmts on the model, returning an RDF::Core::Enumerator object, which is used to print out the N-Triples defined within the document.
use strict;
require RDF::Core::Model;
require RDF::Core::Model::Parser;
require RDF::Core::Enumerator;
require RDF::Core::Statement;
require RDF::Core::Storage::DB_File;

# create storage
my $storage = RDF::Core::Storage::DB_File->new(
    Name     => './rdfdata',
    MemLimit => 5000,
);

# create model and map to storage
my $model = RDF::Core::Model->new(Storage => $storage);

# define parser options and parse external RDF/XML document
my %options = (
    Model      => $model,
    Source     => "/home/shelleyp/www/articles/monsters1.rdf",
    SourceType => 'file',
    BaseURI    => "http://burningbird.net/",
    InlineURI  => "http://burningbird.net/",
);
my $parser = RDF::Core::Model::Parser->new(%options);
$parser->parse;

# enumerate through statements, printing out labels
my $enumerator = $model->getStmts;
my $statement  = $enumerator->getFirst;
while (defined $statement) {
    print $statement->getLabel . "\n";
    $statement = $enumerator->getNext;
}

# close enumerator
$enumerator->close;
The Berkeley DB file prefix is rdfdata, and several files will be generated with this prefix. The options for the parser include the file location for the RDF/XML document, the fact that it's being read in as a file and not a URL, and a base and an inline URI. The base URI is used to resolve relative URIs, while the inline URI is for blank node resources. RDF::Core generates a blank node identifier consisting of this inline URI and a separate number for each blank node within the document.
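The identifier scheme described here, an inline URI followed by a running number, amounts to a simple counter. A minimal sketch with a hypothetical make_inline_generator helper; it assumes numbering starts at 1, which may not match where RDF::Core itself starts.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that mints identifiers by
# appending an incrementing number to the given inline URI.
sub make_inline_generator {
    my ($inline_uri) = @_;
    my $n = 0;                                   # private counter per generator
    return sub { return $inline_uri . ++$n };
}

my $next_bnode = make_inline_generator('http://burningbird.net/');
my @ids;
push @ids, $next_bnode->() for 1 .. 3;
print "$_\n" for @ids;
```

Because these identifiers look like ordinary URIs under the inline URI, it's worth choosing an inline URI that can't collide with real resources at the same location.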
When the application is run, the N-Triples are printed to standard output, which you can redirect to a file to persist the output. A sampling of these N-Triples, representing the rdf:value subgraph we've been using as the example, is:
<http://burningbird.net/articles/monsters1.htm> <http://burningbird.net/postcon/elements/1.0/presentation> <http://burningbird.net/3> .
<http://burningbird.net/3> <http://burningbird.net/postcon/elements/1.0/requires> <http://burningbird.net/4> .
<http://burningbird.net/4> <http://burningbird.net/postcon/elements/1.0/type> "stylesheet" .
<http://burningbird.net/4> <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "http://burningbird.net/de.css" .
Though the blank node identifiers are different from those generated by the RDF Validator, the statements are equivalent.
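Because this output is line-oriented, Perl's pattern matching makes it easy to read back in, for example from the redirected file. This sketch handles only the simple forms shown here (URIs in angle brackets, _: blank node labels, and plain quoted literals without escapes, language tags, or datatypes); a complete N-Triples parser needs considerably more. The parse_line name is hypothetical.

```perl
use strict;
use warnings;

# One term: a <URI>, a simple "literal", or a _:blank node label.
my $TERM = qr/<[^>]*>|"[^"]*"|_:\S+/;

# Split one N-Triples line into [subject, predicate, object];
# returns nothing if the line doesn't have that shape.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ /^\s*($TERM)\s+($TERM)\s+($TERM)\s*\.\s*$/;
    return [ $1, $2, $3 ];
}

my @lines = (
    '<http://burningbird.net/3> <http://burningbird.net/postcon/elements/1.0/requires> <http://burningbird.net/4> .',
    '<http://burningbird.net/4> <http://burningbird.net/postcon/elements/1.0/type> "stylesheet" .',
    '<http://burningbird.net/4> <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> "http://burningbird.net/de.css" .',
);

# Keep only the statements whose predicate is rdf:value.
my $RDF_VALUE = '<http://www.w3.org/1999/02/22-rdf-syntax-ns#value>';
my @values = grep { $_->[1] eq $RDF_VALUE } map { parse_line($_) } @lines;

print "$values[0][0] -> $values[0][2]\n";
```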
Now that the RDF/XML document has been read in, we can access it within the database to perform more selective queries.
In the last section, the code read the RDF/XML into a persistent Berkeley DB datastore. Instead of rereading the document, we'll query the database in the next examples.
You might want to see how many statements have a given predicate. To count the statements that match specific values in any of the triple components, use countStmts, passing in search parameters for subject, predicate, and object; it returns the number of matching statements. Passing an undefined parameter signals that any value in that position is a match. In Example 9-4, we're interested only in the statements that use the predicate http://burningbird.net/postcon/elements/1.0/reason. The code opens the database and calls countStmts directly on the storage object (the Model object has a countStmts method, too).
use strict;
require RDF::Core::Storage::DB_File;
require RDF::Core::Resource;

# load model from storage
my $storage = RDF::Core::Storage::DB_File->new(Name     => './rdfdata',
                                               MemLimit => 500);

# subject and object are declared but left undefined, so they match any value
my $subject;
my $object;

# initiate predicate
my $predicate = RDF::Core::Resource->new('http://burningbird.net/postcon/elements/1.0/reason');

# get count of statements for predicate and print
my $val = $storage->countStmts($subject, $predicate, $object);
print $val . "\n";
When run, the application returns a value of 6, matching the number of statements that have the given predicate. If you're interested only in statements with a given predicate and subject, you could define the subject object in addition to the predicate:
my $subject = RDF::Core::Resource->new("http://burningbird.net/articles/monsters4.htm");
The value then returned is 1: a single statement matches that combination of subject and predicate.
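The wildcard behavior of undefined parameters is easy to picture with a small pure-Perl sketch. The matches and count_stmts helpers and the sample statements (including their placeholder object values) are invented for illustration; they are neither RDF::Core code nor data from monsters1.rdf.

```perl
use strict;
use warnings;

# A statement matches when every *defined* pattern position is equal;
# an undef position is a wildcard that matches any value.
sub matches {
    my ($stmt, @pattern) = @_;
    for my $i (0 .. 2) {
        next unless defined $pattern[$i];
        return 0 unless $stmt->[$i] eq $pattern[$i];
    }
    return 1;
}

# Count the statements matching (subject, predicate, object).
sub count_stmts {
    my ($stmts, $s, $p, $o) = @_;
    return scalar grep { matches($_, $s, $p, $o) } @$stmts;
}

my $REASON = 'http://burningbird.net/postcon/elements/1.0/reason';
my @stmts = (
    [ 'http://burningbird.net/articles/monsters1.htm', $REASON, 'sample reason A' ],
    [ 'http://burningbird.net/articles/monsters4.htm', $REASON, 'sample reason B' ],
    [ 'http://burningbird.net/articles/monsters4.htm',
      'http://burningbird.net/postcon/elements/1.0/movementType', 'Move' ],
);

# Predicate only, then predicate plus subject.
print count_stmts(\@stmts, undef, $REASON, undef), "\n";
print count_stmts(\@stmts, 'http://burningbird.net/articles/monsters4.htm', $REASON, undef), "\n";
```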
If you're interested in finding data to go with the count of statements, you can modify the code to use the method getStmts instead, returning an enumerator, which you can then traverse to get the data you're interested in.
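An enumerator is essentially an iterator over the matching statements. Here's a hypothetical closure-based sketch in the same spirit as getStmts with getFirst and getNext; get_stmts here is not RDF::Core's method, and the sample statements are invented placeholders.

```perl
use strict;
use warnings;

# Return an iterator over statements matching the pattern (undef = wildcard).
# Each call yields the next match, then undef once the matches run out.
sub get_stmts {
    my ($stmts, @pattern) = @_;
    my @found = grep {
        my $stmt = $_;
        # keep the statement if no defined pattern position differs
        !grep { defined $pattern[$_] && $stmt->[$_] ne $pattern[$_] } 0 .. 2;
    } @$stmts;
    my $i = 0;
    return sub { return $i < @found ? $found[$i++] : undef };
}

my $REASON = 'http://burningbird.net/postcon/elements/1.0/reason';
my @stmts = (
    [ 'http://burningbird.net/articles/monsters1.htm', $REASON, 'sample reason A' ],
    [ 'http://burningbird.net/articles/monsters4.htm', $REASON, 'sample reason B' ],
    [ 'http://burningbird.net/articles/monsters4.htm',
      'http://burningbird.net/postcon/elements/1.0/movementType', 'Move' ],
);

# Walk the matches the same way Example 9-3 walks its enumerator.
my $next = get_stmts(\@stmts, undef, $REASON, undef);
my @subjects;
while (defined(my $stmt = $next->())) {
    push @subjects, $stmt->[0];
}
print "$_\n" for @subjects;
```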
The RDF::Core classes also support a more sophisticated querying capability similar to RDQL (discussed in detail in the next chapter). As with RDQL, the query language supported by RDF::Core uses select, from, and where keywords to specify the results, the source, and the search conditions. Three objects process these RDQL-style queries in RDF::Core:
RDF::Core::Functions: A mapping between a row of data and a function handler
RDF::Core::Evaluator: An evaluator object passed to the query and used to evaluate the specific query
RDF::Core::Query: The query object
The RDF::Core::Functions class contains a package of functions used to drill down to specific schema elements within the query set. It's instantiated first, taking optional instances of the model being queried, an instance of the RDF Schema model, and a factory object.
The RDF::Core::Evaluator class evaluates a specific query, passed in as a string written in the RDQL-like syntax. When it's instantiated, it can take an instance of the model being queried, the instance of the Functions class, the factory object, and a hash containing namespaces and their associated prefixes; without the hash, it falls back to a default namespace. The last option is a reference to a function, defined in your code, that is called for each row returned in the query set. If this isn't provided, the result is returned as an array of rows.
The RDF::Core::Query class pulls the other objects and the query string together, either returning an array of rows matching the query or passing each row to the row-handling function supplied to the Evaluator. The documentation included with RDF::Core::Query describes the query language supported by RDF::Core and includes examples.
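The two ways of receiving results, a per-row callback versus a returned array of rows, can be sketched independently of RDF::Core. The run_query routine and the placeholder rows are hypothetical; they illustrate only the dispatch choice, not RDF::Core::Query's actual interface or how it evaluates a query.

```perl
use strict;
use warnings;

# Hypothetical dispatcher: call the handler once per row when one is
# supplied; otherwise hand back all rows as a list.
sub run_query {
    my ($rows, $row_handler) = @_;    # $rows: array ref of result rows
    if ($row_handler) {
        $row_handler->(@$_) for @$rows;
        return;
    }
    return @$rows;
}

# Placeholder result rows: [resource, value] pairs.
my @result_rows = (
    [ 'http://burningbird.net/articles/monsters1.htm', 'sample value 1' ],
    [ 'http://burningbird.net/articles/monsters4.htm', 'sample value 2' ],
);

# Style 1: no handler, so the rows come back as an array.
my @all = run_query(\@result_rows);

# Style 2: a handler processes each row as it is delivered.
my @seen;
run_query(\@result_rows, sub {
    my ($resource, $value) = @_;
    push @seen, "$resource ($value)";
});

print scalar(@all), " rows returned\n";
print "$_\n" for @seen;
```

A callback suits large result sets that can be processed row by row; the array form is simpler when you need all the results at once.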