implementation report: Sesame and the RDF specs from Jeen Broekstra on 2003-08-07 (www-rdf-interest@w3.org from August 2003)

From: Jeen Broekstra <jeen.broekstra@aidministrator.nl>
Date: 07 Aug 2003 14:51:54 +0200
To: www-rdf-comments@w3.org, www-rdf-interest@w3.org
Cc: sesame-interest@lists.sourceforge.net, Arjohn Kampman <Arjohn.Kampman@aidministrator.nl>
Message-Id: <1060260714.4377.33.camel@dogbert>
(also in response to Dan Brickley's query for implementation reports)

Sesame (http://sesame.aidministrator.nl/) is a Java-based architecture
for RDF and RDFS storage, querying and inferencing, available under the
LGPL license.

This is an implementation report intended to give an overview of which
parts of the various RDF specifications Sesame implements, how this is
achieved, and how this fulfills requirements in the various projects in
which Sesame is deployed. This overview is based on Sesame release
0.9.2.

The core of Sesame is the SAIL (Storage And Inference Layer) API, a set
of Java interfaces that abstracts from the storage format and offers
retrieval, inferencing and manipulation methods to Sesame's functional
modules. SAIL implementations are available for various relational
databases (MySQL, PostgreSQL, Oracle, SQL Server) and for in-memory
storage.
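To illustrate the idea of a storage-independent layer, here is a minimal sketch of a SAIL-like interface with one in-memory implementation. The names `TripleStore`, `Triple` and `MemoryTripleStore` are hypothetical and do not reproduce Sesame's actual SAIL interfaces. Statements are reduced to plain string triples, and `null` acts as a wildcard in queries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical statement type: subject, predicate and object as plain strings.
record Triple(String subject, String predicate, String object) {}

// Hypothetical storage-independent interface: functional modules program
// against this, not against a particular database or in-memory structure.
interface TripleStore {
    void add(Triple t);

    boolean remove(Triple t);

    // Returns all statements matching the pattern; null means "any value".
    List<Triple> match(String subject, String predicate, String object);
}

// One possible implementation backed by a simple list. A relational
// implementation would translate match() into a SQL SELECT instead.
final class MemoryTripleStore implements TripleStore {
    private final List<Triple> triples = new ArrayList<>();

    @Override
    public void add(Triple t) {
        if (!triples.contains(t)) { // an RDF graph is a set: ignore duplicates
            triples.add(t);
        }
    }

    @Override
    public boolean remove(Triple t) {
        return triples.remove(t);
    }

    @Override
    public List<Triple> match(String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || s.equals(t.subject()))
                    && (p == null || p.equals(t.predicate()))
                    && (o == null || o.equals(t.object()))) {
                result.add(t);
            }
        }
        return result;
    }
}
```

Because callers only see `TripleStore`, swapping the list for a database-backed class would not affect them, which is the point of the layering described above.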

RDF parsing and writing
-----------------------

Sesame comes with its own RDF Parser package: RIO (Rdf I/O). It supports
parsing of RDF/XML and N-Triples and writing of RDF/XML, N-Triples and
N3. Its main design considerations are speed and low memory consumption.
The RDF/XML parser from the RIO package uses a SAX2-parser for parsing
the XML document. Any SAX2-compliant XML parser can be used in
combination with the RIO RDF/XML parser.
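The emphasis on speed and low memory consumption suggests a streaming design, where each statement is written out as soon as it is produced. The following is a minimal sketch of such an N-Triples writer; the class `NTriplesSketchWriter` is hypothetical and is not RIO's API. It assumes URIs are already plain ASCII and only escapes literal text, following the N-Triples escaping rules (short escapes for quote, backslash, CR, LF and TAB, hex escapes for everything outside printable ASCII).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical streaming N-Triples writer (not RIO's API): every statement is
// written out as soon as it is handed over, so memory use does not grow with
// the size of the graph.
final class NTriplesSketchWriter {
    private final Writer out;

    NTriplesSketchWriter(Writer out) {
        this.out = out;
    }

    // Statement whose object is a URI reference.
    void writeResourceStatement(String s, String p, String o) {
        emit("<" + s + "> <" + p + "> <" + o + "> .\n");
    }

    // Statement whose object is a plain literal, with an optional language tag.
    void writeLiteralStatement(String s, String p, String label, String lang) {
        String tag = (lang == null) ? "" : "@" + lang;
        emit("<" + s + "> <" + p + "> \"" + escape(label) + "\"" + tag + " .\n");
    }

    // Printable ASCII is copied as-is; quote, backslash, CR, LF and TAB get
    // short escapes; every other code point becomes a 4- or 8-digit hex escape.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '\\' -> sb.append("\\\\");
                case '"' -> sb.append("\\\"");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (cp >= 0x20 && cp <= 0x7E) {
                        sb.append((char) cp);
                    } else if (cp <= 0xFFFF) {
                        sb.append(String.format("\\u%04X", cp));
                    } else {
                        sb.append(String.format("\\U%08X", cp));
                    }
                }
            }
        });
        return sb.toString();
    }

    private void emit(String line) {
        try {
            out.write(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```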

The RDF/XML parser has initially been developed using the Revised
RDF/XML Syntax Specification of 18 December 2001
(http://www.w3.org/TR/2001/WD-rdf-syntax-grammar-20011218/) and has been
updated with every new release of the specification. The grammar in this
document proved to be very useful for developing the RDF/XML parser,
especially because the production rules can quite easily be mapped to
SAX-events.
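As a rough illustration of how grammar productions map onto SAX events, here is a heavily simplified SAX2 handler built on the standard `javax.xml.parsers` and `org.xml.sax` APIs. The class `TinyRdfXmlHandler` is hypothetical and is not the RIO parser: it only covers a node element with `rdf:about` and property elements whose value is either an `rdf:resource` attribute or literal text. Element nesting depth stands in for the parser state a full implementation would track.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for a tiny subset of RDF/XML.
// Depth 1 = rdf:RDF, depth 2 = node element, depth 3 = property element.
final class TinyRdfXmlHandler extends DefaultHandler {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    final List<String[]> triples = new ArrayList<>();
    private String subject;     // URI of the current node element
    private String property;    // URI of the current property element
    private StringBuilder text; // literal text being collected, or null
    private int depth;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        depth++;
        if (depth == 2) {
            // node element production: rdf:about gives the subject
            subject = atts.getValue(RDF, "about");
        } else if (depth == 3) {
            // property element production: the element's expanded name is the predicate
            property = uri + local;
            String resource = atts.getValue(RDF, "resource");
            if (resource != null) {
                triples.add(new String[] {subject, property, resource});
                text = null;
            } else {
                text = new StringBuilder();
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth == 3 && text != null) {
            text.append(ch, start, length); // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (depth == 3 && text != null) {
            // literal-valued property element, emitted as a quoted string
            triples.add(new String[] {subject, property, "\"" + text + "\""});
            text = null;
        }
        depth--;
    }

    static List<String[]> parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to resolve rdf:about etc.
            TinyRdfXmlHandler handler = new TinyRdfXmlHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.triples;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse RDF/XML", e);
        }
    }
}
```

Any SAX2-compliant parser returned by `SAXParserFactory` can drive this handler, mirroring the parser-independence mentioned in the report.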

The available RDF parser test cases have been used extensively during
the development of the RDF/XML parser. These test cases provided
valuable feedback on the compliance of the parser to the specification.
Of the approved test cases in the test cases document of 23 January
2003, all but three pass. These three test cases relate to
the Unicode encoding of URIs and literals. The RIO RDF/XML parser does
not yet verify that these are in Normal Form C.
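The missing check is small in Java, since the standard library has offered `java.text.Normalizer` since Java 6. A minimal sketch, where the class `NfcCheck` is hypothetical and not part of RIO:

```java
import java.text.Normalizer;

// Hypothetical helper: verifies that URIs and literals are in Unicode
// Normal Form C before they are accepted as parser output.
final class NfcCheck {
    static boolean isNfc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    static void requireNfc(String s) {
        if (!isNfc(s)) {
            throw new IllegalArgumentException("not in Unicode Normal Form C: " + s);
        }
    }
}
```

A precomposed "é" (U+00E9) passes, whereas "e" followed by a combining acute accent (U+0301) does not, because NFC would compose the pair into a single character.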

RDF + RDFS Entailment
---------------------

In release 0.9.2 of Sesame, inferencing is only supported in the SAIL
implementation for relational databases. This SAIL implements the RDF
Semantics as formalized by the January 23 Working Draft
(http://www.w3.org/TR/2003/WD-rdf-mt-20030123/). This is achieved by an
exhaustive forward chaining inferencer that iterates over the entailment
rules and computes and stores the closure. For this, the entailment
rules have been translated to SQL queries that almost map one-to-one to
the entailment rules. This naive inferencing approach performs very
satisfactorily in various practical settings and scales well up to
O(10^6) statements.
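The exhaustive strategy can be sketched in memory as follows. This is an illustration only: Sesame's relational SAIL expresses the rules as SQL, while this hypothetical `NaiveRdfsClosure` class works on Java sets and implements just two RDFS rules (transitivity of `rdfs:subClassOf`, and propagation of `rdf:type` along `rdfs:subClassOf`). Every pass applies all rules to the whole store and the loop stops at a fixpoint.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical statement type used only in this sketch.
record Stmt(String s, String p, String o) {}

// Naive exhaustive forward chaining: apply every rule to the entire store,
// add the conclusions, and repeat until a pass adds nothing new.
final class NaiveRdfsClosure {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";

    static Set<Stmt> closure(Set<Stmt> input) {
        Set<Stmt> store = new HashSet<>(input);
        boolean changed = true;
        while (changed) {
            Set<Stmt> inferred = new HashSet<>();
            for (Stmt a : store) {
                for (Stmt b : store) {
                    // (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                    if (a.p().equals(SUBCLASS) && b.p().equals(SUBCLASS) && a.o().equals(b.s())) {
                        inferred.add(new Stmt(a.s(), SUBCLASS, b.o()));
                    }
                    // (x type A), (A subClassOf B) => (x type B)
                    if (a.p().equals(TYPE) && b.p().equals(SUBCLASS) && a.o().equals(b.s())) {
                        inferred.add(new Stmt(a.s(), TYPE, b.o()));
                    }
                }
            }
            changed = store.addAll(inferred); // false once the closure is complete
        }
        return store;
    }
}
```

In a relational store with a hypothetical table `triples(subj, pred, obj)`, the first rule would become a self-join along the lines of `SELECT t1.subj, t2.obj FROM triples t1, triples t2 WHERE t1.pred = 'rdfs:subClassOf' AND t2.pred = 'rdfs:subClassOf' AND t1.obj = t2.subj`, which illustrates the kind of near one-to-one mapping from rules to SQL described above.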

Inferencing for in-memory repositories is currently under development,
following requirements from several projects (see later in this report).
Also, work is being done to create a configurable (forward chaining)
inferencer. This inferencer will be used to (partially) implement OWL
Lite semantics, but will also allow one to define arbitrary inferencing
rules.
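A configurable inferencer treats rules as data rather than code. The sketch below assumes a simple, hypothetical rule format (premise patterns plus one conclusion pattern, with terms starting with `?` as variables); `Atom`, `Rule` and `RuleEngine` are illustrative names, not Sesame's. It assumes every variable in the conclusion also occurs in a premise.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical triple pattern; facts are simply patterns without variables.
record Atom(String s, String p, String o) {}

// Hypothetical rule: if all premises match, add the instantiated conclusion.
record Rule(List<Atom> premises, Atom conclusion) {}

// Forward chaining over user-supplied rules, repeated until a fixpoint.
final class RuleEngine {
    static Set<Atom> run(Set<Atom> facts, List<Rule> rules) {
        Set<Atom> store = new HashSet<>(facts);
        boolean changed = true;
        while (changed) {
            Set<Atom> inferred = new HashSet<>();
            for (Rule r : rules) {
                for (Map<String, String> b : matchAll(r.premises(), 0, new HashMap<>(), store)) {
                    Atom c = r.conclusion();
                    inferred.add(new Atom(subst(c.s(), b), subst(c.p(), b), subst(c.o(), b)));
                }
            }
            changed = store.addAll(inferred);
        }
        return store;
    }

    // Finds every variable binding that satisfies premises i..end against the store.
    private static List<Map<String, String>> matchAll(List<Atom> ps, int i,
            Map<String, String> binding, Set<Atom> store) {
        List<Map<String, String>> results = new ArrayList<>();
        if (i == ps.size()) {
            results.add(binding);
            return results;
        }
        Atom pat = ps.get(i);
        for (Atom fact : store) {
            Map<String, String> b = new HashMap<>(binding); // copy: try each fact independently
            if (unify(pat.s(), fact.s(), b) && unify(pat.p(), fact.p(), b) && unify(pat.o(), fact.o(), b)) {
                results.addAll(matchAll(ps, i + 1, b, store));
            }
        }
        return results;
    }

    // Binds a variable, or checks a constant; returns false on a mismatch.
    private static boolean unify(String term, String value, Map<String, String> b) {
        if (!term.startsWith("?")) {
            return term.equals(value);
        }
        String bound = b.putIfAbsent(term, value);
        return bound == null || bound.equals(value);
    }

    private static String subst(String term, Map<String, String> b) {
        return term.startsWith("?") ? b.get(term) : term;
    }
}
```

For example, the RDFS domain rule can be written as `(?p rdfs:domain ?c), (?x ?p ?y) => (?x rdf:type ?c)`; additional rules only extend the rule list, not the engine.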

Various projects make use of inferencing and querying support in Sesame.
Since most of these are run by third parties or are non-public, we can
reference only a few here:

The Cuypers Multimedia Transformation engine, developed by CWI (see
http://homepages.cwi.nl/~media/cuypers/), is a research prototype
system, developed to experiment with the automatic generation of
Web-based presentations as an interface to semi-structured multimedia
databases. It uses Sesame as a middleware layer to drive dynamic,
semantics-based generation of web interfaces.  

The SWAP (Semantic Web And P2P) EU-IST project
(http://swap.semanticweb.org/) makes use of Sesame as the primary
storage, querying and inferencing system for each peer node. 

KIM (http://www.ontotext.com/kim), developed by OntoText, is a software
platform for automatic ontology population and open-domain dynamic
semantic annotation of unstructured and semi-structured content for
Semantic Web and KM applications. Built on top of Sesame, it makes
extensive use of its inferencing capabilities.

RDF Datatyping
--------------

A relatively new feature of RDF Semantics is datatyping. Sesame
currently has limited support for datatyping: it supports the use of
datatypes but has no built-in knowledge of the semantics of XSD
datatypes (and therefore fails most of the last call RDF entailment test
cases that concern datatypes). 
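What built-in knowledge of XSD semantics adds is a mapping from lexical forms to values. A minimal sketch for a single datatype, `xsd:integer`, where `TypedLiteral` and `XsdValues` are hypothetical names: the literals `"1"` and `"+01"` differ as strings but denote the same integer. Ill-typed lexical forms are not handled here (`BigInteger` simply throws).

```java
import java.math.BigInteger;

// Hypothetical typed literal: lexical form plus datatype URI.
record TypedLiteral(String lexical, String datatype) {}

// Datatype-aware comparison for xsd:integer only; other datatypes fall back
// to comparing lexical form and datatype URI.
final class XsdValues {
    static final String XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

    // Lexical-to-value mapping for xsd:integer (optional sign, leading zeros).
    static BigInteger integerValue(String lexical) {
        return new BigInteger(lexical.trim());
    }

    static boolean sameValue(TypedLiteral a, TypedLiteral b) {
        if (a.datatype().equals(XSD_INTEGER) && b.datatype().equals(XSD_INTEGER)) {
            return integerValue(a.lexical()).equals(integerValue(b.lexical()));
        }
        return a.equals(b);
    }
}
```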

Support for datatyping as specified in the last call working drafts is
on our to-do list, but has not been given priority, because we have had
few user requests for it and because Sesame offers a (non-compliant)
limited form of datatyping in the SeRQL and RQL query languages.

RDF Querying
------------

The functional modules of Sesame include three query engines: RQL, RDQL
and SeRQL. SeRQL is developed as a hybrid language that aims to combine
the strongest features of existing languages. It draws on experiences
with RQL and RDQL as well as syntax specifications such as N3. One of
its strong features is the ability to do a limited form of graph
transformation, using a CONSTRUCT clause. 
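The effect of a CONSTRUCT clause can be sketched as "match a pattern, then instantiate a template for every match". The code below does not reproduce SeRQL syntax or Sesame's query engine; `Triple` and `ConstructSketch` are hypothetical, and the WHERE part is restricted to a single triple pattern with `?`-prefixed variables. The output is a new graph, leaving the input untouched.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical triple type; terms starting with '?' are variables in patterns.
record Triple(String s, String p, String o) {}

// Sketch of CONSTRUCT-style graph transformation.
final class ConstructSketch {
    static Set<Triple> construct(Set<Triple> graph, Triple where, List<Triple> template) {
        Set<Triple> result = new LinkedHashSet<>();
        for (Triple fact : graph) {
            Map<String, String> b = new HashMap<>();
            if (bind(where.s(), fact.s(), b) && bind(where.p(), fact.p(), b) && bind(where.o(), fact.o(), b)) {
                // one match: instantiate every template triple with its bindings
                for (Triple t : template) {
                    result.add(new Triple(fill(t.s(), b), fill(t.p(), b), fill(t.o(), b)));
                }
            }
        }
        return result;
    }

    // Binds a variable or checks a constant against a value.
    private static boolean bind(String term, String value, Map<String, String> b) {
        if (!term.startsWith("?")) {
            return term.equals(value);
        }
        String bound = b.putIfAbsent(term, value);
        return bound == null || bound.equals(value);
    }

    private static String fill(String term, Map<String, String> b) {
        return term.startsWith("?") ? b.get(term) : term;
    }
}
```

For instance, matching `(?x ex:worksFor ?y)` with the template `(?y ex:employs ?x)` inverts a relation, a typical small graph transformation.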

For several features, the design of SeRQL is based on the last call
working drafts. An example of this is the addition of the 'language'
and 'datatype' facets to literals, which SeRQL can query explicitly
through dedicated functions. Also, the RDF Semantics have been very
useful in determining the precise operation of comparison operators,
for example in deciding whether two literals are equal. However,
SeRQL currently has no support for doing datatype-aware comparison.
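The distinction between term equality and datatype-aware comparison can be sketched with a literal type that carries both facets. `Literal` and `sameTerm` are hypothetical names; the record accessors `language()` and `datatype()` play the role of the facet functions mentioned above. As an assumption of this sketch, language tags are compared case-insensitively, while labels and datatype URIs must match exactly, so `"1"^^xsd:integer` and `"01"^^xsd:integer` are different terms here.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical literal with optional 'language' and 'datatype' facets
// (either may be null).
record Literal(String label, String language, String datatype) {

    Literal {
        Objects.requireNonNull(label, "label");
    }

    // Term equality, not value equality: same label, same language tag
    // (case-insensitive, an assumption here) and same datatype URI.
    boolean sameTerm(Literal other) {
        return label.equals(other.label)
                && Objects.equals(lower(language), lower(other.language))
                && Objects.equals(datatype, other.datatype);
    }

    private static String lower(String tag) {
        return tag == null ? null : tag.toLowerCase(Locale.ROOT);
    }
}
```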

Further developments of SeRQL will be based on requirements from the
projects in which we deploy Sesame as well as the results of ongoing
discussions on the rdf-rules mailing list.

The last call Working Drafts
----------------------------

Overall, we have been very satisfied with the last call working drafts.
For the purposes of implementation, they thoroughly describe most, if
not all, aspects of RDF and RDF Schema.  The RDF/XML grammar, the parser
test cases and the entailment rules have been of much use during the
development of Sesame. One thing that, in our opinion, could still be
improved is the set of entailment test cases. Test cases covering more aspects
of RDF(S) inferencing would be helpful to ensure that different
inferencer implementations are compatible.

The post-last call changes
--------------------------

Because the discussion about the post-last call changes seemed to
indicate that they are not yet stable, we have so far chosen not to
update Sesame to these changes. Instead, we have focused on
getting Sesame fully compliant with the last call working drafts, and on
implementing features on top of that.

Due to limited resources, we currently do not plan to implement any of
the post-last call changes until they become 'official' as a working
draft. Possible exceptions are obvious bugs in the last call WDs that
would directly affect practical use of Sesame.


-- 
jeen.broekstra@aidministrator.nl
aidministrator nederland bv - http://www.aidministrator.nl/ 
julianaplein 14b, 3817 cs amersfoort, the netherlands
tel. +31-(0)33-4659987, fax. +31-(0)33-4659987
Received on Thursday, 7 August 2003 08:54:06 UTC