- From: Jeen Broekstra <jeen.broekstra@aidministrator.nl>
- Date: 07 Aug 2003 14:51:54 +0200
- To: www-rdf-comments@w3.org, www-rdf-interest@w3.org
- Cc: sesame-interest@lists.sourceforge.net, Arjohn Kampman <Arjohn.Kampman@aidministrator.nl>
(also in response to Dan Brickley's query for implementation reports)

Sesame (http://sesame.aidministrator.nl/) is a Java-based architecture for RDF and RDFS storage, querying and inferencing, available under the LGPL license.

This is an implementation report intended to give an overview of which parts of the various RDF specifications Sesame implements, how this is achieved, and how this fulfills requirements in the various projects in which Sesame is deployed. This overview is based on Sesame release 0.9.2.

The core of Sesame is the SAIL (Storage And Inference Layer) API, a set of Java interfaces that abstracts from the storage format and offers retrieval, inferencing and manipulation methods to Sesame's functional modules. SAIL implementations are available for various relational databases (MySQL, PostgreSQL, Oracle, SQL Server) and for in-memory storage.

RDF parsing and writing
-----------------------

Sesame comes with its own RDF parser package: RIO (Rdf I/O). It supports parsing of RDF/XML and N-Triples and writing of RDF/XML, N-Triples and N3. Its main design considerations are speed and low memory consumption.

The RDF/XML parser from the RIO package uses a SAX2 parser for parsing the XML document. Any SAX2-compliant XML parser can be used in combination with the RIO RDF/XML parser.

The RDF/XML parser was initially developed using the Revised RDF/XML Syntax Specification of 18 December 2001 (http://www.w3.org/TR/2001/WD-rdf-syntax-grammar-20011218/) and has been updated with every new release of the specification. The grammar in this document proved to be very useful for developing the RDF/XML parser, especially because the production rules can quite easily be mapped to SAX events.

The available RDF parser test cases have been used extensively during the development of the RDF/XML parser. These test cases provided valuable feedback on the compliance of the parser with the specification.
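To illustrate the remark above that the grammar's production rules map quite easily onto SAX events, here is a minimal sketch. It is not RIO's code: it is written in Python, whose xml.sax module exposes a SAX2-style interface, purely for brevity, and the names StripedHandler, parse_rdfxml and SAMPLE as well as the example.org URIs are hypothetical. It assumes a small subset of RDF/XML only (an rdf:RDF root, rdf:about, rdf:resource, plain literals and nested node elements) and ignores everything else, such as xml:lang, rdf:ID and rdf:parseType.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class StripedHandler(ContentHandler):
    """Toy handler: each start tag is read as either a node element or a
    property element, depending on what kind of element encloses it."""

    def __init__(self):
        super().__init__()
        self.triples = []   # collected (subject, predicate, object) tuples
        self.stack = []     # one frame per open element
        self.text = []      # character data of the innermost property element
        self.bnodes = 0     # counter for generated blank node labels

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        parent = self.stack[-1] if self.stack else None
        if parent is None:
            # Document element; this sketch assumes it is rdf:RDF.
            self.stack.append(["root"])
        elif parent[0] in ("root", "prop"):
            # Node element: names a subject (or a nested object).
            subject = attrs.get((RDF, "about"))
            if subject is None:
                self.bnodes += 1
                subject = "_:b%d" % self.bnodes
            if name != (RDF, "Description"):
                # A typed node implies an rdf:type statement.
                self.triples.append((subject, RDF + "type", (uri or "") + local))
            if parent[0] == "prop":
                # The nested node is the object of the enclosing property.
                self.triples.append((parent[1], parent[2], subject))
                parent[3] = True
            self.stack.append(["node", subject])
        else:
            # Property element: the enclosing frame is a node element.
            subject = parent[1]
            predicate = (uri or "") + local
            resource = attrs.get((RDF, "resource"))
            if resource is not None:
                self.triples.append((subject, predicate, resource))
            self.text = []
            self.stack.append(["prop", subject, predicate, resource is not None])

    def characters(self, content):
        if self.stack and self.stack[-1][0] == "prop":
            self.text.append(content)

    def endElementNS(self, name, qname):
        frame = self.stack.pop()
        if frame[0] == "prop" and not frame[3]:
            # No resource object was seen, so the content is a plain literal.
            literal = '"%s"' % "".join(self.text)
            self.triples.append((frame[1], frame[2], literal))


def parse_rdfxml(data):
    """Parse RDF/XML given as bytes and return the list of triples found."""
    handler = StripedHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.triples


SAMPLE = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <ex:Project rdf:about="http://example.org/sesame">
    <ex:name>Sesame</ex:name>
    <ex:uses rdf:resource="http://example.org/rio"/>
  </ex:Project>
</rdf:RDF>"""

if __name__ == "__main__":
    for triple in parse_rdfxml(SAMPLE):
        print(triple)
```

The handler keeps only a stack of open elements and emits each statement as soon as it is complete, so nothing like a DOM tree is built. That is one plausible way a SAX-based design can serve the stated goals of speed and low memory consumption.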
Of the approved test cases as specified in the document of 23 January 2003, all but three pass. These three test cases relate to the Unicode encoding of URIs and literals: the RIO RDF/XML parser does not yet verify that these are in Normal Form C.

RDF + RDFS Entailment
---------------------

In release 0.9.2 of Sesame, inferencing is only supported in the SAIL implementation for relational databases. This SAIL implements the RDF Semantics as formalized by the 23 January 2003 Working Draft (http://www.w3.org/TR/2003/WD-rdf-mt-20030123/). This is achieved by an exhaustive forward-chaining inferencer that iterates over the entailment rules and computes and stores the closure. For this, the entailment rules have been translated to SQL queries that map almost one-to-one onto the entailment rules. This naive inferencing approach performs very satisfactorily in various practical settings and scales well up to O(10^6) statements.

Inferencing for in-memory repositories is currently under development, following requirements from several projects (see later in this report). Also, work is being done to create a configurable (forward-chaining) inferencer. This inferencer will be used to (partially) implement OWL Lite semantics, but will also allow one to define arbitrary inferencing rules.

Various projects make use of inferencing and querying support in Sesame. Since most of these are run by third parties or are non-public, we can reference only a few here:

The Cuypers Multimedia Transformation engine, developed by CWI (see http://homepages.cwi.nl/~media/cuypers/), is a research prototype system, developed to experiment with the automatic generation of Web-based presentations as an interface to semi-structured multimedia databases. It uses Sesame as a middleware layer to drive dynamic, semantics-based generation of web interfaces.
The SWAP (Semantic Web And P2P) EU-IST project (http://swap.semanticweb.org/) makes use of Sesame as the primary storage, querying and inferencing system for each peer node.

KIM (http://www.ontotext.com/kim), developed by OntoText, is a software platform for automatic ontology population and open-domain dynamic semantic annotation of unstructured and semi-structured content for Semantic Web and KM applications. Built on top of Sesame, it makes extensive use of its inferencing capabilities.

RDF Datatyping
--------------

A relatively new feature of RDF Semantics is datatyping. Sesame currently has limited support for datatyping: it supports the use of datatypes but has no built-in knowledge of the semantics of XSD datatypes (and therefore fails most of the last call RDF entailment test cases that concern datatypes).

Support for datatyping as specified in the last call working drafts is on our to-do list, but has not been given priority, because we have had few user requests for it and because Sesame offers a (non-compliant) limited form of datatyping in the SeRQL and RQL query languages.

RDF Querying
------------

The functional modules of Sesame include three query engines: RQL, RDQL and SeRQL. SeRQL is developed as a hybrid language that aims to combine the strongest features of existing languages. It draws on experiences with RQL and RDQL as well as syntax specifications such as N3. One of its strong features is the ability to do a limited form of graph transformation, using a CONSTRUCT clause.

The design of SeRQL has been based on the last call working drafts with respect to support for several features. An example of this is the addition of the 'language' and 'datatype' facets to literals, for which SeRQL has functions to query them explicitly. Also, the RDF Semantics have been very useful in determining the precise operation of comparison operators, for example in deciding whether two literals are equal or not.
However, SeRQL currently has no support for doing datatype-aware comparison.

Further developments of SeRQL will be based on requirements from the projects in which we deploy Sesame as well as the results of ongoing discussions on the rdf-rules mailing list.

The last call Working Drafts
----------------------------

Overall, we have been very satisfied with the last call working drafts. For the purposes of implementation, they thoroughly describe most, if not all, aspects of RDF and RDF Schema. The RDF/XML grammar, the parser test cases and the entailment rules have been of much use during the development of Sesame.

One thing that, in our opinion, could still be improved is the set of entailment test cases. Test cases covering more aspects of RDF(S) inferencing would be helpful to ensure that different inferencer implementations are compatible.

The post-last call changes
--------------------------

Because the discussion about the post-last call changes seemed to indicate that they are not stable yet, we have chosen thus far not to update Sesame to these changes. Instead, we have focused on getting Sesame fully compliant with the last call working drafts, and on implementing features on top of that.

Due to limited resources, we currently do not plan to implement any of the post-last call changes until they become 'official' as a working draft. Possible exceptions are obvious bugs in the last call WDs that would directly affect practical use of Sesame.

--
jeen.broekstra@aidministrator.nl
aidministrator nederland bv - http://www.aidministrator.nl/
julianaplein 14b, 3817 cs amersfoort, the netherlands
tel. +31-(0)33-4659987, fax. +31-(0)33-4659987
Received on Thursday, 7 August 2003 08:54:06 UTC