3store RDF(S) server implementation report from Steve Harris on 2003-08-14 (www-rdf-interest@w3.org from August 2003)

From: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
Date: Thu, 14 Aug 2003 17:32:25 +0100
To: www-rdf-comments@w3.org, www-rdf-interest@w3.org
Message-ID: <20030814163225.GQ4141@ecs.soton.ac.uk>
3store RDF(S) server implementation report - 2003-08-13

Note: this report refers to the CVS version of 3store; the current release
(2.2.7) has less support.

Implementation

3store is a set of tools, written in C, that use the C library librdfsql
to support RDF operations. The graph is represented in a set of tables
in a MySQL DBMS. It should be portable to POSIX compliant systems.

RDF Import

RDF/XML and RDF/NTriples parsing is handled by the raptor library.
Although raptor produces datatype and language information, and 3store
produces some entailments from it, 3store does not retain the information
in its internal graph, so datatype and language information is not present
in exports or query results. This is likely to be fixed in future versions.

The largest schema that has been asserted is the NCI Cancer Ontology
(http://www.mindswap.org/2003/CancerOntology/), which contains around
27,000 classes. Assertion of this ontology was slow (taking around 20
minutes) and subsequent changes to schema data were also slow; however,
queries and pure RDF assertions remained fast.

Retraction of individual triples is currently not supported in the
API and implementing it efficiently would be hard, due to the way the
entailments are generated. We do not intend to address this issue.

Typical import speeds on current x86 servers are on the order of 1000
triples / second, averaged over a 7 million triple import. The back-end
is known to handle queries over 20 million triples efficiently, and is
expected to scale to larger sizes.

The 20 million triple dataset (around 1.6GB of RDF/XML) takes up
2.8GB on disk, 1GB of data and 1.8GB of indexes, when serialised in SQL
tables. This could be improved at the cost of query speed if storage
requirements were a concern.
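
As a rough check on the figures above, the per-triple storage cost and
the implied import time can be worked out directly. This is only a
back-of-envelope sketch: it assumes 1GB means 10^9 bytes (the report
does not say which unit is meant) and that the quoted import rate holds
steady for the whole import. The variable names are illustrative.

```python
# Figures quoted in the report; 1 GB = 10**9 bytes is an assumption.
MB = 10**6

triples_stored = 20_000_000
rdfxml_bytes = 1600 * MB   # "around 1.6GB of RDF/XML"
data_bytes = 1000 * MB     # "1GB of data"
index_bytes = 1800 * MB    # "1.8GB of indexes"
disk_bytes = data_bytes + index_bytes

# Storage cost per triple, split into table data and indexes.
data_per_triple = data_bytes / triples_stored
index_per_triple = index_bytes / triples_stored
total_per_triple = disk_bytes / triples_stored

# How much larger the SQL representation is than the RDF/XML source.
expansion = disk_bytes / rdfxml_bytes

# Time implied by "on the order of 1000 triples / second" for 7 million triples.
import_triples = 7_000_000
import_rate = 1000  # triples per second
import_hours = import_triples / import_rate / 3600

print(f"data:  {data_per_triple:.0f} bytes/triple")
print(f"index: {index_per_triple:.0f} bytes/triple")
print(f"total: {total_per_triple:.0f} bytes/triple")
print(f"expansion over RDF/XML: {expansion:.2f}x")
print(f"7M triple import: about {import_hours:.1f} hours")
```

Under those assumptions the indexes account for roughly two thirds of
the on-disk size, which is consistent with the remark that space could
be traded against query speed.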

Developing RDF stores to hold this volume of data was found to be
reasonably straightforward, using conventional database techniques.

Local APIs

In addition to the C API, the librdfsql library has a Python API (beta),
and there are Perl and PHP libraries that provide query APIs through
the 3store tools. Native Perl and PHP APIs are planned.

Remote interfaces

Queries can be issued over HTTP, supported by an Apache web server module;
a SOAP interface is in development. The 3store tools can also connect
directly to remote servers, permissions allowing.

Developing the software to provide these services is straightforward,
but it is difficult to return query results in a convenient manner over
HTTP in a way which is efficient for large result sets.

Entailments

3store supports all the non-datatype RDF and RDFS entailments, and a
selection of the datatype related entailments. Details can be found in
the test-case results: http://triplestore.aktors.org/rdf-test/

The tests will be run manually before each future release, and
periodically as regression tests during development.

The entailments are generated with a mixture of C and SQL algorithms.
Aspects of the base entailment rules have been combined in some cases,
for efficiency. We found no particular issues in implementing the
entailments defined in the specifications.
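
To illustrate the general idea of generating entailments with SQL, here
is a minimal sketch of forward chaining for two RDFS rules: rdfs11
(rdfs:subClassOf is transitive) and rdfs9 (instances of a subclass are
instances of the superclass). It is not 3store's code. SQLite (through
Python's sqlite3 module) stands in for MySQL so the example is
self-contained, and the single three-column table, the prefixed names
and the function names are all hypothetical.

```python
import sqlite3

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Each rule is one INSERT ... SELECT; the primary key plus OR IGNORE
# stops already-known triples from being added again.
RULES = [
    # rdfs11: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
    """INSERT OR IGNORE INTO triples
       SELECT a.s, a.p, b.o FROM triples a JOIN triples b ON a.o = b.s
       WHERE a.p = :sub AND b.p = :sub""",
    # rdfs9: (x type a), (a subClassOf b) => (x type b)
    """INSERT OR IGNORE INTO triples
       SELECT t.s, :type, c.o FROM triples t JOIN triples c ON t.o = c.s
       WHERE t.p = :type AND c.p = :sub""",
]


def make_store():
    """Create an in-memory store with a hypothetical one-table schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE triples (s TEXT, p TEXT, o TEXT, PRIMARY KEY (s, p, o))"
    )
    return db


def assert_triples(db, triples):
    """Add triples to the store, skipping any that are already present."""
    db.executemany("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triples)


def close_under_rules(db):
    """Apply every rule repeatedly until a full pass adds no new rows."""
    while True:
        before = db.total_changes
        for rule in RULES:
            db.execute(rule, {"sub": SUBCLASS, "type": RDF_TYPE})
        if db.total_changes == before:
            return


db = make_store()
assert_triples(db, [
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:alice", RDF_TYPE, "ex:Student"),
])
close_under_rules(db)
for row in db.execute("SELECT s, p, o FROM triples ORDER BY s, p, o"):
    print(row)
```

The sketch stores inferred triples alongside asserted ones without
recording which is which, a simplification chosen for brevity.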

Query support

3store supports a dialect of the OKBC query language and an extended
subset of the RDQL query language. However, the OKBC interface is likely
to be deprecated in the near future, as it gets little use.

Queries are resolved using a graph to SQL translator. The SQL schema is
designed to allow efficient querying; the technique used is described
in this paper: http://eprints.ecs.soton.ac.uk/archive/00007970/
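
For readers unfamiliar with the approach, the following is a minimal
sketch of what translating a graph pattern into SQL can look like: each
triple pattern becomes one alias of the triple table, constants become
equality tests, and a variable that appears more than once becomes a
join condition. It is not the translator or schema described in the
paper; the single "triples" table, the "?name" variable syntax and the
translate() function are assumptions made for illustration, with SQLite
standing in for MySQL.

```python
import sqlite3


def translate(patterns, select_vars):
    """Turn triple patterns into (sql, params) over a triples(s, p, o) table.

    Terms starting with "?" are variables; everything else is a constant.
    Variable names are assumed to be valid SQL identifiers.
    """
    where, params, bound = [], [], {}
    for i, pattern in enumerate(patterns):
        for col, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in bound:
                    # Repeated variable: join to its first occurrence.
                    where.append(f"{ref} = {bound[term]}")
                else:
                    bound[term] = ref
            else:
                # Constant: bind it as a query parameter.
                where.append(f"{ref} = ?")
                params.append(term)
    columns = ", ".join(f"{bound[v]} AS {v[1:]}" for v in select_vars)
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT {columns} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


# Usage: find the names of everything typed as ex:Person.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Robot"),
    ("ex:bob", "ex:name", "Bob"),
])
sql, params = translate(
    [("?who", "rdf:type", "ex:Person"), ("?who", "ex:name", "?name")],
    ["?who", "?name"],
)
print(sql)
print(db.execute(sql, params).fetchall())
```

A query with n triple patterns becomes an n-way self-join in this
scheme, so how well it runs depends heavily on the indexes available to
the database.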

Implementation of this query translator has taken a significant amount
of development time, and there is still work outstanding to support the
rest of the RDQL language.

One ongoing problem is accurately detecting complex queries before
executing them, to prevent large numbers of concurrent queries tying
up the server. The nature of RDF makes query execution time difficult
to predict, and so optimal scheduling and distribution of queries is
an outstanding problem.

RDF Export

RDF/XML export is supported, and NTriples export will be developed
in the future. The design of the SQL schema requires two passes of
the dataset in order to export RDF/XML, which is inefficient for large
subgraphs. A previous design allowed for single-pass export but made
assertion and querying less efficient.

Deployment

3store has been used in a number of environments including as a
central knowledge base and semantic web server for the AKT Project
(http://www.aktors.org/), as a back-end for the CS AKTive Space
(http://triplestore.aktors.org/demo/) and as a repository for the
hyphen.info RDF collection (http://hyphen.info/).

The server software appears to be stable and remains running for months
at a time while sustaining high query loads (up to 60 queries per second,
averaged over one month). There do not appear to be any issues with the
RDF specification in this regard.

It has been successfully deployed by a number of academic projects
and commercial sites. Other sites mostly use it as a small-scale
storage technology (typically a few hundred thousand triples), chosen
mainly for its fast query execution.
Received on Thursday, 14 August 2003 12:32:27 UTC