- From: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
- Date: Thu, 14 Aug 2003 17:32:25 +0100
- To: www-rdf-comments@w3.org, www-rdf-interest@w3.org
3store RDF(S) server implementation report - 2003-08-13

Note: this report refers to the CVS version of 3store; the current release (2.2.7) has less support.

Implementation

3store is a set of tools, written in C, that use the C library librdfsql to support RDF operations. The graph is represented in a set of tables in a MySQL DBMS. It should be portable to POSIX-compliant systems.

RDF Import

RDF/XML and RDF/NTriples parsing is handled by the raptor library. Although raptor produces datatype and language information, and 3store produces some entailments from it, 3store does not retain the information in its internal graph, so datatype and language information is not present in exports or query results. This is likely to be fixed in future versions.

The largest schema that has been asserted is the NCI Cancer Ontology (http://www.mindswap.org/2003/CancerOntology/), which contains around 27,000 classes. Assertion of this ontology was slow (taking around 20 minutes) and subsequent changes to schema data were also slow; however, queries and pure RDF assertions remained fast.

Retraction of individual triples is currently not supported in the API, and implementing it efficiently would be hard because of the way the entailments are generated. We do not intend to address this issue.

Typical import speeds on current x86 servers are on the order of 1000 triples per second, averaged over a 7 million triple import. The back-end is known to handle queries over 20 million triples efficiently, and is expected to scale to larger sizes. The 20 million triple dataset (around 1.6GB of RDF/XML) takes up 2.8GB on disk (1GB of data and 1.8GB of indexes) when serialised in SQL tables. This could be improved at the cost of query speed if storage requirements were a concern.

Developing RDF stores to hold this volume of data was found to be reasonably straightforward, using conventional database techniques.
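As a rough check on the import and storage figures above, the arithmetic can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal gigabytes (10^9 bytes) and a steady import rate, neither of which is stated in the report, and the variable names are illustrative.

```python
# Figures quoted in the report.
IMPORT_RATE = 1000           # triples per second (typical)
IMPORT_SIZE = 7_000_000      # triples in the measured import
STORE_TRIPLES = 20_000_000   # triples in the large dataset
RDFXML_GB = 1.6              # size of the source RDF/XML
DATA_GB = 1.0                # table data on disk
INDEX_GB = 1.8               # index data on disk

GB = 10**9  # assumption: decimal gigabytes

# Wall-clock time for the 7 million triple import at the quoted rate.
import_hours = IMPORT_SIZE / IMPORT_RATE / 3600

# Average storage cost per triple, on disk and in the source RDF/XML.
bytes_per_triple_on_disk = (DATA_GB + INDEX_GB) * GB / STORE_TRIPLES
bytes_per_triple_rdfxml = RDFXML_GB * GB / STORE_TRIPLES

# Share of the on-disk footprint taken by indexes.
index_share = INDEX_GB / (DATA_GB + INDEX_GB)

print(f"import time: {import_hours:.2f} hours")
print(f"on disk: {bytes_per_triple_on_disk:.0f} bytes/triple")
print(f"RDF/XML: {bytes_per_triple_rdfxml:.0f} bytes/triple")
print(f"index share: {index_share:.0%}")
```

Under these assumptions the 7 million triple import takes a little under two hours, and the store uses about 140 bytes per triple on disk against about 80 bytes per triple in the source RDF/XML, with indexes accounting for nearly two thirds of the footprint.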
Local APIs

In addition to the C API, the librdfsql library has a Python API (beta), and there are Perl and PHP libraries that provide query APIs through the 3store tools. Native Perl and PHP APIs are planned.

Remote interfaces

Queries can be issued over HTTP, supported by an Apache web server module; a SOAP interface is in development. The 3store tools can also connect directly to remote servers, permissions allowing.

Developing the software to provide these services is straightforward, but it is difficult to return query results over HTTP in a form that is both convenient and efficient for large result sets.

Entailments

3store supports all the non-datatype RDF and RDFS entailments, and a selection of the datatype-related entailments. Details can be found in the test-case results: http://triplestore.aktors.org/rdf-test/

The tests will be run manually before each future release, and periodically as regression tests during development.

The entailments are generated with a mixture of C and SQL algorithms. Aspects of the base entailment rules have been combined in some cases, for efficiency. There are no particular issues related to making the entailments in the specifications.

Query support

3store supports a dialect of the OKBC query language and an extended subset of the RDQL query language. However, the OKBC interface is likely to be deprecated in the near future, as it gets little use.

Queries are resolved using a graph-to-SQL translator, and the SQL schema is designed to allow efficient querying. The technique used is described in this paper: http://eprints.ecs.soton.ac.uk/archive/00007970/

Implementation of this query translator has taken a significant amount of development time, and there is still work outstanding to support the rest of the RDQL language. One ongoing problem is accurately detecting complex queries before executing them, to prevent large numbers of concurrent queries tying up the server.
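To illustrate the general idea of graph-to-SQL translation, here is a minimal sketch. It is not the 3store translator or schema (those are described in the paper linked above). It assumes a single hypothetical table triples(subject, predicate, object), uses SQLite in place of MySQL so that it runs standalone, and the function name translate is made up for this example. Each triple pattern becomes one alias of the table, and a variable shared between patterns becomes a join condition.

```python
import sqlite3

COLUMNS = ("subject", "predicate", "object")

def translate(patterns):
    """Turn a list of (s, p, o) triple patterns into one SQL self-join.

    Terms beginning with '?' are variables; anything else is a constant.
    Returns (sql, params, variable_names).
    """
    bindings = {}  # variable -> first "alias.column" where it appears
    where = []     # SQL conditions, in order
    params = []    # constants, passed as bound parameters
    for i, pattern in enumerate(patterns):
        for column, term in zip(COLUMNS, pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in bindings:
                    # Repeated variable: join back to its first occurrence.
                    where.append(f"{ref} = {bindings[term]}")
                else:
                    bindings[term] = ref
            else:
                where.append(f"{ref} = ?")
                params.append(term)
    if not bindings:
        raise ValueError("query must contain at least one variable")
    select = ", ".join(bindings.values())
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT DISTINCT {select} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params, list(bindings)

# Small in-memory dataset (hypothetical example data).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:widget", "ex:name", "Widget"),
])

# "Find every ?x of type ex:Person together with its ex:name ?n."
sql, params, names = translate([
    ("?x", "rdf:type", "ex:Person"),
    ("?x", "ex:name", "?n"),
])
rows = sorted(db.execute(sql, params).fetchall())
print(sql)
print(names, rows)
```

Because each additional pattern adds another self-join, the cost of the generated SQL grows with the shape of the query as well as the size of the data, which is one reason complex queries are hard to detect cheaply in advance.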
The nature of RDF makes query execution time difficult to predict, and so optimal scheduling and distribution of queries is an outstanding problem.

RDF Export

RDF/XML export is supported, and NTriples export will be developed in the future. The design of the SQL schema requires two passes over the dataset in order to export RDF/XML, which is inefficient for large subgraphs. A previous design allowed for single-pass export but made assertion and querying less efficient.

Deployment

3store has been used in a number of environments, including as a central knowledge base and semantic web server for the AKT Project (http://www.aktors.org/), as a back-end for the CS AKTive Space (http://triplestore.aktors.org/demo/) and as a repository for the hyphen.info RDF collection (http://hyphen.info/).

The server software appears to be stable and remains running for months at a time while sustaining high query loads (up to 60 queries per second, averaged over one month). There do not appear to be any issues with the RDF specification in this regard.

It has been successfully deployed by a number of academic projects and commercial sites. Other sites mostly use it as a small-scale storage technology (typically a few hundred thousand triples), chiefly because of its fast query execution.
Received on Thursday, 14 August 2003 12:32:27 UTC