- From: Graham Klyne <GK-lists@ninebynine.org>
- Date: Sun, 06 Sep 2009 11:40:46 +0100
- To: Semantic Web <semantic-web@w3.org>
- CC: "Seaborne, Andy" <andy.seaborne@hp.com>
Seaborne, Andy wrote:
>> I would expect a more RDF-centric way where I can define indexes on
>> subsets of triples, e.g. grouped by properties, etc. Would this be
>> possible to implement for, let's say, Jena/TDB?
>
> Yes - it's possible to implement. It's something I have wanted to do for a while now.

This is something I've been thinking about recently. I think it's do-able reasonably easily - i.e. with very modest enhancements to existing code.

<background>
By way of background, I've been using SPARQLite (http://code.google.com/p/sparqlite/ - which is based on Andy's ARQ and LARQ packages) to support a search and browse application over data from 4 diverse data sources. The triple store is based on TDB and comes in at somewhere around 10 million triples. We have managed to avoid writing any new code specific to our project for the runtime system: this is important to us for sustainability reasons.

We are, in part, replicating functionality that is already provided for one of the data sources in isolation using a relational database, so the performance bottlenecks of using a triple store compared with an RDB are quite starkly exposed. Many queries work really well, but others, mainly involving some kind of ordering, are not providing the performance we need.
</background>

Based on study of our running system, I am confident that a modest enhancement to the LARQ component could deliver the performance our application needs.

Currently, as provided, LARQ supports a single index linked to the 'pf:textMatch' property in SPARQL queries. All the machinery for linking properties in SPARQL queries to indexes is present in ARQ/LARQ, but exploiting it needs new application code. What is not provided is a facility to configure multiple such properties and link them to different Lucene indexes. That development is one that I'd like to use to boost our performance, and one that I think is relatively easy to implement.
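
The multiple-index idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not LARQ's or ARQ's actual API: the class names, the example.org property URIs, and the toy inverted index standing in for a Lucene index are all hypothetical.

```python
from collections import defaultdict


class ToyTextIndex:
    """Stand-in for a Lucene index: maps lower-cased tokens to subject URIs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, subject, text):
        # Index every whitespace-separated token of the literal.
        for token in text.lower().split():
            self._postings[token].add(subject)

    def search(self, term):
        # Return matching subjects in a stable (sorted) order.
        return sorted(self._postings.get(term.lower(), set()))


class PropertyIndexRegistry:
    """Hypothetical registry: one text index per search property URI."""

    def __init__(self):
        self._indexes = {}

    def register(self, property_uri, index):
        self._indexes[property_uri] = index

    def match(self, property_uri, term):
        # Route the lookup to the index configured for this property.
        if property_uri not in self._indexes:
            raise KeyError(f"no index configured for {property_uri}")
        return self._indexes[property_uri].search(term)


# Hypothetical property URIs, each linked to its own index.
TITLE_MATCH = "http://example.org/search#titleMatch"
AUTHOR_MATCH = "http://example.org/search#authorMatch"

titles = ToyTextIndex()
titles.add("http://example.org/item/1", "Drosophila gene expression")
titles.add("http://example.org/item/2", "Mouse gene atlas")

authors = ToyTextIndex()
authors.add("http://example.org/item/1", "Smith Jones")

registry = PropertyIndexRegistry()
registry.register(TITLE_MATCH, titles)
registry.register(AUTHOR_MATCH, authors)

title_hits = registry.match(TITLE_MATCH, "gene")
author_hits = registry.match(AUTHOR_MATCH, "smith")
```

In a real system the query engine would perform the routing step when it meets the configured property in a query pattern; the registry here only shows the shape of the configuration.
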
This approach doesn't automatically speed up arbitrary queries, but I think it will allow us to design queries that perform well to extract any required information for the end-user application. I'd count that as a very big win for modest effort.

SO: has anyone done anything like this, which can be used out of the box?

(I did notice the message about Parliament, which seems to do something like this, but I'm concerned about the overhead and learning curve of switching to a different back-end. And it's not immediately obvious to me if it supports free-text queries, which we use extensively. A solution based on ARQ/LARQ would be favourite for me.)

#g
Received on Sunday, 6 September 2009 10:42:47 UTC