implementing triple stores from Sandro Hawke on 2009-05-10 (public-rif-wg@w3.org from May 2009)

From: Sandro Hawke <sandro@w3.org>
Date: Sun, 10 May 2009 10:08:03 -0400
To: Bijan Parsia <bparsia@cs.manchester.ac.uk>
cc: RIF WG <public-rif-wg@w3.org>
Message-ID: <1663.1241964483@ubehebe>

> On 9 May 2009, at 13:40, Sandro Hawke wrote:
> [snip]
> > Right.  The interesting/challening part, I believe, is that Gary wants
> > to translate frames to Java objects.  Doing so will require some
> > cleverness, since there are significant semantics/functionality
> > differences, but hopefully will give significant performance gains.
> > Ternary predicates are typically not super fast at matching (a,b,?).
> 
> Really?

I don't understand.  You seem to be disagreeing with my claim, but then,
below, you seem to be support it.  I'm trying to claim that a naive
system which just uses an ordinary rdf/3 predicate is not likely to be
able to list all the object for a given subject+predicate very quickly
(compared to a less-naive implementation.)

> > (For example, SWI-Prolog uses a specially indexed structure for RDF
> > triples/quads, because normal predicate indexing is too slow.)
> 
> I wouldn't think that's the issue. Predicate indexing is typically one  
> of the more optimized part of a Prolog engine (for obvious reasons). A  
> uniform ternary predicate *defeats* predicate indexing because there's  
> only one predicate. By default, IIRC and it hasn't changed, most  
> Prolog engines do predicate plus first argument indexing, though you  
> can change that, e.g.,:
> 	http://www.lix.polytechnique.fr/~catuscia/teaching/prolog/Manual/sec-3.
> 11.html#sec 
> :3.11.1
> 	http://www.lix.polytechnique.fr/~catuscia/teaching/prolog/Manual/sec-3.
> 12.html#index/ 

I can't find a web page describing it, but from what I recall, the SWI
Prolog RDF store (which is really a quad store) has about 8 hash
indexes, convering different combinations of subject, predicate, object,
and graph.  So you get hash-table performance asking for all the objects
for given values of subject, predicate, graph; and you also get
hash-table performance asking for all the predicates and graphs for a
given subject and object combination, etc.  (The first example there
seems like the most common case; the second example among the least
common.)  Obviously there are penalties in space and in insertion time.
(As I vaguely recall, there's a reasonable amortized cost on deletion,
done in a garbage collection style.)

I tend to think having a hash table per object/frame gives good enough
performance, but it is cool to be able to do various other queries
quickly as well.

     -- Sandro

Received on Sunday, 10 May 2009 14:08:12 UTC