Re: implementing triple stores from Bijan Parsia on 2009-05-10 (public-rif-wg@w3.org from May 2009)

From: Bijan Parsia <bparsia@cs.manchester.ac.uk>
Date: Sun, 10 May 2009 17:03:14 +0100
To: Sandro Hawke <sandro@w3.org>
Cc: RIF WG <public-rif-wg@w3.org>
Message-Id: <E66E60A4-0987-4671-8302-3CB6B29E0FD5@cs.manchester.ac.uk>

On 10 May 2009, at 15:08, Sandro Hawke wrote:

>> On 9 May 2009, at 13:40, Sandro Hawke wrote:
>> [snip]
>>> Right.  The interesting/challening part, I believe, is that Gary  
>>> wants
>>> to translate frames to Java objects.  Doing so will require some
>>> cleverness, since there are significant semantics/functionality
>>> differences, but hopefully will give significant performance gains.
>>> Ternary predicates are typically not super fast at matching (a,b,?).
>>
>> Really?
>
> I don't understand.  You seem to be disagreeing with my claim,

I was.

> but then,
> below, you seem to be support it.

My understanding of what I subsequently wrote is that I was supporting  
my disagreement.

>  I'm trying to claim that a naive
> system which just uses an ordinary rdf/3 predicate is not likely to be
> able to list all the object for a given subject+predicate very quickly
> (compared to a less-naive implementation.)

But not due to a fundamental property of the system, which you seemed  
to be implying. Using "isIndex" *is* using the built in predicate  
indexing system.

>>> (For example, SWI-Prolog uses a specially indexed structure for RDF
>>> triples/quads, because normal predicate indexing is too slow.)
>>
>> I wouldn't think that's the issue. Predicate indexing is typically  
>> one
>> of the more optimized part of a Prolog engine (for obvious  
>> reasons). A
>> uniform ternary predicate *defeats* predicate indexing because  
>> there's
>> only one predicate. By default, IIRC and it hasn't changed, most
>> Prolog engines do predicate plus first argument indexing, though you
>> can change that, e.g.,:
>> 	http://www.lix.polytechnique.fr/~catuscia/teaching/prolog/Manual/sec-3 
>> .
>> 11.html#sec
>> :3.11.1
>> 	http://www.lix.polytechnique.fr/~catuscia/teaching/prolog/Manual/sec-3 
>> .
>> 12.html#index/
>
> I can't find a web page describing it, but from what I recall, the SWI
> Prolog RDF store (which is really a quad store) has about 8 hash
> indexes, convering different combinations of subject, predicate,  
> object,
> and graph.

Sure. But you might want to have more hashing with a p(s, o)  
representation too (e.g., if you have lots of queries based on  
objects). I don't see that the reified representation makes a  
fundamental difference.

[snip]
> I tend to think having a hash table per object/frame gives good enough
> performance, but it is cool to be able to do various other queries
> quickly as well.

My point was that p(s,o) vs. t(s,p,o) in a typical Prolog system is  
pretty much adding an isIndex call for the former. Thus, "naive  
performance"  isn't a reason (in those systems) for preferring one to  
the other. Performance of various queries *is* a reason for preferring  
another representation altogether.

Cheers,
Bijan.

Received on Sunday, 10 May 2009 16:04:01 UTC