RE: NOT implementing triple stores from Paul Vincent on 2009-05-11 (public-rif-wg@w3.org from May 2009)

From: Paul Vincent <pvincent@tibco.com>
Date: Mon, 11 May 2009 03:30:57 -0700
To: "Bijan Parsia" <bparsia@cs.manchester.ac.uk>, "Sandro Hawke" <sandro@w3.org>
Cc: "RIF WG" <public-rif-wg@w3.org>
Message-ID: <A92210407BA7004199621BE5F0AC5D8BEC0429@NA-PA-VBE04.na.tibco.com>

Gentlemen: I don't think anyone doubts that performance of Prolog
systems can be "quite good"... :)

I believe the "misquote" mentioned should in fact be the understatement:
"Gary wants to translate frames to Java objects".

Please interpret this as "the general software industry (as in users of
Java and .NET) will need to translate RIF Frames to the Java/.NET
objects used in their software systems". The motivation for this is
presumably the unwritten end-of-sentence "...for RIF to succeed outside
of the semantic web and academic community".

I suggest that the above is (likelihood > 99%, IMHO) true regardless of
Prolog compiler performance...

Apologies if you *really did mean* to discuss triple store performance
issues for RIF etc, although maybe that subject is premature for the RIF
WG mailing list?

---

On frames-objects: "... there are significant semantics/functionality
differences ..." probably deserves some attention.

Cheers
Paul Vincent 
+1 650 206 2493 / mobile +44 781 493 7229 

> -----Original Message-----
> From: public-rif-wg-request@w3.org [mailto:public-rif-wg-request@w3.org]
> On Behalf Of Bijan Parsia
> Sent: 10 May 2009 17:03
> To: Sandro Hawke
> Cc: RIF WG
> Subject: Re: implementing triple stores
> 
> On 10 May 2009, at 15:08, Sandro Hawke wrote:
> 
> >> On 9 May 2009, at 13:40, Sandro Hawke wrote:
> >> [snip]
> >>> Right.  The interesting/challenging part, I believe, is that Gary
> >>> wants to translate frames to Java objects.  Doing so will require
> >>> some cleverness, since there are significant semantics/functionality
> >>> differences, but hopefully will give significant performance gains.
> >>> Ternary predicates are typically not super fast at matching (a,b,?).
> >>
> >> Really?
> >
> > I don't understand.  You seem to be disagreeing with my claim,
> 
> I was.
> 
> > but then,
> > below, you seem to be supporting it.
> 
> My understanding of what I subsequently wrote is that I was supporting
> my disagreement.
> 
> >  I'm trying to claim that a naive
> > system which just uses an ordinary rdf/3 predicate is not likely to be
> > able to list all the objects for a given subject+predicate very quickly
> > (compared to a less-naive implementation.)
> 
> But not due to a fundamental property of the system, which you seemed
> to be implying. Using "isIndex" *is* using the built in predicate
> indexing system.
> 
> >>> (For example, SWI-Prolog uses a specially indexed structure for RDF
> >>> triples/quads, because normal predicate indexing is too slow.)
> >>
> >> I wouldn't think that's the issue. Predicate indexing is typically
> >> one of the more optimized parts of a Prolog engine (for obvious
> >> reasons). A uniform ternary predicate *defeats* predicate indexing
> >> because there's only one predicate. By default, IIRC and it hasn't
> >> changed, most Prolog engines do predicate plus first argument
> >> indexing, though you can change that, e.g.:
> >>
> >> http://www.lix.polytechnique.fr/~catuscia/teaching/prolog/Manual/sec-3.11.html#sec:3.11.1
> >>
> >> http://www.lix.polytechnique.fr/~catuscia/teaching/prolog/Manual/sec-3.12.html#index/
> >
> > I can't find a web page describing it, but from what I recall, the SWI
> > Prolog RDF store (which is really a quad store) has about 8 hash
> > indexes, covering different combinations of subject, predicate, object,
> > and graph.
> 
> Sure. But you might want to have more hashing with a p(s, o)
> representation too (e.g., if you have lots of queries based on
> objects). I don't see that the reified representation makes a
> fundamental difference.
> 
> [snip]
> > I tend to think having a hash table per object/frame gives good enough
> > performance, but it is cool to be able to do various other queries
> > quickly as well.
> 
> My point was that p(s,o) vs. t(s,p,o) in a typical Prolog system is
> pretty much adding an isIndex call for the former. Thus, "naive
> performance"  isn't a reason (in those systems) for preferring one to
> the other. Performance of various queries *is* a reason for preferring
> another representation altogether.
> 
> Cheers,
> Bijan.
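
The indexing point argued in the quoted thread can be sketched as
follows. This is only a minimal illustration in Python rather than
Prolog: it assumes a tiny in-memory data set, and the sample triples and
every function name are hypothetical. It does not describe how
SWI-Prolog or any other engine actually implements its store.

```python
from collections import defaultdict

# Hypothetical sample data; none of these names come from the thread.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("alice", "age", "30"),
    ("bob", "knows", "carol"),
]


def objects_by_scan(triples, s, p):
    """Match (s, p, ?) on a single t(s, p, o) relation with no extra index:
    every triple has to be inspected."""
    return [o for (s2, p2, o) in triples if s2 == s and p2 == p]


def build_sp_index(triples):
    """Add one hash index on (subject, predicate) to the t(s, p, o) layout."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[(s, p)].append(o)
    return index


def build_per_predicate(triples):
    """The p(s, o) layout: one table per predicate, keyed on the first
    argument (the subject), mimicking predicate plus first-argument indexing."""
    tables = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        tables[p][s].append(o)
    return tables


SP_INDEX = build_sp_index(TRIPLES)
PER_PREDICATE = build_per_predicate(TRIPLES)

scan_result = objects_by_scan(TRIPLES, "alice", "knows")
index_result = SP_INDEX.get(("alice", "knows"), [])
per_pred_result = PER_PREDICATE["knows"]["alice"]
```

All three lookups return the same objects. Under these assumptions the
only difference is how much data each lookup has to touch, which is the
sense in which one extra index puts t(s,p,o) on a par with p(s,o) for
this query. A query keyed on the object would need a further index in
either layout.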
Received on Monday, 11 May 2009 10:31:38 UTC