Re: DESCRIBE optimizations (was RE: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena TDB, D2R Server, and MySQL) from Peter Ansell on 2008-09-25 (semantic-web@w3.org from September 2008)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Thu, 25 Sep 2008 10:10:01 +1000 (EST)
To: Kjetil Kjernsmo <Kjetil.Kjernsmo@computas.com>
Cc: semantic-web@w3.org
Message-ID: <5817991.381222301389654.JavaMail.peter@Macintosh-2.local>

----- "Kjetil Kjernsmo" <Kjetil.Kjernsmo@computas.com> wrote:

> From: "Kjetil Kjernsmo" <Kjetil.Kjernsmo@computas.com>
> To: semantic-web@w3.org
> Sent: Wednesday, September 24, 2008 6:30:16 PM GMT +10:00 Brisbane
> Subject: DESCRIBE optimizations (was RE: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena TDB, D2R Server, and MySQL)
>
> Dear all,
> 
> I would like to thank Prof. Bizer and his group for undertaking this
> very interesting study, and to Orri and the participants here for
> interesting elaboration.
> 
> I'm in the process of evaluating several SPARQL backends for use in
> several projects, and right now, I'm looking at Virtuoso.
> 
> I noticed this:
> >A lesser item in the same direction is the use of describe, which is
> >not commensurate between SPARQL and SQL and not even between SPARQL's.
> 
> Even though SPARQL DESCRIBE is not standardised, we have found it
> extremely useful as a "give me everything you know about </foo>"
> query. Also we found it useful that it preserves the semantics of the
> original data. Thus, most of the queries we ask are DESCRIBEs. Indeed,
> we have some performance issues.
> 
> I would like to hear your opinion on a possible optimisation in light
> of this:
> > The BSBM workload typically retrieves multiple dependent attributes
> > of a single key.  If these attributes are all next to each other, as
> > in a relational row store, then we have a constant time for the extra
> > attribute instead of a log of the database size.
> 
> This is very interesting and I can see why this is so, but we look
> upon our DESCRIBEs as retrieving multiple attributes of the same
> single key (URI). Thus, would it be possible to optimize for this
> situation somehow, by putting these attributes next to each other?
> 
> As an aside, we don't do this a lot now, but it seems like an
> important case to quickly retrieve all data of something that you know
> only an IFP of, e.g.
> 
> DESCRIBE ?user WHERE { ?user foaf:mbox "dahut@example.org" . }
> 
> Given that you know which node to DESCRIBE at the first hit if
> foaf:mbox is an IFP, is this a situation that could be optimized for?

Would that make SPARQL reliant on the definitions of IFPs? Would it be valid to return several different ?user URIs if the IFP value happened not to be unique in a given graph, so that the resulting RDF was the same as that of

CONSTRUCT { ?user foaf:mbox "dahut@example.org" . ?user ?p ?o . } WHERE { ?user foaf:mbox "dahut@example.org" . OPTIONAL { ?user ?p ?o . } }

(or something like that, which isn't necessarily unique in the way the DESCRIBE implies)?
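
Spelled out a bit more readably, the kind of CONSTRUCT I have in mind is roughly the following. This is only an untested sketch: the foaf: prefix declaration is my assumption about the namespace in use, and I have kept the mailbox as a plain literal as in the example above. It returns the outgoing triples of every ?user that matches, however many there are, whereas what a DESCRIBE returns is left up to the endpoint.

```sparql
# Assumed prefix; adjust to whatever the data actually uses.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Returns all outgoing triples for each resource with this mailbox.
# Nothing here relies on foaf:mbox being an IFP: if two resources
# share the value, both are returned.
CONSTRUCT {
  ?user ?p ?o .
}
WHERE {
  ?user foaf:mbox "dahut@example.org" .
  ?user ?p ?o .
}
```

The OPTIONAL is dropped here because ?user ?p ?o always matches at least the foaf:mbox triple itself, so the result should be the same.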

Cheers,

Peter

Received on Thursday, 25 September 2008 00:10:44 UTC