RE: Query and storage from Seaborne, Andy on 2002-05-24 (www-rdf-interest@w3.org from May 2002)

From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
Date: Fri, 24 May 2002 11:03:35 +0100
To: "'Dave Reynolds'" <der@hplb.hpl.hp.com>, Dan Brickley <danbri@w3.org>
Cc: Graham Klyne <GK@ninebynine.org>, www-rdf-interest@w3.org
Message-ID: <5E13A1874524D411A876006008CD059F038D38D3@0-mail-1.hpl.hp.com>
Dave,

Surely, the performance characteristics are not a matter of the query
language per se but a matter of required results form.  The same QL could be
asked to get the minimal complete subgraph that matches (Dave's case) or
variable binding (or a sequence of graphlets - one per match possibility -
same thing).

I think of the minimal complete subgraph case is a sideways transformation
of one RDF graph into another.  RDF -> RDF.  [An complete solution to this
would also involve reformatting the RDF, creating new statements, not just
selecting them.]

The variable binding is a upwards/layering issue of the application.  It is
not in the RDF world but getting information out of the RDF, typically "find
a resource such that ..." that several QLs are targetted at.

So - in the network case this can used to split the processing: do the
RDF->RDF extraction over the wire.  Then do the creation of variable
bindings locally if that is the format the application wants.


The next issue is variables themselves.  They also constrain the things to
find.  "find the people such that person and spouse are the same age".

    ?x person:age ?age
    ?y person:age ?age
    ?x person:isWifeOf ?y	// isSpouseOf is tricky to avoid x2!

uses ?age to be the same in two places.  Shared object of two statements. I
think this needs variables (or being able to write shared bNodes in an RDF
syntax which requires id'ing bNodes in the syntax which is the same thing).

SQL with nested selects could do this and so I presume QLs that have an
explicit computation model for their intermediate results can also express
this without explicit variables.

	Andy

-----Original Message-----
From: Dave Reynolds [mailto:der@hplb.hpl.hp.com] 
Sent: 24 May 2002 10:06
To: Dan Brickley
Cc: Graham Klyne; Seaborne, Andy; www-rdf-interest@w3.org
Subject: Re: Query and storage


> > Yes, I agree.
> > (My query implementation doesn't return a subgraph at all, just the 
> > variable bindings.)
> 
> You can plug the latter into the original query to get the former. 
> Going the other way is more work, you basically have to redo the query 
> against the subgraph to get back to the bindings.

Agreed but it has performance implications.

The normal semantics of variable bindings is to return all the combinations
of variable bindings that apply. This leads to combinatoric expansion in
some cases.

To give a concrete example. I have a simple data set where there are several
properties that have multiple values (these are things like comments,
annotations and ratings by a collection of users of some set of content
items). If I just want the values of one property or the values of all
properties then reponse-as-query-binding works fine. However, suppose I want
to retrieve the value of a specific set properties, such as:
   ?x ep:comments ?c &
   ?x ep:annotations ?a &
   ?x ep:ratings ?r

In an SQL like system I should get all combinations of the binding tuple 
(?x, ?c, ?a, ?r). If there are 10 values of each property then this gives
1000 binding entries. Whereas all I'm interested in for my application is
the set of distinct property values to present to a user, of which there are
30 in this case.

This is particularly significant when doing remote query because it not only
multiplies up the server time but it multiplies up the data that needs to be
transferred.

Using a subgraph extraction approach the network transfers in cases like
this are kept small. If the client app really wanted to list all
combinations then it can do so locally by rerunning the query on the
subgraph but now it is all in local memory.

> Lots of possibilities. But I think they can all be interestingly 
> explored in the context of a simple 'graph match, return the bindings' 
> query protocol. And that the query abstraction (basically a graph 
> decorated with variable names) can be distinguished from its various 
> texty representations;  eg SQL-like and RDF/XML-based. There are many 
> many other things we might want from an rdf query language, but given 
> the state of tools, proposals etc., my inclination is to go with the 
> simplest, easiest to agree language as a basis for some 
> cross-implementation testing.

Agreed if s/return the bindings/return the subgraph/ in the networked case.
:-)

Dave
Received on Friday, 24 May 2002 06:04:04 UTC