Re: Query and storage from Dave Reynolds on 2002-05-24 (www-rdf-interest@w3.org from May 2002)

From: Dave Reynolds <der@hplb.hpl.hp.com>
Date: Fri, 24 May 2002 10:06:12 +0100
To: Dan Brickley <danbri@w3.org>
CC: Graham Klyne <GK@ninebynine.org>, "Seaborne, Andy" <Andy_Seaborne@hplb.hpl.hp.com>, www-rdf-interest@w3.org
Message-ID: <3CEE0284.D45934C1@hplb.hpl.hp.com>

> > Yes, I agree.
> > (My query implementation doesn't return a subgraph at all, just the
> > variable bindings.)
> 
> You can plug the latter into the original query to get the former. Going
> the other way is more work, you basically have to redo the query against
> the subgraph to get back to the bindings.

Agreed, but it has performance implications.

The normal semantics of variable bindings is to return all the combinations of
variable bindings that apply. This leads to combinatoric expansion in some
cases.

To give a concrete example: I have a simple data set in which several
properties have multiple values (these are things like comments,
annotations and ratings by a collection of users of some set of content items).
If I just want the values of one property or the values of all properties then
response-as-query-binding works fine. However, suppose I want to retrieve the
values of a specific set of properties, such as:
   ?x ep:comments ?c &
   ?x ep:annotations ?a &
   ?x ep:ratings ?r

In an SQL-like system I should get all combinations of the binding tuple
(?x, ?c, ?a, ?r). If there are 10 values of each property then this gives 1000
binding entries. Whereas all I'm interested in for my application is the set of
distinct property values to present to a user, of which there are 30 in this
case.
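
The arithmetic above can be checked with a minimal Python sketch. Everything
in it is an assumption made for illustration: the item name "item1", the
generated property values, and the use of plain tuples as triples. It is not
taken from any actual query implementation.

```python
from itertools import product

# Hypothetical data: one content item with 10 values for each property.
PROPS = ("ep:comments", "ep:annotations", "ep:ratings")
item = "item1"
triples = {(item, prop, f"{prop}-value-{i}") for prop in PROPS for i in range(10)}


def values(subject, prop):
    """Return all objects of (subject, prop, ?) in the hypothetical data set."""
    return sorted(o for (s, p, o) in triples if s == subject and p == prop)


# Bindings semantics: every combination of the tuple (?x, ?c, ?a, ?r).
bindings = [
    (item, c, a, r)
    for c, a, r in product(*(values(item, prop) for prop in PROPS))
]

# Subgraph semantics: only the distinct triples that take part in a match.
subgraph = {t for t in triples if t[0] == item and t[1] in PROPS}

print(len(bindings))  # 1000
print(len(subgraph))  # 30
```

The bindings grow with the product of the value counts (10 * 10 * 10), while
the subgraph grows with their sum (10 + 10 + 10).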

This is particularly significant when doing remote queries because it not only
multiplies up the server time but also multiplies up the data that needs to be
transferred.

Using a subgraph extraction approach, the network transfers in cases like this
are kept small. If the client app really wants to list all combinations then it
can do so locally by rerunning the query on the subgraph, but now it is all in
local memory.
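
A rough sketch of that round trip, again in Python and again under stated
assumptions: the naive `match` function, the `instantiate` helper, and the
sample data (3 values per property plus one unrelated triple) are all
hypothetical and only illustrate the idea. The "server" returns the matched
subgraph, and the "client" reruns the same query on it to recover the
bindings.

```python
QUERY = [("?x", "ep:comments", "?c"),
         ("?x", "ep:annotations", "?a"),
         ("?x", "ep:ratings", "?r")]


def match(pattern, graph, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)


def instantiate(pattern, binding):
    """Plug a binding back into the query to get the triples it matched."""
    return {tuple(binding.get(t, t) for t in tp) for tp in pattern}


def canon(bindings):
    """Order-independent form of a list of bindings, for comparison."""
    return sorted(tuple(sorted(b.items())) for b in bindings)


# Hypothetical server-side graph: 3 values per property, plus unrelated data.
server_graph = {("item1", tp[1], f"{tp[1]}#{i}") for tp in QUERY for i in range(3)}
server_graph.add(("item1", "ep:title", "not part of the query"))

# Server side: run the query, then ship only the matched subgraph.
server_bindings = list(match(QUERY, server_graph))
subgraph = set()
for b in server_bindings:
    subgraph |= instantiate(QUERY, b)

# Client side: rerun the same query locally on the small subgraph.
local_bindings = list(match(QUERY, subgraph))

print(len(server_bindings), len(subgraph))  # 27 9
print(canon(local_bindings) == canon(server_bindings))  # True
```

Here 9 triples cross the network instead of 27 binding tuples, and the client
can still enumerate all 27 combinations if it needs them.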

> Lots of possibilities. But I think they can all be interestingly explored
> in the context of a simple 'graph match, return the bindings' query
> protocol. And that the query abstraction (basically a graph decorated with
> variable names) can be distinguished from its various texty
> representations;  eg SQL-like and RDF/XML-based. There are many many other
> things we might want from an rdf query language, but given the state of
> tools, proposals etc., my inclination is to go with the simplest, easiest
> to agree language as a basis for some cross-implementation testing.

Agreed if s/return the bindings/return the subgraph/ in the networked case. :-)

Dave

Received on Friday, 24 May 2002 05:06:20 UTC