Re: Query and storage from Dan Brickley on 2002-05-23 (www-rdf-interest@w3.org from May 2002)

From: Dan Brickley <danbri@w3.org>
Date: Thu, 23 May 2002 13:02:47 -0400 (EDT)
To: Graham Klyne <GK@ninebynine.org>
cc: "Seaborne, Andy" <Andy_Seaborne@hplb.hpl.hp.com>, <www-rdf-interest@w3.org>
Message-ID: <Pine.LNX.4.30.0205231248040.14820-100000@tux.w3.org>
On Thu, 23 May 2002, Graham Klyne wrote:

> At 11:53 AM 5/23/02 +0100, Seaborne, Andy wrote:
> > > a higher level than "find this pattern of triples"
> >
> >Agreed.  There are two problems that are closely related by sharing
> >technology but are different use models.  Query-variable bindings is a
> >matter of one layer of the application wanting to ask questions of the RDF
> >graph ("find the resource such that ...") and the extract subgraph that is a
> >matter of RDF->RDF transformation by restricting one graph.  These two seem
> >to get mixed up.
>
> Yes, I agree.
> (My query implementation doesn't return a subgraph at all, just the
> variable bindings.)

You can plug the bindings into the original query to get the subgraph. Going
the other way is more work: you basically have to redo the query against
the subgraph to get back to the bindings.
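To make the asymmetry concrete, here is a minimal sketch, assuming a query is just a list of (subject, predicate, object) patterns in which any term starting with "?" is a variable, and a binding row is a dict from variable to value. These shapes and the helper names (`substitute`, `subgraph_from_bindings`) are hypothetical, not any existing RDF toolkit's API. Bindings to subgraph is plain substitution:

```python
# Assumed representation (hypothetical): patterns are 3-tuples of strings,
# terms beginning with "?" are variables, each binding row is a dict.

def is_var(term):
    return term.startswith("?")

def substitute(pattern, binding):
    """Instantiate one triple pattern with the values from one binding row."""
    return tuple(binding[t] if is_var(t) else t for t in pattern)

def subgraph_from_bindings(query, bindings):
    """Bindings -> subgraph: instantiate every pattern for every result row."""
    return {substitute(p, b) for b in bindings for p in query}

query = [("?doc", "dc:creator", "?who"),
         ("?who", "foaf:name", "?name")]
bindings = [{"?doc": "ex:report1", "?who": "ex:libby", "?name": '"Libby"'}]

for triple in sorted(subgraph_from_bindings(query, bindings)):
    print(triple)
```

Recovering the bindings from that subgraph alone would mean matching the pattern against it all over again.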

The other nice thing about focusing on the bindings is that it is a very
familiar (SQL-ish hence 'Squish...') programming idiom. Send some database
a query string, get back a bunch of answers, with rows for 'hits' and
columns for fields. I'm not sure how far the analogy can be pushed, but
Libby had her Inkling/Squish stuff implemented over the JDBC APIs. I'm
trying the same with Ruby DBI and as a SOAP service (SOAP-encoded
serialization of an array of hashtables; a quick hack). This simple-minded
approach to query isn't an ideal/perfect mechanism for querying RDF data
services, but it's a common, widely implemented subset. Worthy of some
writeup and interop testing, I reckon.
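A rough illustration of the "rows for hits, columns for fields" view, assuming the answers arrive as a list of dicts (one per hit), much like the array of hashtables mentioned above. The function names are hypothetical and this is not the JDBC, Ruby DBI, or SOAP interface itself, just the shape of the data:

```python
# Hypothetical helpers: turn a list of binding dicts into a tabular
# result set, the way a SQL client would present it.

def to_result_set(variables, bindings):
    """Return (column_names, rows), one row per hit, ordered like `variables`."""
    columns = [v.lstrip("?") for v in variables]
    rows = [tuple(b.get(v) for v in variables) for b in bindings]
    return columns, rows

def print_table(columns, rows):
    """Print a plain-text table, the way a SQL shell might."""
    widths = [max(len(str(x)) for x in [c, *(r[i] for r in rows)])
              for i, c in enumerate(columns)]
    print(" | ".join(c.ljust(w) for c, w in zip(columns, widths)))
    for r in rows:
        print(" | ".join(str(x).ljust(w) for x, w in zip(r, widths)))

bindings = [
    {"?doc": "ex:report1", "?name": "Libby"},
    {"?doc": "ex:report2", "?name": "Dan"},
]
columns, rows = to_result_set(["?doc", "?name"], bindings)
print_table(columns, rows)
```

A variable missing from a row comes back as None, roughly the way a SQL NULL would.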

> > > I'd like to see more work on storage formats before we nail down a query
> >language.
> >
> >This is where I disagree: I don't want to see a relationship between the
> >query language and the storage.  I think query should be specified in
> >relation to the RDF graph.  It would be different implementations for
> >different application domains that make decisions about storage and query
> >*implementation*.  There is no need to bind storage choices to QL choices.
>
> I agree with what you say here, but maybe I should clarify what I meant.  I
> didn't mean that the query language should be bound to a storage format.
>
> Rather, I was thinking about the efficiency of higher-level query
> constructs;  my own implementation is modelled on the idea of matching
> tree-shaped query subgraphs against an arbitrary RDF graph.  My intuition
> here is that this should permit more efficient handling of the
> query.  Working with a Jena-like interface, the first thing I do to
> implement this is break it down into a collection of triples to be matched,
> so on that score I don't seem to have made any useful progress.  (To set
> against that, I was encouraged that the implementation seems to be
> constrained to conduct the graph query in much the same way that I would do
> if programming it by hand.)
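A minimal sketch of that decomposition: the query is a list of triple patterns matched one at a time against a triple-match call, with already-bound variables fed back in as constants (a nested-loop join). The `find(s, p, o)` interface with None as wildcard is a hypothetical stand-in for a Jena-like triple-match API, not Jena's actual signature and not Graham's implementation. The matcher touches storage only through `find()`, so any backend could sit behind it:

```python
from typing import Dict, Iterator, List, Optional, Protocol, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

class TripleSource(Protocol):
    """Hypothetical storage interface; None acts as a wildcard."""
    def find(self, s: Optional[str], p: Optional[str],
             o: Optional[str]) -> Iterator[Triple]: ...

class MemoryStore:
    """Toy in-memory backend that scans every triple on each call."""
    def __init__(self, triples):
        self.triples = set(triples)

    def find(self, s, p, o):
        for t in self.triples:
            if all(q is None or q == v for q, v in zip((s, p, o), t)):
                yield t

def is_var(term: str) -> bool:
    return term.startswith("?")

def match(store: TripleSource, patterns: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Match the patterns one at a time, extending the binding as we go."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    # Already-bound variables become constants; unbound ones are wildcards.
    s, p, o = (binding.get(t) if is_var(t) else t for t in first)
    for triple in store.find(s, p, o):
        extended = dict(binding)
        # setdefault also rejects a variable repeated within one pattern
        # when the two positions hold different values.
        if all(extended.setdefault(t, v) == v
               for t, v in zip(first, triple) if is_var(t)):
            yield from match(store, rest, extended)

store = MemoryStore([
    ("ex:report1", "dc:creator", "ex:libby"),
    ("ex:libby", "foaf:name", '"Libby"'),
    ("ex:report2", "dc:creator", "ex:dan"),
    ("ex:dan", "foaf:name", '"Dan"'),
])
query = [("?doc", "dc:creator", "?who"), ("?who", "foaf:name", "?name")]
for row in match(store, query):
    print(row)
```

Like a hand-written loop over the triple-match API, this does a fresh `find()` for each partial binding, which is why it is easy to mechanise but not particularly efficient.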

That's how I did my first (accidental) query implementation. I first wrote
out longhand the Ruby code for calling the triple-match API, then started
mechanising it. Not particularly efficient. While query _languages_ don't
need to know much about the backend, there are many things about the
backend (and its specific contents) that we'll want to expose to query
engines. The most obvious case is a backend that is itself capable of handling
complex query languages; but we'll also want to know about any indices the
database might have, statistics of various kinds, and whether the database is
'smart' w.r.t. datatypes, substring searching, various kinds of inference, etc.
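As one example of why backend statistics matter to a query engine, here is a sketch of greedy join ordering driven by per-predicate triple counts. The `StatsStore` class and its `predicate_count()` method are hypothetical; a real database would expose whatever indices or statistics it actually has. The heuristic (prefer patterns sharing a variable with what is already bound, then the rarest predicate) is a common textbook one, not a method proposed in this thread:

```python
from collections import Counter

def is_var(term):
    return term.startswith("?")

class StatsStore:
    """Toy backend exposing one (hypothetical) statistic: counts per predicate."""
    def __init__(self, triples):
        self.triples = list(triples)
        self._by_predicate = Counter(p for _, p, _ in self.triples)

    def predicate_count(self, predicate):
        if is_var(predicate):
            return len(self.triples)  # unknown predicate: assume the worst
        return self._by_predicate.get(predicate, 0)

def plan(store, patterns):
    """Greedy join order: stay connected to bound variables (avoiding cross
    products), then pick the pattern with the rarest predicate."""
    remaining, bound, order = list(patterns), set(), []
    while remaining:
        def cost(pattern):
            connected = not bound or any(t in bound for t in pattern if is_var(t))
            return (not connected, store.predicate_count(pattern[1]))
        best = min(remaining, key=cost)
        remaining.remove(best)
        order.append(best)
        bound.update(t for t in best if is_var(t))
    return order

store = StatsStore([
    ("ex:a", "rdf:type", "foaf:Person"),
    ("ex:b", "rdf:type", "foaf:Person"),
    ("ex:c", "rdf:type", "foaf:Person"),
    ("ex:a", "foaf:mbox", "mailto:a@example.org"),
    ("ex:doc1", "dc:title", '"Notes"'),
    ("ex:doc2", "dc:title", '"Minutes"'),
])
query = [("?p", "rdf:type", "foaf:Person"),
         ("?d", "dc:title", "?t"),
         ("?p", "foaf:mbox", "?m")]
for pattern in plan(store, query):
    print(pattern)
```

Here the single foaf:mbox triple is matched first, the connected rdf:type pattern next, and the unrelated dc:title pattern last, even though it is rarer than rdf:type.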

Lots of possibilities. But I think they can all be interestingly explored
in the context of a simple 'graph match, return the bindings' query
protocol, and that the query abstraction (basically a graph decorated with
variable names) can be distinguished from its various texty
representations, e.g. SQL-like and RDF/XML-based. There are many, many other
things we might want from an RDF query language, but given the state of
tools, proposals, etc., my inclination is to go with the simplest,
easiest-to-agree-on language as a basis for some cross-implementation testing.
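A sketch of keeping the abstraction separate from its syntaxes: one object holds the graph pattern and the selected variables, and each text form is just a rendering of it. Both output syntaxes below are illustrative assumptions (a loosely SQL-like form and a line-per-triple form standing in for an RDF-based encoding); neither is the exact Squish, RDQL, or RDF/XML grammar:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GraphQuery:
    """The abstract query: a graph whose nodes may be variables ("?x"),
    plus the variables whose bindings should be returned."""
    select: List[str]
    patterns: List[Tuple[str, str, str]]

def to_sql_like(q: GraphQuery) -> str:
    """Render as a SELECT ... WHERE form (illustrative syntax only)."""
    where = ",\n      ".join(f"({s} {p} {o})" for s, p, o in q.patterns)
    return f"SELECT {', '.join(q.select)}\nWHERE {where}"

def to_triples(q: GraphQuery) -> str:
    """Render the same pattern as one triple per line (illustrative syntax)."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in q.patterns)

q = GraphQuery(select=["?name"],
               patterns=[("?doc", "dc:creator", "?who"),
                         ("?who", "foaf:name", "?name")])
print(to_sql_like(q))
print()
print(to_triples(q))
```

Since every text form is derived from the same `GraphQuery`, cross-implementation tests could compare engines on the abstract query and its bindings rather than on any one surface syntax.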

Dan


-- 
mailto:danbri@w3.org
http://www.w3.org/People/DanBri/
Received on Thursday, 23 May 2002 13:02:51 UTC