Re: [Fwd: FROM keyword unnecessary?] from Kendall Clark on 2004-09-29 (public-rdf-dawg@w3.org from July to September 2004)

From: Kendall Clark <kendall@monkeyfist.com>
Date: Wed, 29 Sep 2004 18:46:16 -0400
To: Dan Connolly <connolly@w3.org>
Cc: Andy Seaborne <andy.seaborne@hp.com>, "'RDF Data Access Working Group'" <public-rdf-dawg@w3.org>
Message-ID: <20040929224616.GA21965@monkeyfist.com>
On Wed, Sep 29, 2004 at 03:54:32PM -0500, Dan Connolly wrote:

> Are you persuaded to remove FROM by his comments?
> 
> Based on my implementation experience, I think I wouldn't
> miss it, but I haven't thought it thru carefully.

I've been thinking about this in re: the protocol draft.

There is a certain bad interaction between FROM and some kinds of
protocol deployment. Since the main terms in this domain of discourse
are "queries", "models", and "query processors", I've been working on
a protocol design that is agnostic as to whether queries are sent to
URIs of models or URIs of query processor services. (Such agnosticism
also satisfies one of my design goals, which is to dictate as little
as possible about the URIspace of any conforming implementation.)

The general cost of eliminating FROM, and thus doing model selection
in the protocol, is that queries aren't semantically complete as
written. That's not a good thing, IMO.

You can eliminate FROM safely -- that is, the only cost I can see is
the 'semantic completeness' general cost -- in these cases:

 1. a query against a single model, routed either to the model itself
    or to a query processor service; in the former case, model
    selection is implicitly given by the URI the query is routed to,
    and in the latter case, model selection is done explicitly as part
    of the protocol (there are a few ways to do this, about which more
    in the protocol document -- these ways are irrelevant to the FROM
    clause issue).

 2. a query against more than one model, routed to a query processor
    service. In effect such a message says, "hey, query service, apply
    this query to these models", except that the model selection isn't
    included *in* the query proper.

There are two other cases:

 1. A query against more than one model, some of which are in a FROM
    clause and others of which are identified in the protocol, which
    is sent to a query processor service. This case is a bit
    degenerate, in that we have to say, explicitly, in the protocol
    what the effect of this case is, but it's not conceptually (or
    operationally) odious. It basically says, "hey, query service,
    apply this query to these models, some of which are named in the
    query, others of which are named in the protocol".

 2. The really degenerate case, IMO, is a query against multiple
    models which is routed to a model URI. This, in effect, treats a
    URI identifying a model resource as if it were a URI identifying a
    service. This case really offends me. :>

The simplest semantic for queries where some models are identified in
a FROM clause and other models are identified in the protocol is an
additive semantic; that is, you gather all the models identified and
treat them all as query targets.
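
As a minimal sketch of that additive semantic, assuming models are
identified by plain URI strings: the function name and both parameter
names below are hypothetical, not taken from any draft, and dropping
duplicates is my own assumption about what "gather" should mean.

```python
def resolve_targets(from_models, protocol_models):
    """Additive semantic: every model named in the FROM clause or in the
    protocol becomes a query target.

    Assumption: a model named in both places is queried once, and the
    order of first mention is preserved.
    """
    seen = set()
    targets = []
    for uri in list(from_models) + list(protocol_models):
        if uri not in seen:
            seen.add(uri)
            targets.append(uri)
    return targets


# Hypothetical URIs, for illustration only.
targets = resolve_targets(
    ["http://example.org/models/a"],
    ["http://example.org/models/b", "http://example.org/models/a"],
)
```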

My preferences, then, I think, are:

1. eliminate model selection in the query language entirely, which
   means that queries aren't semantically complete in and of
   themselves. We can regain completeness by treating the protocol
   plus the explicit query as a query unit, but that's a bit
   cheeky.

2. Thus, do model selection *solely* in the protocol.

3. Single model queries against a service:

   GET /service?<query>

   Legal.

4. Single model queries against the model itself:

   GET /model?<query>

   Legal.

5. Multi model queries against a service:

   GET /service?<query> + identification of > 1 model

   Legal.

6. Multi model queries against a model:

   GET /model?<query> + identification of other models

   Illegal. I would disallow this one because it, in effect, is a
   confused case of (5) and because I think it doesn't make sense.
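
The legality rules in items 3 to 6 can be sketched as a single check,
assuming the receiver already knows whether the request URI names a
model or a query processor service. The enum, the function name, and
the `other_models` parameter are hypothetical; nothing here is taken
from the protocol document. A service request that identifies no model
at all is not covered by the list above, so the sketch simply lets it
through.

```python
from enum import Enum


class TargetKind(Enum):
    """What the request URI identifies (hypothetical classification)."""
    SERVICE = "service"
    MODEL = "model"


def is_legal(target_kind, other_models):
    """Return True unless the request is the disallowed case (6).

    other_models: URIs of models identified in the protocol message in
    addition to the request URI itself.
    """
    # Case 6: a query routed to a model URI that also names other models.
    if target_kind is TargetKind.MODEL and len(other_models) > 0:
        return False
    # Cases 3, 4, and 5 are all legal.
    return True


# Hypothetical URIs, for illustration only.
case_4 = is_legal(TargetKind.MODEL, [])
case_5 = is_legal(TargetKind.SERVICE, ["http://example.org/m1",
                                       "http://example.org/m2"])
case_6 = is_legal(TargetKind.MODEL, ["http://example.org/m2"])
```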

On the flip side, we can keep the FROM clause, specify some resolution
mechanism for the case where some models are identified by FROM and
others in the protocol, and then just live with that. I think that's
not as elegant as eliminating it, though.

The analogies and disanalogies with SQL are interesting to think
about. Our situation is fundamentally different, I think. Consider:

query = query
table similarTo model
rdbms similarTo query processor service

In which case, SQL does all "query target" selection in the query
language itself, and it (typically) does query processor service
selection in some kind of protocol, which differs from product to
product, as far as I know.

Our situation is different because we treat models as first class
objects in a way that relational tables are not treated. We give
models URIs and sometimes we want to retrieve them, in toto, without
applying any query to them at all. This doesn't happen, near as I can
tell, in SQL/RDBMS technology.

In other words, GET /model is powerful, simple, and, on its own,
rather elegant. But it does make our world less service-centric than
it might otherwise be; certainly less service-centric than SQL/RDBMS.

That means that while SQL can reduce the domain of discourse to
queries and query processors, we talk about models, queries, and query
processors. And it's reasonable to give each of those things a URI,
depending on local custom, implementation strategy, and application
need.

(I suspect all of this is entirely obvious and not enlightening, but
it helped me to write it all down while working on the protocol document,
so if you've read this far, congrats! :>)

Cheers,
Kendall
Received on Wednesday, 29 September 2004 22:48:45 UTC