Re: [Fwd: FROM keyword unnecessary?]

Kendall,

Excellent discussion of the issues.  I found this very helpful in explaining the 
design space.

Comments below:



Kendall Clark wrote:
> On Wed, Sep 29, 2004 at 03:54:32PM -0500, Dan Connolly wrote:
> 
> 
>>Are you persuaded to remove FROM by his comments?
>>
>>Based on my implementation experience, I think I wouldn't
>>miss it, but I haven't thought it thru carefully.
> 
> 
> I've been thinking about this in re: protocol draft.
> 
> There is a certain bad interaction between FROM and some kinds of
> protocol deployment. Since the main terms in this domain of discourse
> are "queries", "models", and "query processors", I've been working on
> a protocol design that is agnostic as to whether queries are sent to
> URIs of models or URIs of query processor services. (Such agnosticism
> also satisfies one of my design goals, which is to dictate as little
> as possible about the URIspace of any conforming implementation.)
> 
> The general cost of eliminating FROM, and thus doing model selection
> in the protocol, is that queries aren't semantically complete as
> written. That's not a good thing, IMO.

I came up with 3 use models:

1/ Protocol (HTTP/SOAP)
2/ Local query from an application program
3/ Queries as scripts in files

You cover 1/ below.  I'd note that URLs encode the query, so there is a single 
thing that can be passed around containing both query and target, but %-encoded 
URLs really are opaque and uneditable.
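
To make the opacity point concrete, here is a minimal sketch in Python using 
only the standard library.  The service URL, the "query" parameter name and the 
query text are all made up for illustration; none of them come from the 
protocol draft.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and parameter name (assumptions for illustration only).
SERVICE = "http://example.org/service"

# Illustrative query text; it is treated purely as an opaque string here.
query = (
    "SELECT ?name FROM <http://example.org/model> "
    "WHERE { ?x <http://xmlns.com/foaf/0.1/name> ?name }"
)


def build_url(service: str, query_text: str) -> str:
    """Pack the query into a single URL that names both query and target."""
    return service + "?" + urlencode({"query": query_text})


def extract_query(url: str) -> str:
    """Recover the query text from a URL built by build_url."""
    return parse_qs(urlsplit(url).query)["query"][0]


url = build_url(SERVICE, query)
print(url)                 # one thing to pass around, but hard to read or edit
print(extract_query(url))  # the original text comes back after decoding
```

The round trip is lossless, but the encoded form is not something a person 
would want to edit by hand.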

For 2/, it is Phil's argument that FROM is not needed, which is possible.  It 
can be convenient to separate the details of the query from the code (e.g. in a 
configuration file), and having query and FROM together is convenient.

3/ is an argument for FROM because it makes a query self-contained.  Sometimes 
the script will include the FROM, sometimes it won't (i.e. it will be reusable 
on different targets).  Having a query+FROM in a file is convenient for 
maintenance, but it's just the convenience of the pairing.

> 
> You can eliminate FROM safely -- that is, the only cost I can see is
> the 'semantic completeness' general cost -- in these cases:
> 
>  1. a query against a single model, routed either to the model itself
>     or to a query processor service; in the latter case, model
>     selection is implicitly given by the URI the query is routed to,
>     and in the former case, model selection is done explicitly as part
>     of the protocol (there are a few ways to do this, about which more
>     in the protocol document -- these ways are irrelevant to the FROM
>     clause issue).
> 
>  2. a query against more than one model, routed to a query processor
>     service. In effect such a message says, "hey, query service, apply
>     this query to these models", except that the model selection isn't
>     included *in* the query proper.

I'm not sure we can be agnostic because of services that offer to query one of 
several graphs.  I think we have to decide "service-centric" or "graph-centric" 
for consistency in talking and writing about SPARQL.

> 
> There are two other cases:
> 
>  1. A query against more than one model, some of which are in a FROM
>     clause and others of which are identified in the protocol, which
>     is sent to a query processor service. This case is a bit
>     degenerate, in that we have to say, explicitly, in the protocol
>     what the effect of this case is, but it's not conceptually (or
>     operationally) odious. It basically says, "hey, query service,
>     apply this query to these models, some of which are named in the
>     query, others of which are named in the protocol".

Not keen on that - you can't get the effect of in-protocol overriding in-query, 
such as taking a query designed for one graph and applying it to another (e.g. a 
cache).

> 
>  2. The really degenerate case, IMO, is a query against multiple
>     models which is routed to a model URI. This, in effect, treats a
>     URI identifying a model resource as if it were a URI identifying a
>     service. This case really offends me. :>

Offends me too.

> 
> The simplest semantic for queries where some models are identified in
> a FROM clause and other models are identified in the protocol is an
> additive semantic; that is, you gather all the models identified and
> treat them all as query targets.

Protocol-overriding-query is also a possibility.
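
A minimal sketch of the two resolution rules side by side, in Python.  The 
function name, the policy labels and the example URIs are hypothetical, and 
neither rule is something the drafts specify; this only shows how the two 
choices differ when a query written for one graph is pointed at another.

```python
from typing import List


def resolve_targets(from_models: List[str],
                    protocol_models: List[str],
                    policy: str = "additive") -> List[str]:
    """Decide which models a query runs against.

    from_models:     URIs named in the query's FROM clause.
    protocol_models: URIs named in the protocol request.
    policy:          "additive" or "override" (hypothetical labels).
    """
    if policy == "additive":
        # Gather every model identified anywhere, keeping first-seen order
        # and dropping duplicates.
        seen = set()
        merged = []
        for uri in from_models + protocol_models:
            if uri not in seen:
                seen.add(uri)
                merged.append(uri)
        return merged
    if policy == "override":
        # Models named in the protocol replace those named in the query;
        # the FROM clause is only a default when the protocol names none.
        return list(protocol_models) if protocol_models else list(from_models)
    raise ValueError("unknown policy: " + policy)


in_query = ["http://example.org/model"]
in_protocol = ["http://example.org/cache/model"]

print(resolve_targets(in_query, in_protocol, "additive"))
print(resolve_targets(in_query, in_protocol, "override"))
```

Under the additive rule the cache is queried alongside the original graph; 
only the override rule lets the cache stand in for it.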

> 
> My preferences, then, I think are 
> 
> 1. eliminate model selection in the query language entirely, which
>    means that queries aren't semantically complete in and of
>    themselves. We can regain completeness by treating the protocol
>    plus the explicit query as a query unit, but that's a bit
>    cheeky.
> 
> 2. Thus, do model selection *solely* in the protocol.
> 
> 3. Single model queries against a service:
> 
>    GET /service?<query>
> 
>    Legal.
> 
> 4. Single model queries against the model itself:
> 
>    GET /model?<query>
> 
>    Legal.
> 
> 5. Multi model queries against a service
> 
>    GET /service?<query> + identification of > 1 model
> 
>    Legal.   
> 
> 6. Multi model queries against a model:
> 
>    GET /model?<query> + identification of other models
> 
>    Illegal. I would disallow this one because it, in effect, is a
>    confused case of (5) and because I think it doesn't make sense.

I agree - this should be illegal, but then I think we have to decide on 
service-centric vs model-centric.
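
A rough sketch of the legality rules in cases 3 to 6, in Python.  The names 
is_legal, target_kind and extra_models are hypothetical.  Note the assumption 
built into the sketch: the receiver already knows whether the request URI 
names a service or a model, which is exactly the decision in question.

```python
from typing import List


def is_legal(target_kind: str, extra_models: List[str]) -> bool:
    """Apply the legal/illegal split from cases 3 to 6.

    target_kind:  "service" or "model"; assumed to be known for the
                  request URI (that knowledge is the open question).
    extra_models: models identified in the protocol in addition to
                  whatever the request URI itself implies.
    """
    if target_kind == "service":
        # Cases 3 and 5: a service may be asked about one or many models.
        return True
    if target_kind == "model":
        # Case 4 is legal; case 6 (naming other models too) is not.
        return len(extra_models) == 0
    raise ValueError("unknown target kind: " + target_kind)


print(is_legal("service", []))                          # case 3
print(is_legal("model", []))                            # case 4
print(is_legal("service", ["http://example.org/m2"]))   # case 5
print(is_legal("model", ["http://example.org/m2"]))     # case 6
```

Without the target_kind argument there is no way to tell case 5 from case 6, 
since the two requests look the same on the wire.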

> 
> On the flip side, we can keep FROM clause, specify some resolution
> mechanism for the case where some models are identified by FROM and
> others in the protocol, and then just live with that. I think that's
> not as elegant as eliminating it, though.

A possibility is not having FROM in protocol queries (or, better, ignoring it) 
but keeping it for local use if the local query processor wishes to provide the 
facility.

> 
> The analogies and disanalogies with SQL are interesting to think
> about. Our situation is fundamentally different, I think. Consider:
> 
> query = query
> table similarTo model
> rdbms similarTo query processor service
> 
> In which case, SQL does all "query target" selection in the query
> language itself, and it (typically) does query processor service
> selection in some kind of protocol, which differs from product to
> product, as far as I know.
> 
> Our situation is different because we treat models as first class
> objects in a way that relational tables are not treated. We give
> models URIs and sometimes we want to retrieve them, in toto, without
> applying any query to them at all. This doesn't happen, near as I can
> tell, in SQL/RDBMS technology.
> 
> In other words, GET /model is powerful, simple, and, on its own,
> rather elegant. But it does make our world less service-centric than
> it might otherwise be; certainly less service-centric than SQL/RDBMS.
> 
> That means that while SQL can reduce the domain of discourse to
> queries and query processors, we talk about models, queries, and query
> processors. And it's reasonable to give each of those things a URI,
> depending on local custom, implementation strategy, and application
> need.
> 

Implementation experience:

- Joseki ignores FROM.  Multiple FROMs don't make sense.  I'd assumed
   that aggregations are important enough to have their own URI.

- Jena uses FROM if the graph is not supplied by the calling application.
   URIs are loaded, or databases attached, using the URI.
   There is a FileManager for indirection and caching between URI and graph.

- Neither supports multiple sources.
   Users have requested it.


> (I suspect all of this is entirely obvious and not enlightening, but
> it helped me write it all down vis-a-vis working on protocol document,
> so if you've read this far, congrats! :>)

It's not all obvious, and it is good to have it brought together in one place.

Again, thank you for taking the time to write this in detail.

	Andy


> 
> Cheers,
> Kendall
> 

Received on Thursday, 30 September 2004 13:11:21 UTC