- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Thu, 30 Sep 2004 14:10:31 +0100
- To: kendall@monkeyfist.com
- CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Kendall,

Excellent discussion of the issues. I found this very helpful in
explaining the design space. Comments below:

Kendall Clark wrote:
> On Wed, Sep 29, 2004 at 03:54:32PM -0500, Dan Connolly wrote:
>
>
>>Are you persuaded to remove FROM by his comments?
>>
>>Based on my implementation experience, I think I wouldn't
>>miss it, but I haven't thought it thru carefully.
>
>
> I've been thinking about this in re: protocol draft.
>
> There is a certain bad interaction between FROM and some kinds of
> protocol deployment. Since the main terms in this domain of discourse
> are "queries", "models", and "query processors", I've been working on
> a protocol design that is agnostic as to whether queries are sent to
> URIs of models or URIs of query processor services. (Such agnosticism
> also satisfies one of my design goals, which is to dictate as little
> as possible about the URIspace of any conforming implementation.)
>
> The general cost of eliminating FROM, and thus doing model selection
> in the protocol, is that queries aren't semantically complete as
> written. That's not a good thing, IMO.

I came up with 3 use models:

1/ Protocol (HTTP/SOAP)
2/ Local query from an application program
3/ Queries as scripts in files

You cover 1/ below. I'd note that URLs encode the query, so there is a
single thing that can be passed around with both query and target, but
%-encoded URLs really are opaque and uneditable.

For 2/, it is Phil's argument that FROM is not needed, which is
possible. It can be convenient to separate the details of the query
from the code (e.g. a configuration file), and having query and FROM
together is convenient.

3/ is an argument for FROM because it makes a query self-contained.
Sometimes the script will include the FROM, sometimes it won't (i.e. it
is reusable on different targets). Having a query+FROM in a file is
convenient for maintenance, but it's just the convenience of the
pairing.
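To illustrate the point under 1/ about %-encoded URLs, here is a rough
sketch in Python. The service URL and the parameter names "query" and
"graph" are my assumptions for illustration only; nothing in the
protocol draft fixes them, and the query text is treated as an opaque
string.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical service URL and parameter names ("query", "graph");
# these are assumptions for illustration, not part of any draft.
SERVICE = "http://example.org/service"


def build_request_url(service, query, graph):
    """Pack the query text and its target graph into one URL."""
    return service + "?" + urlencode({"query": query, "graph": graph})


query_text = "SELECT ?x WHERE { ?x ?p ?o }"
url = build_request_url(SERVICE, query_text, "http://example.org/model")

# The URL is a single thing that can be passed around, but the
# %-encoded form is hard to read or edit by hand.
print(url)

# Decoding recovers the original query text and target exactly.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0])
print(decoded["graph"][0])
```

The pairing of query and target survives the round trip, which is the
convenience; the readability does not, which is the cost.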
>
> You can eliminate FROM safely -- that is, the only cost I can see is
> the 'semantic completeness' general cost -- in these cases:
>
>  1. a query against a single model, routed either to the model itself
>     or to a query processor service; in the latter case, model
>     selection is implicitly given by the URI the query is routed to,
>     and in the former case, model selection is done explicitly as part
>     of the protocol (there are a few ways to do this, about which more
>     in the protocol document -- these ways are irrelevant to the FROM
>     clause issue).
>
>  2. a query against more than one model, routed to a query processor
>     service. In effect such a message says, "hey, query service, apply
>     this query to these models", except that the model selection isn't
>     include *in* the query proper.

I'm not sure we can be agnostic, because of services that offer to
query one of several graphs. I think we have to decide
"service-centric" or "graph-centric" for consistency in talking and
writing about SPARQL.

>
> There are two other cases:
>
>  1. A query against more than one model, some of which are in a FROM
>     clause and others of which are identified in the protocol, which
>     is sent to a query processor service. This case is a bit
>     degenerate, in that we have to say, explicitly, in the protocol
>     what the effect of this case is, but it's not conceptually (or
>     operationally) odious. It basically says, "hey, query service,
>     apply this query to these models, some of which are named in the
>     query, others of which are named in the protocol".

Not keen on that - you can't get the effect of in-protocol selection
overriding in-query selection, such as taking a query designed for one
graph and applying it to another (e.g. a cache).

>
>  2. The really degenerate case, IMO, is a query against multiple
>     models which is routed to a model URI. This, in effect, treats a
>     URI identifying a model resource as if it were a URI identifying a
>     service. This case really offends me. :>

Offends me too.
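To make the difference concrete, here is a minimal sketch in Python of
two ways a query service might combine graphs named in a FROM clause
with graphs named in the protocol. The function name, the policy names
and the fallback to FROM when the protocol names nothing are all my
assumptions for illustration, not anything from the drafts.

```python
def resolve_targets(from_graphs, protocol_graphs, policy="additive"):
    """Return the list of graph URIs a query should be applied to.

    Hypothetical helper: "additive" gathers every graph named in either
    place; "override" lets the protocol replace the FROM clause.
    """
    if policy == "additive":
        targets = []
        for graph in list(from_graphs) + list(protocol_graphs):
            if graph not in targets:  # keep order, drop duplicates
                targets.append(graph)
        return targets
    if policy == "override":
        # Assumption: fall back to FROM when the protocol names nothing.
        return list(protocol_graphs) if protocol_graphs else list(from_graphs)
    raise ValueError("unknown policy: " + policy)


written_for = ["http://example.org/model"]
cache = ["http://example.org/cache/model"]

# Additive: the cache is queried as well as the original graph.
print(resolve_targets(written_for, cache, "additive"))

# Override: the query is redirected to the cache alone.
print(resolve_targets(written_for, cache, "override"))
```

Under the additive reading there is no way to express "use the cache
instead", which is the effect I would miss.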
>
> The simplest semantic for queries where some models are identified in
> a FROM clause and other models are identified in the protocol is to an
> additive semantic; that is, you gather all the models identified and
> treat them all as query targets.

Protocol-overriding-query is also a possibility.

>
> My preferences, then, I think are
>
>  1. eliminate model selection in the query language entirely, which
>     means that queries aren't semantically complete in and of
>     themselves. We can regain completeness by treating the protocol
>     plus the explicit query as a query unit, but that's a bit
>     cheeky.
>
>  2. Thus, do model selection *solely* in the protocol.
>
>  3. Single model queries against a service:
>
>     GET /service?<query>
>
>     Legal.
>
>  4. Single model queries against the model itself:
>
>     GET /model?<query>
>
>     Legal.
>
>  5. Multi model queries against a service
>
>     GET /model?<query> + identification of > 1 model
>
>     Legal.
>
>  6. Multi model queries against a model:
>
>     GET /model?<query> + identification of other models
>
>     Illegal. I would disallow this one because it, in effect, is a
>     confused case of (5) and because I think it doesn't make sense.

Agreed - this should be illegal, but then I think we have to decide on
service-centric vs model-centric.

>
> On the flip side, we can keep FROM clause, specify some resolution
> mechanism for the case where some models are identified by FROM and
> others in the protocol, and then just live with that. I think that's
> not as elegant as eliminating it, though.

A possibility is not having FROM in protocol queries (or, better,
ignoring it there) but keeping it for local use if the local query
processor wishes to provide the facility.

>
> The analogies and disanalogies with SQL are interesting to think
> about. Our situation is fundamentally different, I think.
> Consider:
>
>    query = query
>    table similarTo model
>    rdbms similarTo query processor service
>
> In which case, SQL does all "query target" selection in the query
> language itself, and it (typically) does query processor service
> selection in some kind of protocol, which differs from product to
> product, as far as I know.
>
> Our situation is different because we treat models as first class
> objects in a way that relational tables are not treated. We give
> models URIs and sometimes we want to retrieve them, in toto, without
> applying any query to them at all. This doesn't happen, near as I can
> tell, in SQL/RDBMS technology.
>
> In other words, GET /model is powerful, simple, and, on its own,
> rather elegant. But it does make our world less service-centric than
> it might otherwise be; certainly less service-centric than SQL/RDBMS.
>
> That means that while SQL can reduce the domain of discourse to
> queries and query processors, we talk about models, queries, and query
> processors. And it's reasonable to give each of those things a URI,
> depending on local custom, implementation strategy, and application
> need.
>

Implementation experience:

- Joseki ignores FROM. Multiple FROMs don't make sense. I'd assumed
  that aggregations are important enough to have their own URI.

- Jena uses FROM if the graph is not supplied by the calling
  application. URIs are loaded, or databases attached, using the URI.
  There is a FileManager for indirection and caching between URI and
  graph.

- Neither supports multiple sources. It is a user request.

> (I suspect all of this is entirely obvious and not enlightening, but
> it helped me write it all down vis-a-vis working on protocol document,
> so if you've read this far, congrats! :>)

It's not all obvious, and it is good to have it brought together in one
place. Again, thank you for taking the time to write this in detail.

	Andy

>
> Cheers,
> Kendall
>
Received on Thursday, 30 September 2004 13:11:21 UTC