- From: Richard Newman <rnewman@twinql.com>
- Date: Wed, 28 Oct 2009 23:13:49 -0700
- To: Lee Feigenbaum <lee@thefigtrees.net>
- Cc: Holger Knublauch <yahoo@knublauch.com>, SPARQL Working Group Comments <public-rdf-dawg-comments@w3.org>
>> Yes, LET assignments will (have to) be order dependent. And yes,
>> this is a good thing. Sure, it may not be perfect from some
>> theoretical point of view, but without ordering the whole approach
>> would not work, and we would throw out the baby with the bath
>> water. Even the solution with nested sub-selects is order
>> dependent. Giving users the ability to specify the order in a
>> reliable way has not been a problem with any other mainstream
>> computer language, so why should SPARQL be different?
>
> SPARQL is a query language, and my understanding of previous
> discussions is that there is concern that an assignment construct
> turns a (mostly) declarative language into a (somewhat) imperative
> language, which is (at least) a different mind set for users. Again,
> I'm just repeating what I believe I've heard from WG members.

I believe that a large portion of SPARQL users (maybe all of the
non-experts) think procedurally when writing queries. They're not
thinking about satisfying clauses, they're thinking about "fetch all
the subjects with this object, then fetch all their names, then filter
out the ones with...". This is why they're surprised at unexpected
results, or unexpected performance: the algebraic interpretation of
their queries is very different to what they think they've written.

We're all far too close to RDF query languages to remember how
non-implementors think.

My wife is a UX person. In that field it's considered wise to never
think of the user being wrong: if they've come to the incorrect
conclusion, it's very likely because of something you've done or not
done, and it's the software that should change, not the user.

It would be interesting to run a user test of SPARQL; I'm sure we'd
learn a huge amount about the assumptions and pain points of people
actually trying to solve problems with it.

> Also, for what it's worth, I don't think that LET need be ordered -
> the Open Anzo implementation is not, and it's (nevertheless) very
> useful for us.

Holger's usage seems to suggest using LET for intermediate results,
which at least allows efficient reuse of calculated values. That
requires ordering, implicit or not.

If the ordering is implicit, I guarantee that a customer will at some
point ask for a "warning mode" that tells you when a variable is used
before it's assigned to. You can specify behavior all you like, but
that doesn't change how people think.

The small set of Prolog users who are writing SPARQL will be pleased,
of course :)

> Also, as currently specified in our Working Drafts, subqueries are
> not order dependent. Andy or Steve will correct me if I'm wrong, I'm
> sure. :-)

If subqueries can either draw bindings from the enclosing query, or
return them back (surely both being required to make the feature
useful), then strictly controlling the order of their execution would
seem a smart thing to do.

Imagine a remote query or subquery which returns one result if ?x is
bound, or a million different ?x bindings if it's not... it's not
always possible to figure out when that'll happen. "Sufficiently smart
compiler" is not an adequate response. There's a continuum along which
software should gracefully cede control back to the user.

>> Same with FILTERs - often the query designer knows very well where
>> he wants the FILTERing to take place. Why should an engine be
>> required to do the re-ordering automatically and possibly mess up
>> any performance expectations? But that's a separate topic :)
>
> FILTERs are not order dependent in SPARQL. They are attached
> (conceptually) to either the optional pattern or the group pattern
> in which they occur.

Just to play devil's advocate for a moment: I think Holger's point is
that SPARQL as specified loses a lot of the information that the query
writer has encoded in the query.
(He surely knows that FILTERs are not order dependent: that's what
he's lamenting.) Most people do not think in an order-independent
fashion, particularly when other language constructs such as OPTIONAL
*are* ordered (after a fashion).

I see users interspersing FILTERs throughout their queries all the
time. Very often they do it because they know it's the best way to run
the query. The query language then says "pull out all the FILTERs",
and the implementation then has to decide how to run them... and it
might not have as much information as does the user. (For example,
when the execution of a custom FILTER function is very expensive, and
you need to trick the planner to execute it later or earlier.)

Put another way: I've never *ever* seen a user write something like

  SELECT * {
    FILTER (?name ...)
    ?x foaf:name ?name .
    ...
  }

even though it's meaningful SPARQL. Perhaps it shouldn't be
meaningful.

This problem gets worse when you consider subqueries, remote queries,
computed properties...

Perhaps order-dependence is actually an intuitive, reasonable default
for a language? Imperative programming language compilers have done a
pretty good job starting with ordered statements, and figuring out
when they can disregard that to get better parallelism. That's an
optimization, not the default.

Devil's advocacy over :)

-R
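
For readers who want to see the "pull out all the FILTERs" behaviour concretely, here is a minimal Python sketch of a group pattern whose filters are hoisted to group scope. It is a toy model under stated assumptions, not how any particular SPARQL engine is implemented: `evaluate_group`, `match`, the sample data and the `starts_with_a` filter are all hypothetical names invented for illustration, and the filter condition merely stands in for the elided `FILTER (?name ...)` expression.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

# Toy model only (assumption): a group is a list of elements in written
# order; each element is either a triple pattern (tuple) or a filter (callable).
Triple = Tuple[str, str, str]
Binding = Dict[str, str]
Filter = Callable[[Binding], bool]
Element = Union[Triple, Filter]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it against an existing binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate_group(elements: List[Element], data: List[Triple]) -> List[Binding]:
    """Evaluate a group; filters apply to the whole group, wherever written."""
    patterns = [e for e in elements if isinstance(e, tuple)]
    filters = [e for e in elements if callable(e)]
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for t in data
                     if (b := match(pattern, t, s)) is not None]
    # The written position of each filter has been discarded by this point.
    return [s for s in solutions if all(f(s) for f in filters)]


# Hypothetical sample data and filter.
DATA = [("ex:a", "foaf:name", "Alice"), ("ex:b", "foaf:name", "Bob")]
name_pattern = ("?x", "foaf:name", "?name")


def starts_with_a(binding: Binding) -> bool:
    return binding["?name"].startswith("A")


filter_first = evaluate_group([starts_with_a, name_pattern], DATA)
filter_last = evaluate_group([name_pattern, starts_with_a], DATA)
print(filter_first == filter_last)
```

In this model the two orderings are indistinguishable, so any hint the query writer meant to give by placing the filter early or late is lost before evaluation begins.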
Received on Thursday, 29 October 2009 06:14:20 UTC