Re: Missing LET (Assignment) in SPARQL 1.1 from Richard Newman on 2009-10-29 (public-rdf-dawg-comments@w3.org from October 2009)

From: Richard Newman <rnewman@twinql.com>
Date: Wed, 28 Oct 2009 23:13:49 -0700
To: Lee Feigenbaum <lee@thefigtrees.net>
Cc: Holger Knublauch <yahoo@knublauch.com>, SPARQL Working Group Comments <public-rdf-dawg-comments@w3.org>
Message-Id: <1C33AA19-7A8A-441B-8CB4-12F40B721C39@twinql.com>
>> Yes, LET assignments will (have to) be order dependent. And yes,  
>> this is a good thing. Sure, it may not be perfect from some  
>> theoretical point of view, but without ordering the whole approach  
>> would not work, and we would throw out the baby with the bath  
>> water. Even the solution with nested sub-selects is order  
>> dependent. Giving users the ability to specify the order in a  
>> reliable way has not been a problem with any other mainstream  
>> computer language, so why should SPARQL be different?
>
> SPARQL is a query language, and my understanding of previous  
> discussions is that there is concern that an assignment construct  
> turns a (mostly) declarative language into a (somewhat) imperative  
> language, which is (at least) a different mind set for users. Again,  
> I'm just repeating what I believe I've heard from WG members.

I believe that a large portion of SPARQL users (maybe all of the
non-experts) think procedurally when writing queries. They're not thinking  
about satisfying clauses, they're thinking about "fetch all the  
subjects with this object, then fetch all their names, then filter out  
the ones with...".

This is why they're surprised at unexpected results, or unexpected  
performance: the algebraic interpretation of their queries is very  
different to what they think they've written.

We're all far too close to RDF query languages to remember how
non-implementors think.

My wife is a UX person. In that field it's considered wise to never  
think of the user being wrong: if they've come to the incorrect  
conclusion, it's very likely because of something you've done or not  
done, and it's the software that should change, not the user. It would  
be interesting to run a user test of SPARQL; I'm sure we'd learn a  
huge amount about the assumptions and pain points of people actually  
trying to solve problems with it.


> Also, for what it's worth, I don't think that LET need be ordered -  
> the Open Anzo implementation is not, and it's (nevertheless) very  
> useful for us.

Holger's usage seems to suggest using LET for intermediate results,  
which at least allows efficient reuse of calculated values. That  
requires ordering, implicit or not.
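
A rough sketch of that kind of use, assuming the `LET (?var := expr)`  
form found in some implementations' extensions (it is not standard  
SPARQL), with a made-up `ex:` vocabulary and an arbitrary multiplier:

   PREFIX ex: <http://example.org/>   # hypothetical namespace
   SELECT ?item ?gross {
     ?item ex:price ?price .
     LET (?gross := ?price * 1.2)     # intermediate value, computed once
     FILTER (?gross > 100)            # reused here and in the projection
   }

Read top to bottom, ?gross is assigned after ?price is bound and before  
the FILTER uses it, which is the ordering a procedural reader would  
assume.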

If the ordering is implicit, I guarantee that a customer will at some  
point ask for a "warning mode" that tells you when a variable is used  
before it's assigned to. You can specify behavior all you like, but  
that doesn't change how people think.

The small set of Prolog users who are writing SPARQL will be pleased,  
of course :)


> Also, as currently specified in our Working Drafts, subqueries are  
> not order dependent. Andy or Steve will correct me if I'm wrong, I'm  
> sure. :-)

If subqueries can either draw bindings from the enclosing query, or  
return them back (surely both being required to make the feature  
useful), then strictly controlling the order of their execution would  
seem a smart thing to do. Imagine a remote query or subquery which  
returns one result if ?x is bound, or a million different ?x bindings  
if it's not... it's not always possible to figure out when that'll  
happen.
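
To make the concern concrete, here is a small sketch (the mailbox URI  
and the choice of FOAF properties are assumptions for illustration  
only):

   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?x ?name {
     ?x foaf:mbox <mailto:alice@example.org> .
     { SELECT ?x ?name { ?x foaf:name ?name } }
   }

Evaluated on its own, the inner SELECT enumerates every name in the  
store; evaluated with ?x already bound by the outer pattern, it is a  
single lookup. The joined answers are the same either way, but the cost  
is not, and the query text alone doesn't say which will happen.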

"Sufficiently smart compiler" is not an adequate response. There's a  
continuum along which software should gracefully cede control back to  
the user.


>> Same with FILTERs - often the query designer knows very well where  
>> he wants the FILTERing to take place. Why should an engine be  
>> required to do the re-ordering automatically and possibly mess up  
>> any performance expectations? But that's a separate topic :)
>
> FILTERs are not order dependent in SPARQL. They are attached  
> (conceptually) to either the optional pattern or the group pattern  
> in which they occur.

Just to play devil's advocate for a moment:

I think Holger's point is that SPARQL as specified loses a lot of the  
information that the query writer has encoded in the query. (He surely  
knows that FILTERs are not order dependent: that's what he's lamenting.)

Most people do not think in an order-independent fashion, particularly  
when other language constructs such as OPTIONAL *are* ordered (after a  
fashion).

I see users interspersing FILTERs throughout their queries all the  
time. Very often they do it because they know it's the best way to run  
the query. The query language then says "pull out all the FILTERs",  
and the implementation then has to decide how to run them... and it  
might not have as much information as does the user. (For example,  
when the execution of a custom FILTER function is very expensive, and  
you need to trick the planner to execute it later or earlier.)

Put another way: I've never *ever* seen a user write something like

   SELECT * {
     FILTER (?name ...)
     ?x foaf:name ?name .
     ...
   }

even though it's meaningful SPARQL. Perhaps it shouldn't be meaningful.
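
For illustration, assuming a simple regex test (the "Smith" pattern is  
made up), these two queries mean exactly the same thing, because a  
FILTER applies to the whole group it appears in, wherever it is written:

   PREFIX foaf: <http://xmlns.com/foaf/0.1/>

   # FILTER written first
   SELECT * {
     FILTER (regex(?name, "Smith"))
     ?x foaf:name ?name .
   }

   # FILTER written last: same group, same results
   SELECT * {
     ?x foaf:name ?name .
     FILTER (regex(?name, "Smith"))
   }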

This problem gets worse when you consider subqueries, remote queries,  
computed properties...

Perhaps order-dependence is actually an intuitive, reasonable default  
for a language? Imperative programming language compilers have done a  
pretty good job starting with ordered statements, and figuring out  
when they can disregard that to get better parallelism. That's an  
optimization, not the default.

Devil's advocacy over :)

-R
Received on Thursday, 29 October 2009 06:14:20 UTC