Re: Missing LET (Assignment) in SPARQL 1.1 from Holger Knublauch on 2009-10-30 (public-rdf-dawg-comments@w3.org from October 2009)

From: Holger Knublauch <holger@topquadrant.com>
Date: Fri, 30 Oct 2009 11:12:05 -0700
To: SPARQL Working Group Comments <public-rdf-dawg-comments@w3.org>
Message-Id: <3E855A1D-3B7A-4578-B746-1321409C21EA@topquadrant.com>
On Oct 28, 2009, at 11:13 PM, Richard Newman wrote:
> I believe that a large portion of SPARQL users (maybe all of the
> non-experts) think procedurally when writing queries. They're not
> thinking about satisfying clauses, they're thinking about "fetch all  
> the subjects with this object, then fetch all their names, then  
> filter out the ones with...".
>
> This is why they're surprised at unexpected results, or unexpected  
> performance: the algebraic interpretation of their queries is very  
> different to what they think they've written.
>
> We're all far too close to RDF query languages to remember how
> non-implementors think.
>
> My wife is a UX person. In that field it's considered wise to never  
> think of the user being wrong: if they've come to the incorrect  
> conclusion, it's very likely because of something you've done or not  
> done, and it's the software that should change, not the user. It  
> would be interesting to run a user test of SPARQL; I'm sure we'd  
> learn a huge amount about the assumptions and pain points of people  
> actually trying to solve problems with it.

I could not agree more with this. But the problems go further than
just perception. Even if I fully understood all the details of the
SPARQL algebra, I still would not know what kind of reorderings will
happen inside the query engine. The engine may reorder FILTERs based
on some heuristics. These heuristics may be unsuitable, or they may be
misinformed because statistical data about the triple store is not
always available. In those cases, the system should just use the order
in which the user has specified the clauses.
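
To make the cost side of this concrete, here is a minimal sketch in
Python (not engine code; the functions `cheap`, `expensive` and `run`
are made up for illustration). Two filters give the same answer in
either order, but the number of calls to the expensive one depends
entirely on which order the "engine" picks:

```python
# Toy illustration only: the filter names and the data are hypothetical.
calls = {"expensive": 0}

def cheap(row):
    # A selective, inexpensive test.
    return row % 10 == 0

def expensive(row):
    # Stand-in for a costly custom FILTER function; we count its calls.
    calls["expensive"] += 1
    return row % 3 == 0

def run(rows, filters):
    """Apply the filters in the given order, short-circuiting per row."""
    calls["expensive"] = 0
    result = [r for r in rows if all(f(r) for f in filters)]
    return result, calls["expensive"]

rows = range(100)
as_written, n_written = run(rows, [cheap, expensive])    # user's order
reordered, n_reordered = run(rows, [expensive, cheap])   # planner's guess

print(as_written == reordered)   # same solutions either way
print(n_written, n_reordered)    # very different amounts of work
```

The results are identical, so a reordering like this is "correct", yet
a planner without good statistics has no way of knowing that the
user's order was the cheaper one.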

> I think Holger's point is that SPARQL as specified loses a lot of  
> the information that the query writer has encoded in the query. (He  
> surely knows that FILTERs are not order dependent: that's what he's  
> lamenting.)
>
> Most people do not think in an order-independent fashion,  
> particularly when other language constructs such as OPTIONAL *are*  
> ordered (after a fashion).
>
> I see users interspersing FILTERs throughout their queries all the  
> time. Very often they do it because they know it's the best way to  
> run the query. The query language then says "pull out all the  
> FILTERs", and the implementation then has to decide how to run  
> them... and it might not have as much information as does the user.  
> (For example, when the execution of a custom FILTER function is very  
> expensive, and you need to trick the planner to execute it later or  
> earlier.)
>
> Put another way: I've never *ever* seen a user write something like
>
>  SELECT * {
>    FILTER (?name ...)
>    ?x foaf:name ?name .
>    ...
>  }
>
> even though it's meaningful SPARQL. Perhaps it shouldn't be  
> meaningful.
>
> This problem gets worse when you consider subqueries, remote  
> queries, computed properties...
>
> Perhaps order-dependence is actually an intuitive, reasonable  
> default for a language? Imperative programming language compilers  
> have done a pretty good job starting with ordered statements, and  
> figuring out when they can disregard that to get better parallelism.  
> That's an optimization, not the default.
>
> Devil's advocacy over :)

+1

In order not to break backward compatibility (debatable as that may
be), maybe a new keyword such as SELECT ... WHERE PROCEDURAL { ... }
(not a nice name yet, but you get the idea) could be introduced to
help the engine decide whether a query has been written by someone
with a procedural background or by someone who comes from the SQL
world and expects the engine to do the reordering for them.
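
The difference between the two readings can be sketched with a toy
evaluator in Python. This is an assumption-laden model, not how any
real engine works: `eval_algebraic` applies every FILTER to the whole
group regardless of where it was written (as SPARQL specifies), while
`eval_procedural` is a hypothetical "as written" mode of the kind a
PROCEDURAL keyword might select. All names and data are made up.

```python
# Toy data: (subject, predicate, object) triples. Purely illustrative.
DATA = [("alice", "name", "Alice"), ("bob", "name", "Bob")]

def match(solutions, pred, s_var, o_var):
    """Join each partial solution with every triple whose predicate is `pred`."""
    out = []
    for sol in solutions:
        for s, p, o in DATA:
            if p == pred and sol.get(s_var, s) == s and sol.get(o_var, o) == o:
                out.append({**sol, s_var: s, o_var: o})
    return out

def keep(solutions, test):
    """Apply a filter; an unbound variable is an error that drops the solution."""
    kept = []
    for sol in solutions:
        try:
            if test(sol):
                kept.append(sol)
        except KeyError:
            pass
    return kept

def eval_algebraic(group):
    """Filters scope over the whole group, wherever they appear in the text."""
    solutions = [{}]
    for kind, *args in group:
        if kind == "triple":
            solutions = match(solutions, *args)
    for kind, *args in group:
        if kind == "filter":
            solutions = keep(solutions, *args)
    return solutions

def eval_procedural(group):
    """Hypothetical mode: every clause runs in the order it was written."""
    solutions = [{}]
    for kind, *args in group:
        solutions = match(solutions, *args) if kind == "triple" else keep(solutions, *args)
    return solutions

starts_with_a = lambda sol: sol["name"].startswith("A")
filter_first = [("filter", starts_with_a), ("triple", "name", "x", "name")]
filter_last = [("triple", "name", "x", "name"), ("filter", starts_with_a)]

print(eval_algebraic(filter_first) == eval_algebraic(filter_last))
print(eval_procedural(filter_first), eval_procedural(filter_last))
```

Under the algebraic reading both spellings return the same solution;
under the procedural reading the filter-first query returns nothing,
because ?name is still unbound when the FILTER runs.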

Holger
Received on Friday, 30 October 2009 18:12:51 UTC