- From: Richard Newman <rnewman@twinql.com>
- Date: Wed, 28 Oct 2009 23:13:49 -0700
- To: Lee Feigenbaum <lee@thefigtrees.net>
- Cc: Holger Knublauch <yahoo@knublauch.com>, SPARQL Working Group Comments <public-rdf-dawg-comments@w3.org>
>> Yes, LET assignments will (have to) be order dependent. And yes,
>> this is a good thing. Sure, it may not be perfect from some
>> theoretical point of view, but without ordering the whole approach
>> would not work, and we would throw out the baby with the bath
>> water. Even the solution with nested sub-selects is order
>> dependent. Giving users the ability to specify the order in a
>> reliable way has not been a problem with any other mainstream
>> computer language, so why should SPARQL be different?
>
> SPARQL is a query language, and my understanding of previous
> discussions is that there is concern that an assignment construct
> turns a (mostly) declarative language into a (somewhat) imperative
> language, which is (at least) a different mind set for users. Again,
> I'm just repeating what I believe I've heard from WG members.
I believe that a large portion of SPARQL users (maybe all of the
non-experts) think procedurally when writing queries. They're not
thinking about satisfying clauses, they're thinking about "fetch all
the subjects with this object, then fetch all their names, then filter
out the ones with...".

This is why they're surprised at unexpected results, or unexpected
performance: the algebraic interpretation of their queries is very
different to what they think they've written.

We're all far too close to RDF query languages to remember how
non-implementors think.

My wife is a UX person. In that field it's considered wise never to
think of the user as being wrong: if they've come to an incorrect
conclusion, it's very likely because of something you've done or not
done, and it's the software that should change, not the user. It would
be interesting to run a user test of SPARQL; I'm sure we'd learn a
huge amount about the assumptions and pain points of people actually
trying to solve problems with it.
> Also, for what it's worth, I don't think that LET need be ordered -
> the Open Anzo implementation is not, and it's (nevertheless) very
> useful for us.
Holger's usage seems to suggest using LET for intermediate results,
which at least allows efficient reuse of calculated values. That
requires ordering, implicit or not.

If the ordering is implicit, I guarantee that a customer will at some
point ask for a "warning mode" that tells you when a variable is used
before it's assigned to. You can specify behavior all you like, but
that doesn't change how people think.

The small set of Prolog users who are writing SPARQL will be pleased,
of course :)
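
As a rough illustration of the "warning mode" mentioned above, here is a minimal Python sketch. It does not parse SPARQL: each LET is modelled as a hypothetical (target, expression) pair, and the function name `use_before_assign` and the sample variables are assumptions made up for this example, not part of any engine.

```python
import re

# Matches SPARQL-style variable names such as ?area.
VAR = re.compile(r"\?\w+")

def use_before_assign(assignments, bound=()):
    """Return (target, variable) pairs where a variable is read
    before any earlier assignment or pattern has bound it."""
    known = set(bound)          # variables bound by the graph patterns
    warnings = []
    for target, expression in assignments:
        for var in VAR.findall(expression):
            if var not in known:
                warnings.append((target, var))
        known.add(target)       # the target is available to later LETs
    return warnings

# Hypothetical assignments, read top to bottom as a user would.
lets = [
    ("?area", "?width * ?height"),
    ("?cost", "?area * ?rate"),   # ?rate is only assigned on the next line
    ("?rate", "0.2"),
]
warnings = use_before_assign(lets, bound=["?width", "?height"])
```

Under an order-independent reading the last two assignments are fine in either order; under the procedural reading most users have, the second one is exactly the case they would want flagged.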
> Also, as currently specified in our Working Drafts, subqueries are
> not order dependent. Andy or Steve will correct me if I'm wrong, I'm
> sure. :-)
If subqueries can either draw bindings from the enclosing query, or
return them back (surely both being required to make the feature
useful), then strictly controlling the order of their execution would
seem a smart thing to do. Imagine a remote query or subquery which
returns one result if ?x is bound, or a million different ?x bindings
if it's not... it's not always possible to figure out when that'll
happen.

"Sufficiently smart compiler" is not an adequate response. There's a
continuum along which software should gracefully cede control back to
the user.
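
The bound-versus-unbound scenario above can be sketched in a few lines of Python. Everything here is an assumption for illustration: `REMOTE` stands in for a remote endpoint (with a thousand rows rather than a million), and `local_pattern` stands in for a selective pattern in the enclosing query.

```python
# Hypothetical remote data: ?x -> label.
REMOTE = {f"item{i}": f"label{i}" for i in range(1000)}

def remote_lookup(x=None):
    """Yield (x, label) rows: one row if x is bound, every row if not."""
    if x is not None:
        if x in REMOTE:
            yield (x, REMOTE[x])
        return
    yield from REMOTE.items()

def local_pattern():
    """A selective local pattern that binds ?x to a single value."""
    yield "item42"

def local_then_remote():
    """Bind ?x locally first, then call the remote side with ?x bound."""
    results, fetched = [], 0
    for x in local_pattern():
        for row in remote_lookup(x):
            fetched += 1
            results.append(row)
    return results, fetched

def remote_then_local():
    """Call the remote side with ?x unbound, then join with the local side."""
    local = set(local_pattern())
    results, fetched = [], 0
    for row in remote_lookup():
        fetched += 1
        if row[0] in local:
            results.append(row)
    return results, fetched

first_results, first_fetched = local_then_remote()
second_results, second_fetched = remote_then_local()
```

Both orders give the same answer, but the number of rows pulled from the remote side differs by three orders of magnitude, and only the query writer may know which case applies.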
>> Same with FILTERs - often the query designer knows very well where
>> he wants the FILTERing to take place. Why should an engine be
>> required to do the re-ordering automatically and possibly mess up
>> any performance expectations? But that's a separate topic :)
>
> FILTERs are not order dependent in SPARQL. They are attached
> (conceptually) to either the optional pattern or the group pattern
> in which they occur.
Just to play devil's advocate for a moment:

I think Holger's point is that SPARQL as specified loses a lot of the
information that the query writer has encoded in the query. (He surely
knows that FILTERs are not order dependent: that's what he's lamenting.)
Most people do not think in an order-independent fashion, particularly
when other language constructs such as OPTIONAL *are* ordered (after a
fashion).

I see users interspersing FILTERs throughout their queries all the
time. Very often they do it because they know it's the best way to run
the query. The query language then says "pull out all the FILTERs",
and the implementation then has to decide how to run them... and it
might not have as much information as does the user. (For example,
when the execution of a custom FILTER function is very expensive, and
you need to trick the planner to execute it later or earlier.)

Put another way: I've never *ever* seen a user write something like

    SELECT * {
      FILTER (?name ...)
      ?x foaf:name ?name .
      ...
    }
even though it's meaningful SPARQL. Perhaps it shouldn't be meaningful.
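
To show why the FILTER-first form is meaningful, here is a deliberately simplified Python model of a group pattern: the FILTERs are pulled out, the triple patterns are joined, and the FILTERs are then applied to the whole group. The data, the `eval_group` function and the tuple encoding of the group are all assumptions for illustration; this is not how any particular engine is implemented.

```python
# Hypothetical data set of (subject, predicate, object) triples.
TRIPLES = [
    ("a", "foaf:name", "Alice"),
    ("b", "foaf:name", "Bob"),
]

def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its existing value.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def eval_group(elements):
    """Join all triple patterns, then apply every FILTER to the result,
    regardless of where the FILTER appeared in the group."""
    patterns = [e[1] for e in elements if e[0] == "triple"]
    filters = [e[1] for e in elements if e[0] == "filter"]
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in TRIPLES:
                merged = match(pattern, triple, solution)
                if merged is not None:
                    extended.append(merged)
        solutions = extended
    return [s for s in solutions if all(f(s) for f in filters)]

name_filter = ("filter", lambda s: s["?name"].startswith("A"))
name_pattern = ("triple", ("?x", "foaf:name", "?name"))

filter_first = eval_group([name_filter, name_pattern])
filter_last = eval_group([name_pattern, name_filter])
```

In this model the two spellings are indistinguishable, which is the information loss being described: whatever the writer meant by putting the FILTER where they did is gone before evaluation starts.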

This problem gets worse when you consider subqueries, remote queries,
computed properties...

Perhaps order-dependence is actually an intuitive, reasonable default
for a language? Imperative programming language compilers have done a
pretty good job starting with ordered statements, and figuring out
when they can disregard that to get better parallelism. That's an
optimization, not the default.

Devil's advocacy over :)

-R
Received on Thursday, 29 October 2009 06:14:20 UTC