Re: ARQ implementation of LET from Lee Feigenbaum on 2010-07-08 (public-rdf-dawg@w3.org from July to September 2010)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Thu, 08 Jul 2010 11:00:09 -0400
To: Andy Seaborne <andy.seaborne@talis.com>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4C35E7F9.2030705@thefigtrees.net>
Thanks, Andy. This makes it very easy for me to summarize what Glitter 
does. Please see below.

On 7/8/2010 10:10 AM, Andy Seaborne wrote:
> In ARQ "LET (?v := expr)" should be read informally as "in any solution,
> ?v must be the value of the expression". The rules for execution are
> based on this and allow for ?v to already be some value, or some value
> later in the query. Read this way, it's not a fixed assignment. It does
> not replace any already-set value with another.
>
>
> The execution rules are for a simple entailment graph:
>
> 1/ If the variable is unbound, and the expression evaluates, the
> variable is bound to the value.
> 2/ If the variable is already bound to the value the expression
> evaluates to, nothing happens and the query continues.
> 3/ If the variable is bound to a different value from the one the
> expression evaluates to, an error occurs and the current solution is
> excluded from the results.
> 4/ If the expression does not evaluate (e.g. unbound variable in the
> expression), no assignment occurs and the query continues.

Glitter shares rules 1-3. Regarding rule 4, an error in the LET 
expression currently causes an error in the whole query, but this is not 
by design. I would prefer a design that either shares ARQ's rule 4 or 
that discards the solution.
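A minimal Python sketch of ARQ's four execution rules, applied to one solution at a time. It assumes a solution is a dict from variable names to terms and that Python's == stands in for "same term" (a simple entailment graph). The names apply_let, get, EvalError and plus_one, and the on_error switch, are hypothetical: on_error="keep" follows rule 4, while on_error="drop" is the alternative of discarding the solution.

```python
from typing import Callable, Dict, Optional

# Assumption: a solution maps variable names to RDF terms, modeled here as plain
# Python values whose == stands in for "same term".
Solution = Dict[str, object]


class EvalError(Exception):
    """The expression could not be evaluated (e.g. it uses an unbound variable)."""


def get(sol: Solution, name: str) -> object:
    """Look up a variable, raising EvalError if it is unbound."""
    if name not in sol:
        raise EvalError(f"unbound variable {name}")
    return sol[name]


def apply_let(sol: Solution, var: str, expr: Callable[[Solution], object],
              on_error: str = "keep") -> Optional[Solution]:
    """Apply LET (var := expr) to one solution; None means the solution is excluded.

    on_error="keep" follows ARQ's rule 4; on_error="drop" discards the solution
    instead, the way FILTER(error) does.
    """
    try:
        value = expr(sol)
    except EvalError:
        # Rule 4: no assignment; the solution continues unchanged (or is dropped).
        return sol if on_error == "keep" else None
    if var not in sol:
        return {**sol, var: value}  # Rule 1: bind the unbound variable.
    if sol[var] == value:
        return sol                  # Rule 2: same value, nothing happens.
    return None                     # Rule 3: different value, exclude the solution.


def plus_one(sol: Solution) -> object:
    """The expression ?o + 1."""
    return get(sol, "?o") + 1
```

For example, apply_let({"?o": 1}, "?o1", plus_one) returns {"?o": 1, "?o1": 2}, while apply_let({}, "?o1", plus_one) returns {} under "keep" and None under "drop".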

The way I think about this myself (which I believe is equivalent) is:

* Solve a group - this gives me a solution set
* For each solution S and assignment (V := E), evaluate E with S as the 
environment, and then join S with (V -> eval(E, S)).
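This formulation can be sketched in Python as an ordinary solution join, assuming dict-based solutions and == as "same term" (the helper names are hypothetical). Compatibility in the join reproduces rules 1-3: an unbound V is added, an equal binding merges unchanged, and a conflicting binding yields no row. An evaluation error simply propagates here, which mirrors Glitter's current whole-query error rather than a chosen design.

```python
from typing import Callable, Dict, List

# Assumption: solutions are dicts of variable -> value; == stands in for "same term".
Solution = Dict[str, object]


def compatible(a: Solution, b: Solution) -> bool:
    """Two solutions are compatible if they agree on every variable they share."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """Join two solution sequences by merging every compatible pair."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def let_as_join(solutions: List[Solution], var: str,
                expr: Callable[[Solution], object]) -> List[Solution]:
    """For each S, evaluate E with S as the environment, then join S with {V: value}."""
    result: List[Solution] = []
    for s in solutions:
        value = expr(s)  # Any evaluation error propagates to the caller.
        result.extend(join([s], [{var: value}]))
    return result
```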

> Rule 1 is the case of simply adding a column to the results - the direct
> way to meet the criterion of having the required value.
>
> Rules 2 & 3 deal with the case of the binding already being defined;
> rule 3 stops the replacing of one value by another.
>
> Rule 4 is the error case.
>
> Note that "same value" here means the same as applies to graph pattern
> matching, for whatever the capabilities of the engine are, not to FILTER
> expressions, so for a simple entailment graph, that means "same term".
> For an engine that understands numbers, it means "=".
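To illustrate why "same value" depends on the engine, here is a small Python sketch with a hypothetical Literal type. The integer literals "1" and "01" are different terms but equal numbers, so if ?v is bound to one and the expression yields the other, rule 3 would exclude the solution on a simple entailment engine, while rule 2 would keep it on an engine that understands numbers. Only xsd:integer is handled; other datatypes fall back to term equality.

```python
from dataclasses import dataclass

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class Literal:
    """A typed RDF literal: lexical form plus datatype IRI (hypothetical model)."""
    lexical: str
    datatype: str


def same_term(a: Literal, b: Literal) -> bool:
    """Simple entailment: values match only if they are the identical term."""
    return a == b


def numeric_equal(a: Literal, b: Literal) -> bool:
    """An engine that understands integers compares their values, like "="."""
    if a.datatype == XSD_INTEGER and b.datatype == XSD_INTEGER:
        return int(a.lexical) == int(b.lexical)
    return same_term(a, b)


one = Literal("1", XSD_INTEGER)
zero_one = Literal("01", XSD_INTEGER)  # Different lexical form, same integer value.
```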
>
> The rules mean that there is some order independence, and hence it's
> like (as I understand it) what Glitter does by considering all LETs to
> happen logically at the end of a BGP, after pattern matching and before
> FILTERs. This maximises the variables in scope.
>
> Assuming :p and :q always have a numeric value and execution is naively
> in the order the pattern is written:
>
> { ?s :p ?o . LET (?o1 := ?o+1) . ?s :q ?o1 }
>
> { ?s :p ?o . ?s :q ?o1 . LET (?o1 := ?o+1) }
>
> have the same effect. Because of rules 2&3 and because ?o1 is used in a
> pattern as well, it also has the same effect as:
>
> { ?s :p ?o . ?s :q ?o1 . FILTER (sameTerm(?o1, ?o+1)) }

This is all true for me also.
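The equivalence of the three patterns can be checked with a naive left-to-right evaluator in Python. The data, the match/let/filter_ helpers and the dict-based solutions are all hypothetical; all three queries return only the ?s whose :q value is its :p value plus one.

```python
from typing import Callable, Dict, List

Solution = Dict[str, object]

# Hypothetical toy data: :p and :q always have numeric objects.
DATA = [("s1", ":p", 1), ("s1", ":q", 2), ("s2", ":p", 5), ("s2", ":q", 7)]


def match(sols: List[Solution], pattern) -> List[Solution]:
    """Extend each solution with every triple matching the pattern ('?'-prefixed = variable)."""
    out = []
    for sol in sols:
        for triple in DATA:
            new = dict(sol)
            for term, value in zip(pattern, triple):
                if isinstance(term, str) and term.startswith("?"):
                    if new.setdefault(term, value) != value:
                        break  # Variable already bound to a different value.
                elif term != value:
                    break  # Constant does not match.
            else:
                out.append(new)
    return out


def let(sols: List[Solution], var: str, expr: Callable[[Solution], object]) -> List[Solution]:
    """LET (var := expr) using rules 1-4."""
    out = []
    for sol in sols:
        try:
            value = expr(sol)
        except KeyError:
            out.append(sol)  # Rule 4: expression does not evaluate.
            continue
        if var not in sol:
            out.append({**sol, var: value})  # Rule 1.
        elif sol[var] == value:
            out.append(sol)  # Rule 2.
        # Rule 3: a different value drops the solution.
    return out


def filter_(sols: List[Solution], pred: Callable[[Solution], bool]) -> List[Solution]:
    return [sol for sol in sols if pred(sol)]


def plus_one(sol: Solution) -> object:
    return sol["?o"] + 1


P = ("?s", ":p", "?o")
Q = ("?s", ":q", "?o1")

# { ?s :p ?o . LET (?o1 := ?o+1) . ?s :q ?o1 }
q1 = match(let(match([{}], P), "?o1", plus_one), Q)
# { ?s :p ?o . ?s :q ?o1 . LET (?o1 := ?o+1) }
q2 = let(match(match([{}], P), Q), "?o1", plus_one)
# { ?s :p ?o . ?s :q ?o1 . FILTER (sameTerm(?o1, ?o+1)) }
q3 = filter_(match(match([{}], P), Q), lambda sol: sol["?o1"] == plus_one(sol))
```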

> where setting ?o1 is driven from the BGP matching. This happens to give
> optimizers some opportunities. Related: several systems optimize
> {?s :p ?x FILTER(?x=<uri>) } by executing {?s :p <uri> } and adding
> ?x = <uri> to the results.
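The rewrite can be sketched in Python on hypothetical toy data. It is only safe because <uri> is an IRI, for which "=" coincides with term equality; for literals, where "=" and sameTerm can differ, pushing the constant into the pattern could change the results.

```python
# Hypothetical toy data; "<u1>" and "<u2>" stand for IRIs.
DATA = [("a", ":p", "<u1>"), ("b", ":p", "<u2>"), ("c", ":q", "<u1>")]
URI = "<u1>"


def filter_plan():
    """{?s :p ?x FILTER(?x = <uri>)}: match every :p triple, then filter."""
    sols = [{"?s": s, "?x": o} for (s, p, o) in DATA if p == ":p"]
    return [sol for sol in sols if sol["?x"] == URI]


def rewritten_plan():
    """{?s :p <uri>}: put the constant in the pattern, then add ?x = <uri> to each row."""
    return [{"?s": s, "?x": URI} for (s, p, o) in DATA if p == ":p" and o == URI]
```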
>
>
> We need to decide the best choice for the feature in SPARQL 1.1 - I'm
> not suggesting that the rules above are necessarily the best or only
> choice.
>
> I'm not sure (4) is the best choice - it may be more consistent to be
> like a FILTER and eliminate the row as FILTER(error) does. I chose it
> this way so that a mistake only causes an empty cell, not the loss of a
> row, which makes debugging easier.
>
>
> The most common use case that I've seen is to introduce a new variable
> and assign a value to it, as part of the overall results. In that way,
> it's like SELECT expressions but less cumbersome. Users seem to find it
> natural to say "and also put in a column for ?x where the value ...". In
> this case, the variable introduced isn't used again, and a syntactic rule
> that the variable must not have been mentioned yet makes it exactly like
> SELECT expressions (which do have that rule as a static syntax
> condition, not on a per-row basis).
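A sketch of that static syntax condition in Python, assuming the preceding part of the group is available as a list of triple patterns (the representation and function names are hypothetical). The check runs once per query, before any solutions exist, unlike rules 2 and 3, which act per row.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def mentioned_vars(patterns: Iterable[Triple]) -> Set[str]:
    """Collect the variables ('?'-prefixed terms) used in the preceding patterns."""
    return {t for triple in patterns for t in triple if t.startswith("?")}


def check_let_is_fresh(patterns: Iterable[Triple], var: str) -> None:
    """Static check: reject LET (var := ...) if var has already been mentioned."""
    if var in mentioned_vars(patterns):
        raise ValueError(f"{var} is already mentioned; LET would not add a new column")
```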
>
> Another use case is calculating an intermediate value that is
> used several times elsewhere in the query, maybe twice in a FILTER
> (simply because it is easier to write the expression once).

For me, a common use case is discriminated unions:

{
   ...
   {
      ... ?x ...
      LET (?branch := "foo")
   } UNION {
      ... ?x ...
      LET (?branch := "bar")
   } UNION {
      ... ?x ...
      LET (?branch := "baz")
   }
   ...
}
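A Python sketch of what the discriminated union produces. The branch bodies are elided in the example, so each branch here hypothetically matches a single predicate; UNION is modeled as list concatenation, and the LET in each branch tags every solution with the branch it came from (rule 1, since ?branch is unbound inside each branch).

```python
from typing import Dict, List

Solution = Dict[str, object]

# Hypothetical data: the real branch patterns are elided ("... ?x ..."),
# so each branch here just matches one predicate.
DATA = [("x1", ":foo", 1), ("x2", ":bar", 2), ("x3", ":baz", 3)]


def branch(pred: str, tag: str) -> List[Solution]:
    """{ ?x pred ?v LET (?branch := tag) }"""
    return [{"?x": s, "?v": o, "?branch": tag} for (s, p, o) in DATA if p == pred]


# UNION concatenates the branches' solutions; ?branch records which one matched.
results = branch(":foo", "foo") + branch(":bar", "bar") + branch(":baz", "baz")
```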

Lee

>
> Andy
>
> PS As far as I can see, SET is the word more generally used in SQL, not
> LET. It's used for several things:
>
> Server settings:
> http://www.postgresql.org/docs/8.1/interactive/sql-set.html
> http://dev.mysql.com/doc/refman/5.1/en/set-option.html
>
> Stored procedure/user variables:
> http://dev.mysql.com/doc/refman/5.1/en/set-statement.html
> http://msdn.microsoft.com/en-us/library/aa259193%28SQL.80%29.aspx
Received on Thursday, 8 July 2010 15:00:46 UTC