Re: ARQ implementation of LET

Sorry for getting to this late.

Like Lee, I'll base my response on Andy's description.

Mulgara's LET implementation is quite trivial, so many of the
semantics simply emerged out of the implementation rather than being
designed.

On Thu, Jul 8, 2010 at 10:10 AM, Andy Seaborne <andy.seaborne@talis.com> wrote:
> In ARQ "LET (?v := expr)" should be read informally as "in any solution, ?v
> must be the value of the expression".  The rules for execution are based on
> this and allow for ?v to already be some value, or some value later in the
> query.  Read this way, it's not a fixed assignment.  It does not replace any
> already-set value with another.

Mulgara's LET adds a new variable to the current part of the query. If
the variable is already bound, an exception is reported. So it
doesn't replace an already-set value with another, but nor does it
reduce the results (changing that would not be difficult, and it would
make Mulgara much more consistent).
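To make the contrast with ARQ concrete, here is a minimal sketch of the behaviour described above. It is an illustration, not Mulgara's actual code. It assumes a solution is a Python dict from variable names to values and an expression is a hypothetical callable that returns None when it cannot be evaluated. The names let_error_if_bound and AlreadyBoundError are invented for the sketch.

```python
from typing import Callable, Dict, Optional

Solution = Dict[str, object]
# Assumed expression model: returns None when evaluation fails.
Expr = Callable[[Solution], Optional[object]]


class AlreadyBoundError(Exception):
    """Raised when LET targets a variable the solution already binds."""


def let_error_if_bound(solution: Solution, var: str, expr: Expr) -> Solution:
    # A bound target is reported as an error rather than treated as a
    # filter, so this LET never removes rows from the results.
    if var in solution:
        raise AlreadyBoundError(var)
    value = expr(solution)
    if value is None:
        return dict(solution)      # no value: leave ?var unbound
    return {**solution, var: value}
```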

>
> The execution rules are for a simple entailment graph:
>
> 1/ If the variable is unbound, and the expression evaluates, the variable is
> bound to the value.

This is the same.

> 2/ If the variable is bound to the same value as the expression evaluates
> to, nothing happens and the query continues.
> 3/ If the variable is bound to a different value from the one the expression
> evaluates to, an error occurs and the current solution will be excluded from
> the results.

In these cases Mulgara reports an error. This is inconsistent with
the case where the LET introduces the variable before a BGP binds it,
so I will be changing it to match Andy's description.

> 4/ If the expression does not evaluate (e.g. unbound variable in the
> expression), no assignment occurs and the query continues.

This is the same.
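Rules 1 to 4 can be sketched per solution roughly as follows. This is an illustration under stated assumptions, not ARQ's code: solutions are Python dicts, expressions are hypothetical callables that return None when evaluation fails, an empty result list means the solution is excluded, and plain "==" stands in for "same value". The function name apply_let is invented.

```python
from typing import Callable, Dict, List, Optional

Solution = Dict[str, object]
# Assumed expression model: returns None when evaluation fails,
# e.g. because a variable it uses is unbound.
Expr = Callable[[Solution], Optional[object]]


def apply_let(solution: Solution, var: str, expr: Expr) -> List[Solution]:
    """Apply LET (?var := expr) to one solution; [] means it is excluded."""
    value = expr(solution)
    if value is None:
        return [solution]                   # rule 4: no assignment, continue
    if var not in solution:
        return [{**solution, var: value}]   # rule 1: bind (adds a column)
    if solution[var] == value:              # "==" stands in for "same value"
        return [solution]                   # rule 2: nothing happens
    return []                               # rule 3: solution excluded
```

For LET (?o1 := ?o+1), a solution with ?o = 1 gains ?o1 = 2, one that already has ?o1 = 2 passes through, one with ?o1 = 5 is dropped, and one without ?o is left alone.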

> Rule 1 is the case of simply adding a column to the results - the direct way
> to meet the criteria of having the required value.
>
> Rules 2 & 3 deal with the case of the binding already being defined; rule 3
> stops the replacing of one value by another.
>
> Rule 4 is the error case.
>
> Note that "same value" here means the same as applies to graph pattern
> matching, for whatever the capabilities of the engine are, not to FILTER
> expressions, so for a simple entailment graph, that means "same term". For an
> engine providing understanding of numbers, it means "=".
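The difference between the two meanings of "same value" can be sketched with a deliberately simplified literal model. A literal is a lexical form plus a datatype IRI, and only xsd:integer and xsd:decimal are treated as numeric. Real engines handle many more types. The function names are hypothetical.

```python
from decimal import Decimal
from typing import Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"
NUMERIC = {XSD + "integer", XSD + "decimal"}

# Simplified literal model: (lexical form, datatype IRI).
Literal = Tuple[str, str]


def same_term(a: Literal, b: Literal) -> bool:
    """Simple entailment: only identical terms count as the same value."""
    return a == b


def same_numeric_value(a: Literal, b: Literal) -> bool:
    """An engine that understands numbers compares their values, as "=" does."""
    if a[1] in NUMERIC and b[1] in NUMERIC:
        return Decimal(a[0]) == Decimal(b[0])
    return a == b


one = ("1", XSD + "integer")
one_padded = ("01", XSD + "integer")
one_decimal = ("1.0", XSD + "decimal")
```

Under rules 2 and 3, a solution with ?o1 bound to "01" and an expression yielding "1" would be excluded with the first comparison and kept with the second.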
>
> The rules mean that there is some order independence, and hence it's like
> (as I understand it) what Glitter does by considering all LETs to happen
> logically at the end of a BGP, after pattern matching and before FILTERs.
>  This maximises the variables in scope.

I was aware that Mulgara had some order dependence, which I didn't
really like and had planned to fix. However, all of the use cases I
had were for introducing new variables rather than overriding existing
ones, so it was not a real concern for me. Now that it's looking like
it could happen for real, I'll make sure I get this in order.
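The evaluation order described for Glitter can be sketched as a group evaluator that runs BGP matching first, then every LET, then every FILTER. This is a hypothetical illustration, not Glitter's or Mulgara's code. It assumes the BGP solutions are already computed as dicts and that LET expressions are callables returning None on failure.

```python
from typing import Callable, Dict, List, Optional, Tuple

Solution = Dict[str, object]
Expr = Callable[[Solution], Optional[object]]   # assumed: None means evaluation failed
Filter = Callable[[Solution], bool]


def eval_group(bgp_solutions: List[Solution],
               lets: List[Tuple[str, Expr]],
               filters: List[Filter]) -> List[Solution]:
    """Evaluate a group as: BGP matching, then every LET, then every FILTER.

    Since the LETs run after all pattern matching, each one sees every
    variable the BGP bound, wherever it was written in the group.
    """
    results = []
    for sol in bgp_solutions:
        keep = True
        for var, expr in lets:
            value = expr(sol)
            if value is None:
                continue                    # rule 4: leave ?var unbound
            if var not in sol:
                sol = {**sol, var: value}   # rule 1: add the binding
            elif sol[var] == value:
                pass                        # rule 2: nothing to do
            else:
                keep = False                # rule 3: exclude the solution
                break
        if keep and all(f(sol) for f in filters):
            results.append(sol)
    return results
```

Fixing the order this way removes the dependence on where a LET is written, at the cost of never letting a LET feed a pattern inside the same group.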

> Assuming :p and :q always have a numeric value and execution is naively in
> the order the pattern is written:
>
> { ?s :p ?o . LET (?o1 := ?o+1) . ?s :q ?o1 }
>
> { ?s :p ?o . ?s :q ?o1 . LET (?o1 := ?o+1) }
>
> have the same effect.  Because of rules 2&3 and because ?o1 is used in a
> pattern as well, it is also the same effect as:
>
> { ?s :p ?o . ?s :q ?o1 . FILTER (sameTerm(?o1, ?o+1)) }
>
> where setting ?o1 is driven from the BGP matching.  This happens to give
> optimizers some opportunities.  Related: several systems optimize {?s :p ?x
> FILTER(?x=<uri>) } by executing {?s :p <uri> } and adding ?x = <uri> to the
> results.

Incidentally, rather than upset Mulgara's tests for already-bound
variables, my approach to this problem is to use a filter constraint.
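The claim that the three groups in Andy's example have the same effect can be checked on toy data with a naive left-to-right evaluator. Everything here is a hypothetical sketch. The data is invented, Python ints stand in for RDF terms, and the helpers match, let_o1 and filter_o1 each handle only the single pattern shape the example needs.

```python
from typing import Dict, List, Tuple

Solution = Dict[str, object]

# Hypothetical toy data; :p and :q always have integer values.
DATA: List[Tuple[str, str, int]] = [
    ("s1", ":p", 1), ("s1", ":q", 2),   # here ?q = ?p + 1
    ("s2", ":p", 4), ("s2", ":q", 7),   # here it does not
]


def match(sols: List[Solution], pred: str, obj_var: str) -> List[Solution]:
    """Join each solution with the triple pattern (?s pred ?obj_var)."""
    out = []
    for sol in sols:
        for s, p, o in DATA:
            if p == pred and sol.get("s", s) == s and sol.get(obj_var, o) == o:
                out.append({**sol, "s": s, obj_var: o})
    return out


def let_o1(sols: List[Solution]) -> List[Solution]:
    """LET (?o1 := ?o+1) under rules 1-4."""
    out = []
    for sol in sols:
        if "o" not in sol:
            out.append(sol)                          # rule 4
        elif "o1" not in sol:
            out.append({**sol, "o1": sol["o"] + 1})  # rule 1
        elif sol["o1"] == sol["o"] + 1:
            out.append(sol)                          # rule 2
        # otherwise rule 3: the solution is dropped
    return out


def filter_o1(sols: List[Solution]) -> List[Solution]:
    """FILTER (sameTerm(?o1, ?o+1)), with ints standing in for terms."""
    return [sol for sol in sols if sol["o1"] == sol["o"] + 1]


start: List[Solution] = [{}]
q1 = match(let_o1(match(start, ":p", "o")), ":q", "o1")     # LET between patterns
q2 = let_o1(match(match(start, ":p", "o"), ":q", "o1"))     # LET at the end
q3 = filter_o1(match(match(start, ":p", "o"), ":q", "o1"))  # FILTER instead
```

In q1 the LET binds ?o1 and the :q pattern then acts as the check; in q2 and q3 the pattern binds ?o1 and the LET or FILTER checks it. All three keep only s1.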

> We need to decide the best choice for the feature in SPARQL 1.1 - I'm not
> suggesting that the rules above are necessarily the best or only choice.
>
> I'm not sure (4) is the best choice - it may be more consistent to be like a
> FILTER and eliminate the row like FILTER(error) does.  I chose it this way so
> that a mistake only caused an empty cell, not the loss of a row, which makes
> debugging easier.
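The two candidate behaviours for rule 4 differ only in what happens to a row whose expression fails. This minimal sketch shows both. It assumes rows are Python dicts, ?var is not yet bound in any row, and expressions are hypothetical callables returning None on error. The function let_column and its drop_on_error flag are invented for illustration.

```python
from typing import Callable, Dict, List, Optional

Solution = Dict[str, object]
Expr = Callable[[Solution], Optional[object]]  # assumed: None means an evaluation error


def let_column(rows: List[Solution], var: str, expr: Expr,
               drop_on_error: bool = False) -> List[Solution]:
    """Add ?var to each row; assumes ?var is not yet bound in any row.

    drop_on_error=False: rule 4 as described, the cell is left empty.
    drop_on_error=True:  the alternative, behaving like FILTER(error).
    """
    out = []
    for row in rows:
        value = expr(row)
        if value is None:
            if not drop_on_error:
                out.append(row)      # empty cell, row kept
            continue
        out.append({**row, var: value})
    return out
```

With a mistaken expression, the first policy leaves visible rows with a missing column, which is the debugging advantage Andy mentions; the second silently shrinks the result.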

Actually, I like (4) and use it myself. For instance, I may want to
concatenate first and last names into a variable, but those names may
be OPTIONAL. By following (4) I can apply the LET either inside or
outside of the OPTIONAL operation. This is particularly useful for
re-using the code for SELECT expressions.
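The first/last name case can be sketched to show why rule 4 lets the LET sit either inside or outside the OPTIONAL. The data, the helper names, and the dict-based left join below are all hypothetical simplifications.

```python
from typing import Dict, List, Optional

Solution = Dict[str, str]


def full_name(sol: Solution) -> Optional[str]:
    """Stand-in for concatenating ?first and ?last; fails if either is unbound."""
    if "first" in sol and "last" in sol:
        return sol["first"] + " " + sol["last"]
    return None


def let_name(sols: List[Solution]) -> List[Solution]:
    """LET (?name := ...) under rules 1 and 4: bind if it evaluates, else leave unbound."""
    out = []
    for sol in sols:
        value = full_name(sol)
        out.append(sol if value is None else {**sol, "name": value})
    return out


def left_join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """OPTIONAL: extend each left row with compatible right rows, or keep it as is."""
    out = []
    for row in left:
        matches = [{**row, **opt} for opt in right
                   if all(row[k] == opt[k] for k in row.keys() & opt.keys())]
        out.extend(matches or [row])
    return out


# Hypothetical data: Bob has a first name but no last name.
people = [{"p": "alice"}, {"p": "bob"}]
names = [{"p": "alice", "first": "Alice", "last": "Smith"},
         {"p": "bob", "first": "Bob"}]

inside = left_join(people, let_name(names))    # LET inside the OPTIONAL
outside = let_name(left_join(people, names))   # LET after the OPTIONAL
```

Both placements give the same rows. If a failed expression dropped the row instead, they would diverge: inside the OPTIONAL only Bob's optional match would be lost, while outside Bob would disappear entirely.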


> The most common use case that I've seen is to introduce a new variable and
> assign a value to it, as part of the overall results.  In that way, it's
> like SELECT expressions but less cumbersome.  Users seem to find it natural
> to say "and also put in a column for ?x where the value ...". In this case,
> the variable introduced isn't used again, and a syntactic rule that the
> variable must not have been mentioned yet makes it exactly like SELECT
> expressions (which do have that rule as a static syntax condition, not on a
> per-row basis).

This is exactly the use case I get. It's also the reason why (2) and
(3) had not been issues for me before now.
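The static condition Andy mentions, that a LET target must not have been mentioned earlier, could be checked over the query before execution rather than row by row. This sketch assumes a hypothetical, flattened representation of a group as a list of elements in query order. The Element type and the function name are invented.

```python
from typing import List, NamedTuple, Set


class Element(NamedTuple):
    kind: str                 # "pattern" or "let"
    variables: Set[str]       # variables a pattern mentions, or a LET expression uses
    target: str = ""          # the variable a LET assigns


def misplaced_let_targets(group: List[Element]) -> List[str]:
    """Return LET targets already mentioned earlier in the group.

    A static check on the query text, like the rule for SELECT
    expressions, not a test made on each row.
    """
    seen: Set[str] = set()
    problems = []
    for element in group:
        if element.kind == "let" and element.target in seen:
            problems.append(element.target)
        seen |= element.variables
        if element.kind == "let":
            seen.add(element.target)
    return problems
```

Under such a rule, the second form of Andy's example, where ?o1 appears in a pattern before the LET, would be rejected at parse time, and rules 2 and 3 would never come into play.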

> Another use case is calculating an intermediate value that is used several
> times elsewhere in the query, maybe twice in a FILTER
> (a simple matter of it being easier to write the expression once).

This is a use case I've run into (both for myself and users), though
only for FILTERs. It does work for joining to a BGP, but then I have
the ordering issues I've already discussed.
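The intermediate-value case amounts to computing an expression once and testing the result twice, for example LET (?width := ?hi - ?lo) followed by FILTER (?width > 0 && ?width < 10). This Python sketch uses hypothetical rows and variable names to show that both forms keep the same rows.

```python
from typing import Dict, List

Solution = Dict[str, int]

# Hypothetical rows with a low and a high bound.
rows: List[Solution] = [{"lo": 1, "hi": 5}, {"lo": 3, "hi": 20}, {"lo": 4, "hi": 2}]

# Without LET: the expression ?hi - ?lo is written out in both tests.
repeated = [r for r in rows if r["hi"] - r["lo"] > 0 and r["hi"] - r["lo"] < 10]

# With LET (?width := ?hi - ?lo): compute it once, then test it twice.
widened = [{**r, "width": r["hi"] - r["lo"]} for r in rows]
with_let = [r for r in widened if r["width"] > 0 and r["width"] < 10]
```

The LET form also leaves ?width available as a column, which matches the "add a column" use case above.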

>        Andy
>
> PS As far as I can see, SET is the word more generally used in SQL, not LET.
>  It's used for several things:
>
> Server settings:
> http://www.postgresql.org/docs/8.1/interactive/sql-set.html
> http://dev.mysql.com/doc/refman/5.1/en/set-option.html
>
> Stored procedure/user variables:
> http://dev.mysql.com/doc/refman/5.1/en/set-statement.html
> http://msdn.microsoft.com/en-us/library/aa259193%28SQL.80%29.aspx

Is there a particular reason why people don't like "LET"? Users
respond well to it.

Paul

Received on Wednesday, 21 July 2010 15:54:05 UTC