Re: semantics of OPTIONAL/LeftJoin

Dear Andy,

On 19 May 2014, at 14:23, Andy Seaborne <andy@apache.org> wrote:
> Axel,
> 
> So we understand exactly the interpretation of the text, could you write out the evaluation steps as you understand it for teh query, except changing as below to remove the projection that is not related to your comment.
> 
> PREFIX : <http://ex.org/>
> SELECT * { ?X  :age ?A
>                OPTIONAL { ?X :email ?M
>                           FILTER (?A > 25 ) }
>              }
> 
> 
> > Note that further "or Omega2 is empty" in the second branch of the
> > UNION is redundant, since the case where Omega2 is
> > empty is covered already by "*forall*  mu2 in Omega2 [...]"
> 
> In response to a comment (SPARQL 1.0), the "or Omega2 is empty" for clarity - I agree it's not necessary.
>  
> Andy


Thanks for pushing us into recomposing the example into the corresponsing algebra expression... 
indeed we discovered by that  that our initial example was wrong. Here's the corrected version:
in fact, the different behaviour shows when you swap the BGPs, i.e.

Query:

PREFIX : <http://ex.org/>
SELECT * { ?X  :email ?M  OPTIONAL { ?X :age ?A FILTER (?A > 25 ) } }

The corresponding algebra should be:

LeftJoin(
 BGP(?X  :email ?M),
 BGP(?X :age ?A),
 ?A > 25
)

Let's now check this on the following data (also reducing the original example to it's essence):

@prefix : <http://ex.org/> .
:a :age 20,30; :email "foo".


we get:

BGP(?X <http://ex.org/email> ?M) = { (?X -> :a, ?M -> "foo") }

BGP(?X <http://ex.org/age> ?A) = { (?X -> :a, ?A -> 20),
       (?X -> :a, ?A -> 30) }


Let therefore

O1 = { (?X -> :a, ?M -> "foo") },
O2 = { (?X -> :a, ?A -> 20), (?X -> :a, ?A -> 30) } and
F = ?A > 25

Then we get for the evaluation

  LeftJoin( O1, O2, F)

And now we have two different ways of evaluating this.

By the definition according to the translation using Diff we get

LeftJoin(O1,O2,F) = Filter(F, Join(O1,O2)) union Diff(O1,O2,F)

which in turn gives

O3 = Join(O1,O2) = { (?X -> :a, ?M -> "foo", ?A -> 20),
                     (?X -> :a, ?M -> "foo", ?A -> 30)}

and thus

Filter(F, O3) = { (?X -> :a, ?M -> "foo", ?A -> 30) }

for the "Diff" part we get

Diff(O1,O2,expr) = {}

because for the only mapping m1 in O1, we have the mapping m2 = (?X -> :a, ?A -> 30) 
in O2 that is compatible with m1 and F(merge(m1,m2)) does not have 
an effective boolean value of false.

Thus, as the overall result we get

LeftJoin( O1, O2, F) = { (?X -> :a, ?M -> "foo", ?A -> 30) }  (1)


Now lets have a look at the definition according to the "written in full":

LeftJoin(O1, O2, F) =
  { merge(m1, m2) | m1 in O1 and m2 in O2, m1 and m2 are compatible  
   and expr(merge(μ1, μ2)) is true }  (a)
  union
  { m1 | m1 in O1, for all m2 in O2, m1 and m2 are not compatible, or
   O2 is empty } (b)
  union
  { m1 | m1 in O1, exists m2 in O2, m1 and m2 are compatible and
   F(merge(m1, m2)) is false. } (c)

Clearly, (b) gives the emptyset. Moreover, (a) gives the mapping
(?X -> :a, ?M -> "foo", ?A -> 30).
Hoever, (c) gives the mapping
(?X -> :a, ?M -> "foo" )

Thus, as overall result we get

LeftJoin(O1, O2, F) = { (?X -> :a, ?M -> "foo", ?A -> 30),
   (?X -> :a, ?M -> "foo" )}  (2)

and clearly (1) != (2).

Attached the modified example for your convenience.
The proposed fix/erratum stays as we suggested in our orignal mail for changing the "written out" definition to:

>> ------------------------------
>> LeftJoin(Ω1, Ω2, expr) =
>>   { merge(mu1, mu2) | mu1 in Omega1 and mu2 in Omega2, mu1 and mu2 are compatible and expr(merge(μ1, μ2)) is true }
>> UNION
>>   { μ1 | μ1 in Ω1, *forall* mu2 in Omega2, either mu1 and mu2 are not compatible
>>                                            or mu1 and mu2 are compatible and expr(merge(mu1, mu2)) is false }
>> ------------------------------




best regards, 
Sebastian & Axel 
> 
> On 19/05/14 10:24, Axel Polleres wrote:
>> Dear all,
>> 
>> I am just discussing with a colleague from TU (Sebastian Skritek (cc:ed) something which we
>> believe might need an erratum in our Spec:
>> 
>> What Sebastian discovered seems to be a mismatch between the definition of the
>> LeftJoin algebra operator  (http://www.w3.org/TR/sparql11-query/#defn_algLeftJoin)
>> and the "Written in full that is:” text below that Definition box in the spec.
>> 
>> If we look at the definition in the box it says:
>> ---------
>> LeftJoin(O1, O2, expr) =
>>    Filter(expr, Join(O1, O2)) UNION Diff(O1, O2, expr)
>> ---------
>> 
>> Apart from the cases where O2 is empty, or results in sub-mappings of mappings in O1,
>> we cannot get a mapping from O1 alone as result, according to this definition.
>> 
>> That is, the only chance to get a mapping from O1 as result is that it occcurs in Diff(O1, O2, expr).
>> 
>> Now let us look at the definition of Diff (http://www.w3.org/TR/sparql11-query/#defn_algDiff)
>> 
>> Diff(O1, O2, expr) = { mu | mu in O1 such that *for all* mu' in O2,
>>  *either* mu and mu' are not compatible
>>  *or* mu and mu' are compatible and expr(merge(mu, mu')) has an effective boolean value of false }
>> 
>> So, this means that, for a mapping from O1 to be a result of  LeftJoin(O1, O2, expr)
>> either there is a mapping mu’ in O2 that is a sub-mapping of mu in O1, or
>> mu in O1 is such that there is NO mapping mu' in O2 that is compatible with mu where the filter does NOT evaluate to false.
>> 
>> Now this seems to be in contrast with the "written out" Definition that follows below the Definition:
>> 
>> ------------------------------
>> LeftJoin(Ω1, Ω2, expr) =
>>   { merge(mu1, mu2) | mu1 in Omega1 and mu2 in Omega2, mu1 and mu2 are compatible and expr(merge(μ1, μ2)) is true }
>> UNION
>>   { μ1 | μ1 in Ω1, *forall* mu2 in Omega2, mu1 and mu2 are not compatible, or Omega2 is empty }
>> UNION
>>   { μ1 | μ1 in Ω1, *exists* mu2 in Omega2, mu1 and mu2 are compatible and expr(merge(mu1, mu2)) is false. }
>> ———————————————
>> 
>> The problematic line here is the last one, as illustrated by the following example:
>> 
>> Data:
>> 
>> @prefix : <http://ex.org/> .
>> 
>> :a :age 20,30; :email "foo".
>> :b :age 30; :email "bar".
>> 
>> 
>> Query:
>> 
>> PREFIX : <http://ex.org/>
>> SELECT ?X ?M { ?X  :age ?A
>>               OPTIONAL { ?X :email ?M
>>                          FILTER (?A > 25 ) }
>>             }
>> 
>> 
>> According to the "official" definition of Diff, this should - in our opinion - be:
>> 
>> --------------
>> | X  | M     |
>> ==============
>> | :b | "bar" |
>> | :a | "foo" |
>> --------------
>> 
>> whereas, according to the latter definition, this would result in:
>> 
>> --------------
>> | X  | M     |
>> ==============
>> | :b | "bar" |
>> | :a | "foo" |
>> | :a |       |
>> --------------
>> 
>> IMO, the latter is wrong.
>> 
>> 
>> The proposed fix to align the  "written out" definition with the
>> definition of Diff would be as follows:
>> 
>> 
>> ------------------------------
>> LeftJoin(Ω1, Ω2, expr) =
>>   { merge(mu1, mu2) | mu1 in Omega1 and mu2 in Omega2, mu1 and mu2 are compatible and expr(merge(μ1, μ2)) is true }
>> UNION
>>   { μ1 | μ1 in Ω1, *forall* mu2 in Omega2, either mu1 and mu2 are not compatible
>>                                            or mu1 and mu2 are compatible and expr(merge(mu1, mu2)) is false }
>> ------------------------------
>> 
>> Note that further "or Omega2 is empty" in the second branch of the UNION is redundant, since the case where Omega2 is
>> empty is covered already by "*forall*  mu2 in Omega2 [...]"
>> 
>> 
>> We haven't checked in detail yet whether this would affect test cases, but I would certainly
>> suggest to add - in case - the example to test cases, at least in for any future version of SPARQL.
>> We attach the test file for your convenience.
>> 
>> best regards,
>> Axel & Sebastian
>> 
>> --
>> Prof. Dr. Axel Polleres
>> Institute for Information Business, WU Vienna
>> url: http://www.polleres.net/  twitter: @AxelPolleres

> 
>  Andy
> 
> On 19/05/14 10:24, Axel Polleres wrote:
>> Dear all,
>> 
>> I am just discussing with a colleague from TU (Sebastian Skritek (cc:ed) something which we
>> believe might need an erratum in our Spec:
>> 
>> What Sebastian discovered seems to be a mismatch between the definition of the
>> LeftJoin algebra operator  (http://www.w3.org/TR/sparql11-query/#defn_algLeftJoin)
>> and the "Written in full that is:” text below that Definition box in the spec.
>> 
>> If we look at the definition in the box it says:
>> ---------
>> LeftJoin(O1, O2, expr) =
>>     Filter(expr, Join(O1, O2)) UNION Diff(O1, O2, expr)
>> ---------
>> 
>> Apart from the cases where O2 is empty, or results in sub-mappings of mappings in O1,
>> we cannot get a mapping from O1 alone as result, according to this definition.
>> 
>> That is, the only chance to get a mapping from O1 as result is that it occcurs in Diff(O1, O2, expr).
>> 
>> Now let us look at the definition of Diff (http://www.w3.org/TR/sparql11-query/#defn_algDiff)
>> 
>> Diff(O1, O2, expr) = { mu | mu in O1 such that *for all* mu' in O2,
>>   *either* mu and mu' are not compatible
>>   *or* mu and mu' are compatible and expr(merge(mu, mu')) has an effective boolean value of false }
>> 
>> So, this means that, for a mapping from O1 to be a result of  LeftJoin(O1, O2, expr)
>> either there is a mapping mu’ in O2 that is a sub-mapping of mu in O1, or
>> mu in O1 is such that there is NO mapping mu' in O2 that is compatible with mu where the filter does NOT evaluate to false.
>> 
>> Now this seems to be in contrast with the "written out" Definition that follows below the Definition:
>> 
>> ------------------------------
>> LeftJoin(Ω1, Ω2, expr) =
>>    { merge(mu1, mu2) | mu1 in Omega1 and mu2 in Omega2, mu1 and mu2 are compatible and expr(merge(μ1, μ2)) is true }
>> UNION
>>    { μ1 | μ1 in Ω1, *forall* mu2 in Omega2, mu1 and mu2 are not compatible, or Omega2 is empty }
>> UNION
>>    { μ1 | μ1 in Ω1, *exists* mu2 in Omega2, mu1 and mu2 are compatible and expr(merge(mu1, mu2)) is false. }
>> ———————————————
>> 
>> The problematic line here is the last one, as illustrated by the following example:
>> 
>> Data:
>> 
>> @prefix : <http://ex.org/> .
>> 
>> :a :age 20,30; :email "foo".
>> :b :age 30; :email "bar".
>> 
>> 
>> Query:
>> 
>> PREFIX : <http://ex.org/>
>> SELECT ?X ?M { ?X  :age ?A
>>                OPTIONAL { ?X :email ?M
>>                           FILTER (?A > 25 ) }
>>              }
>> 
>> 
>> According to the "official" definition of Diff, this should - in our opinion - be:
>> 
>> --------------
>> | X  | M     |
>> ==============
>> | :b | "bar" |
>> | :a | "foo" |
>> --------------
>> 
>> whereas, according to the latter definition, this would result in:
>> 
>> --------------
>> | X  | M     |
>> ==============
>> | :b | "bar" |
>> | :a | "foo" |
>> | :a |       |
>> --------------
>> 
>> IMO, the latter is wrong.
>> 
>> 
>> The proposed fix to align the  "written out" definition with the
>> definition of Diff would be as follows:
>> 
>> 
>> ------------------------------
>> LeftJoin(Ω1, Ω2, expr) =
>>    { merge(mu1, mu2) | mu1 in Omega1 and mu2 in Omega2, mu1 and mu2 are compatible and expr(merge(μ1, μ2)) is true }
>> UNION
>>    { μ1 | μ1 in Ω1, *forall* mu2 in Omega2, either mu1 and mu2 are not compatible
>>                                             or mu1 and mu2 are compatible and expr(merge(mu1, mu2)) is false }
>> ------------------------------
>> 
>> Note that further "or Omega2 is empty" in the second branch of the UNION is redundant, since the case where Omega2 is
>> empty is covered already by "*forall*  mu2 in Omega2 [...]"
>> 
>> 
>> We haven't checked in detail yet whether this would affect test cases, but I would certainly
>> suggest to add - in case - the example to test cases, at least in for any future version of SPARQL.
>> We attach the test file for your convenience.
>> 
>> best regards,
>> Axel & Sebastian
>> 
>> --
>> Prof. Dr. Axel Polleres
>> Institute for Information Business, WU Vienna
>> url: http://www.polleres.net/  twitter: @AxelPolleres
>> 
> 
> 

Received on Thursday, 22 May 2014 07:27:24 UTC