Re: LeftJoin semantics from Andy Seaborne on 2013-07-26 (public-rdf-dawg-comments@w3.org from July 2013)

From: Andy Seaborne <andy@apache.org>
Date: Fri, 26 Jul 2013 08:07:34 +0100
To: public-rdf-dawg-comments@w3.org
Message-ID: <51F22036.8030308@apache.org>
Hi Esko,

 >      Query2 should yield {?s -> :a, ?v1 -> :b, ?v2 -> :d} as its only
 > solution (uses first and third triples).

Your working of the algebra and the definitions looks right to me.

 From query 2, we get to:

Join( {?s -> :a, ?v1 -> :b, ?v2 -> :c} ,
       {?s -> :a,            ?v2 -> :d} )

which is the empty table (?v2 is different across the left and right 
input tables).

I don't see why the answer to query2 is as you have it.

That would be the different query :

# Query2-A
     PREFIX : <http://example.org/>
     SELECT * WHERE {
            ?s :p1 ?v1 .
            ?s :p3 ?v2
            OPTIONAL {?s :p2 ?v2 } .
     }

which does have answer

----------------
| s  | v1 | v2 |
================
| :a | :b | :d |
----------------

but it's the different algebra:

leftjoin(
     bgp ( ?s :p1 ?v. ?s :p3 ?v2 )
     bgp ( ?s :p2 ?v2) )

	Andy


On 22/07/13 15:44, Esko Nuutila wrote:
> Hi!
>
> I have difficulties in understanding the evaluation of LeftJoin,
> described in Section 18 of "SPARQL 1.1 Query Language
> W3C Recommendation 21 March 2013"
> (http://www.w3.org/TR/2013/REC-sparql11-query-20130321/).
>
> To explain the problem I use the following example data set:
>
> ---------------------------------
> @prefix : <http://example.org/> .
>
> :a :p1 :b .
> :a :p2 :c .
> :a :p3 :d .
> ---------------------------------
>
> and two queries
>
> # Query1:
>      PREFIX : <http://example.org/>
>      SELECT * WHERE {
>             ?s :p1 ?v1 .
>             OPTIONAL {?s :p2 ?v2 } .
>      }
>
> # Query2:
>      PREFIX : <http://example.org/>
>      SELECT * WHERE {
>             ?s :p1 ?v1 .
>             OPTIONAL {?s :p2 ?v2 } .
>             ?s :p3 ?v2
>      }
>
> Query1 is the fifth example in Section 18.2.3. Query2 is modified
> version of the sixth example in Section 18.2.3., where the second
> OPTIONAL pattern is made non-optional and the variable ?v3 is replaced
> by ?v2.
>
> With the example data set {:a :p1 :b . :a :p2 :c . :a :p3 :d }:
>
>      Query1 should yield {?s -> :a, ?v1 -> :b, ?v2 -> :c} as its only
> solution (uses first and second triples).
>
>      Query2 should yield {?s -> :a, ?v1 -> :b, ?v2 -> :d} as its only
> solution (uses first and third triples).
>
> I checked this with openrdf-sesame, and it produces the expected results.
>
> However, when I use the definitions in Section 18, I get no solutions to
> Query2. I would like to know if I have made an error or misunderstood
> Section 18.
>
> Using the translation rules in Section 18.2. we get the following SPARQL
> algebra expressions for the group graph patterns:
>
> Query1:
>
> E1 = LeftJoin(BGP(?s :p1 ?v1),
>                BGP(?s :p2 ?v2),
>                true)
>
> Query2:
>
> E2 = Join(LeftJoin(BGP(?s :p1 ?v1),
>                     BGP(?s :p2 ?v2),
>                     true),
>            BGP(?s :p3 ?v2))
>
> Thus, E2 contains E1 as a subexpression. When we are evaluating Query1,
> the solution set of E1 should be {{?s -> :a, ?v1 -> :b, ?v2 -> :c}}.
> However, when we are evaluating Query2, the solution set of E1 inside E2
> should contain {?s -> :a, ?v1 -> :b} (it could contain {?s -> :a, ?v1 ->
> :b, ?v2 -> :c} too). Thus, the evaluation of E1 should somehow make a
> difference between these two cases. It seems to me that it does not: the
> evaluation of LeftJoin as described in Section 18 does not care, whether
> the LeftJoin is the outermost expression or inside another expression.
>
> Below are the details of the evaluation:
>
> Lets first evaluate Query1:
>
> eval(D(G), LeftJoin(BGP(?s :p1 ?v1),
>                      BGP(?s :p2 ?v2),
>                      true))
>
> [ using 'Definition: Evaluation of LeftJoin' in section 18.6 ]
>
> = LeftJoin(eval(D(G), BGP(?s :p1 ?v1)),
>             eval(D(G), BGP(?s :p2 ?v2),
>             true))
>
> [ using 'Definition: Evaluation of a Basic Graph Pattern' in Sections
> 18.6 and 18.3, and using the shorthand
>    S1 = eval(D(G), BGP(?s :p1 ?v1)) = {{?s -> :a, ?v1 -> :b}}
>    S2 = eval(D(G), BGP(?s :p2 ?v2)) = {{?s -> :a, ?v2 -> :c}} ]
>
> = LeftJoin(S1, S2, true)
>
> [ using 'Definition: LeftJoin' in section 18.5. ]
>
> = Filter(true, Join(S1, S2)) union Diff(S1, S2, true)
>
> [ using 'Definition: Filter' in section 18.5.; filter expression is 'true' ]
>
> = Join(S1, S2) union Diff(S1, S2, true)
>
> [ using 'Definition: Join' in secion 18.5. ]
>
> = { merge(u1, u2) | u1 in S1 and u2 in S2 and u1 and u2 are compatible }
> union Diff(S1, S2, true)
>
> [ since the only solutions in S1 and S2 are compatible ]
>
> = {{?s -> :a, ?v1 -> :b, ?v2 -> :c}} union Diff(S1, S2, true)
>
> [ using 'Definition: Diff' in section 18.5., and noting that the only
> solutions in S1 and S2 are compatible, and that true <> false ]
>
> = {{?s -> :a, ?v1 -> :b, ?v2 -> :c}} union {}
>
> = {{?s -> :a, ?v1 -> :b, ?v2 -> :c}}
>
> Lets now evaluate Query2:
>
> eval(D(G), Join(LeftJoin(BGP(?s :p1 ?v1),
>                           BGP(?s :p2 ?v2),
>                           true),
>                  BGP(?s :p3 ?v2)))
>
> [ using 'Definition: Evaluation of Join' in section 18.6 ]
>
> = Join(eval(D(G), LeftJoin(BGP(?s :p1 ?v1),
>                             BGP(?s :p2 ?v2),
>                             true)),
>         eval(D(G), BGP(?s :p3 ?v2)))
>
> [ we already evaluated the first parameter of Join and got the solution
> set {{?s -> :a, ?v1 -> :b, ?v2 -> :c}} ]
>
> = Join({{?s -> :a, ?v1 -> :b, ?v2 -> :c}},
>         eval(D(G), BGP(?s :p3 ?v2)))
>
> [ using 'Definition: Evaluation of a Basic Graph Pattern' in Sections
> 18.6 and 18.3 ]
>
> = Join({{?s -> :a, ?v1 -> :b, ?v2 -> :c}},
>         {{?s -> :a, ?v2 -> :d}})
>
> [ using again 'Definition: Join' in secion 18.5.]
>
> = { merge(u1, u2) | u1 in {{?s -> :a, ?v1 -> :b, ?v2 -> :c}} and u2 in
> {{?s -> :a, ?v2 -> :d}} and u1 and u2 are compatible }
>
> [ since the solutions {?s -> :a, ?v1 -> :b, ?v2 -> :c} and {?s -> :a,
> ?v2 -> :d} are not compatible ]
>
> = {}
>
>
>
>
> --
> Esko Nuutila
> Aalto University School of Science and Technology
> Department of Computer Science and Engineering
> P.O.Box 15400 (T-building, Konemiehentie 2, room B218)
> FI-00076 AALTO, FINLAND
> tel. +358 50 5750278 mailto: esko.nuutila@aalto.fi
> <mailto:esko.nuutila@aalto.fi>
>
Received on Friday, 26 July 2013 07:08:06 UTC