Re: LeftJoin semantics

Thanks Andy,

So there is no error in Section 18, and the idea is that the relative order
of BGPs and OPTIONALs within the same group graph pattern can have an
effect on the solutions and an implementation should not reorder them?

It seems that there is an error in Sesame OpenRDF Workbench. I checked this
again with the newest version of OpenRDF Workbench (2.7.3) and when I add
to an empty workspace the example data:

@prefix : <http://example.org/> .

:a :p1 :b .
:a :p2 :c .
:a :p3 :d .

and then run Query 2

PREFIX :<http://example.org/>

SELECT * WHERE {
       ?s :p1 ?v1 .
       OPTIONAL {?s :p2 ?v2 } .
       ?s :p3 ?v2
}

I still get the result:

?s=<http://example.org/a>, ?v1=<http://example.org/b>, ?v2=<
http://example.org/d>

It seems that Sesame is reordering the BGPs and OPTIONALs.

I now also installed jena-fuseki (Fuseki 0.2.8), and it gives the correct
(empty) solution set.

/Esko

On Fri, Jul 26, 2013 at 10:07 AM, Andy Seaborne <andy@apache.org> wrote:

> Hi Esko,
>
>  >      Query2 should yield {?s -> :a, ?v1 -> :b, ?v2 -> :d} as its only
>  > solution (uses first and third triples).
>
> Your working of the algebra and the definitions looks right to me.
>
>  From query 2, we get to:
>
> Join( {?s -> :a, ?v1 -> :b, ?v2 -> :c} ,
>        {?s -> :a,            ?v2 -> :d} )
>
> which is the empty table (?v2 is different across the left and right
> input tables).
>
> I don't see why the answer to query2 is as you have it.
>
> That would be the different query :
>
> # Query2-A
>      PREFIX : <http://example.org/>
>      SELECT * WHERE {
>             ?s :p1 ?v1 .
>             ?s :p3 ?v2
>             OPTIONAL {?s :p2 ?v2 } .
>      }
>
> which does have answer
>
> ----------------
> | s  | v1 | v2 |
> ================
> | :a | :b | :d |
> ----------------
>
> but it's the different algebra:
>
> leftjoin(
>      bgp ( ?s :p1 ?v. ?s :p3 ?v2 )
>      bgp ( ?s :p2 ?v2) )
>
>         Andy
>
>
> On 22/07/13 15:44, Esko Nuutila wrote:
> > Hi!
> >
> > I have difficulties in understanding the evaluation of LeftJoin,
> > described in Section 18 of "SPARQL 1.1 Query Language
> > W3C Recommendation 21 March 2013"
> > (http://www.w3.org/TR/2013/REC-sparql11-query-20130321/).
> >
> > To explain the problem I use the following example data set:
> >
> > ---------------------------------
> > @prefix : <http://example.org/> .
> >
> > :a :p1 :b .
> > :a :p2 :c .
> > :a :p3 :d .
> > ---------------------------------
> >
> > and two queries
> >
> > # Query1:
> >      PREFIX : <http://example.org/>
> >      SELECT * WHERE {
> >             ?s :p1 ?v1 .
> >             OPTIONAL {?s :p2 ?v2 } .
> >      }
> >
> > # Query2:
> >      PREFIX : <http://example.org/>
> >      SELECT * WHERE {
> >             ?s :p1 ?v1 .
> >             OPTIONAL {?s :p2 ?v2 } .
> >             ?s :p3 ?v2
> >      }
> >
> > Query1 is the fifth example in Section 18.2.3. Query2 is modified
> > version of the sixth example in Section 18.2.3., where the second
> > OPTIONAL pattern is made non-optional and the variable ?v3 is replaced
> > by ?v2.
> >
> > With the example data set {:a :p1 :b . :a :p2 :c . :a :p3 :d }:
> >
> >      Query1 should yield {?s -> :a, ?v1 -> :b, ?v2 -> :c} as its only
> > solution (uses first and second triples).
> >
> >      Query2 should yield {?s -> :a, ?v1 -> :b, ?v2 -> :d} as its only
> > solution (uses first and third triples).
> >
> > I checked this with openrdf-sesame, and it produces the expected results.
> >
> > However, when I use the definitions in Section 18, I get no solutions to
> > Query2. I would like to know if I have made an error or misunderstood
> > Section 18.
> >
> > Using the translation rules in Section 18.2. we get the following SPARQL
> > algebra expressions for the group graph patterns:
> >
> > Query1:
> >
> > E1 = LeftJoin(BGP(?s :p1 ?v1),
> >                BGP(?s :p2 ?v2),
> >                true)
> >
> > Query2:
> >
> > E2 = Join(LeftJoin(BGP(?s :p1 ?v1),
> >                     BGP(?s :p2 ?v2),
> >                     true),
> >            BGP(?s :p3 ?v2))
> >
> > Thus, E2 contains E1 as a subexpression. When we are evaluating Query1,
> > the solution set of E1 should be {{?s -> :a, ?v1 -> :b, ?v2 -> :c}}.
> > However, when we are evaluating Query2, the solution set of E1 inside E2
> > should contain {?s -> :a, ?v1 -> :b} (it could contain {?s -> :a, ?v1 ->
> > :b, ?v2 -> :c} too). Thus, the evaluation of E1 should somehow make a
> > difference between these two cases. It seems to me that it does not: the
> > evaluation of LeftJoin as described in Section 18 does not care, whether
> > the LeftJoin is the outermost expression or inside another expression.
> >
> > Below are the details of the evaluation:
> >
> > Lets first evaluate Query1:
> >
> > eval(D(G), LeftJoin(BGP(?s :p1 ?v1),
> >                      BGP(?s :p2 ?v2),
> >                      true))
> >
> > [ using 'Definition: Evaluation of LeftJoin' in section 18.6 ]
> >
> > = LeftJoin(eval(D(G), BGP(?s :p1 ?v1)),
> >             eval(D(G), BGP(?s :p2 ?v2),
> >             true))
> >
> > [ using 'Definition: Evaluation of a Basic Graph Pattern' in Sections
> > 18.6 and 18.3, and using the shorthand
> >    S1 = eval(D(G), BGP(?s :p1 ?v1)) = {{?s -> :a, ?v1 -> :b}}
> >    S2 = eval(D(G), BGP(?s :p2 ?v2)) = {{?s -> :a, ?v2 -> :c}} ]
> >
> > = LeftJoin(S1, S2, true)
> >
> > [ using 'Definition: LeftJoin' in section 18.5. ]
> >
> > = Filter(true, Join(S1, S2)) union Diff(S1, S2, true)
> >
> > [ using 'Definition: Filter' in section 18.5.; filter expression is
> 'true' ]
> >
> > = Join(S1, S2) union Diff(S1, S2, true)
> >
> > [ using 'Definition: Join' in secion 18.5. ]
> >
> > = { merge(u1, u2) | u1 in S1 and u2 in S2 and u1 and u2 are compatible }
> > union Diff(S1, S2, true)
> >
> > [ since the only solutions in S1 and S2 are compatible ]
> >
> > = {{?s -> :a, ?v1 -> :b, ?v2 -> :c}} union Diff(S1, S2, true)
> >
> > [ using 'Definition: Diff' in section 18.5., and noting that the only
> > solutions in S1 and S2 are compatible, and that true <> false ]
> >
> > = {{?s -> :a, ?v1 -> :b, ?v2 -> :c}} union {}
> >
> > = {{?s -> :a, ?v1 -> :b, ?v2 -> :c}}
> >
> > Lets now evaluate Query2:
> >
> > eval(D(G), Join(LeftJoin(BGP(?s :p1 ?v1),
> >                           BGP(?s :p2 ?v2),
> >                           true),
> >                  BGP(?s :p3 ?v2)))
> >
> > [ using 'Definition: Evaluation of Join' in section 18.6 ]
> >
> > = Join(eval(D(G), LeftJoin(BGP(?s :p1 ?v1),
> >                             BGP(?s :p2 ?v2),
> >                             true)),
> >         eval(D(G), BGP(?s :p3 ?v2)))
> >
> > [ we already evaluated the first parameter of Join and got the solution
> > set {{?s -> :a, ?v1 -> :b, ?v2 -> :c}} ]
> >
> > = Join({{?s -> :a, ?v1 -> :b, ?v2 -> :c}},
> >         eval(D(G), BGP(?s :p3 ?v2)))
> >
> > [ using 'Definition: Evaluation of a Basic Graph Pattern' in Sections
> > 18.6 and 18.3 ]
> >
> > = Join({{?s -> :a, ?v1 -> :b, ?v2 -> :c}},
> >         {{?s -> :a, ?v2 -> :d}})
> >
> > [ using again 'Definition: Join' in secion 18.5.]
> >
> > = { merge(u1, u2) | u1 in {{?s -> :a, ?v1 -> :b, ?v2 -> :c}} and u2 in
> > {{?s -> :a, ?v2 -> :d}} and u1 and u2 are compatible }
> >
> > [ since the solutions {?s -> :a, ?v1 -> :b, ?v2 -> :c} and {?s -> :a,
> > ?v2 -> :d} are not compatible ]
> >
> > = {}
> >
> >
> >
> >
> > --
> > Esko Nuutila
> > Aalto University School of Science and Technology
> > Department of Computer Science and Engineering
> > P.O.Box 15400 (T-building, Konemiehentie 2, room B218)
> > FI-00076 AALTO, FINLAND
> > tel. +358 50 5750278 mailto: esko.nuutila@aalto.fi
> > <mailto:esko.nuutila@aalto.fi>
> >
>
>
>


-- 
Esko Nuutila
Aalto University School of Science and Technology
Department of Computer Science and Engineering
P.O.Box 15400 (T-building, Konemiehentie 2, room B218)
FI-00076 AALTO, FINLAND
tel. +358 50 5750278 mailto: esko.nuutila@aalto.fi

Received on Sunday, 28 July 2013 20:06:55 UTC