LeftJoin semantics from Esko Nuutila on 2013-07-22 (public-rdf-dawg-comments@w3.org from July 2013)

From: Esko Nuutila <esko.nuutila@aalto.fi>
Date: Mon, 22 Jul 2013 17:44:18 +0300
To: public-rdf-dawg-comments@w3.org
Message-ID: <CACeuo11_nqLjRYQAFdy9ObKXpoOTyoQ=r+GXt6fKv698WhdGtw@mail.gmail.com>
Hi!

I have difficulties in understanding the evaluation of LeftJoin, described
in Section 18 of "SPARQL 1.1 Query Language
W3C Recommendation 21 March 2013" (
http://www.w3.org/TR/2013/REC-sparql11-query-20130321/).

To explain the problem I use the following example data set:

---------------------------------
@prefix : <http://example.org/> .

:a :p1 :b .
:a :p2 :c .
:a :p3 :d .
---------------------------------

and two queries

# Query1:
    PREFIX : <http://example.org/>
    SELECT * WHERE {
           ?s :p1 ?v1 .
           OPTIONAL {?s :p2 ?v2 } .
    }

# Query2:
    PREFIX : <http://example.org/>
    SELECT * WHERE {
           ?s :p1 ?v1 .
           OPTIONAL {?s :p2 ?v2 } .
           ?s :p3 ?v2
    }

Query1 is the fifth example in Section 18.2.3. Query2 is modified version
of the sixth example in Section 18.2.3., where the second OPTIONAL pattern
is made non-optional and the variable ?v3 is replaced by ?v2.

With the example data set {:a :p1 :b . :a :p2 :c . :a :p3 :d }:

    Query1 should yield {?s -> :a, ?v1 -> :b, ?v2 -> :c} as its only
solution (uses first and second triples).

    Query2 should yield {?s -> :a, ?v1 -> :b, ?v2 -> :d} as its only
solution (uses first and third triples).

I checked this with openrdf-sesame, and it produces the expected results.

However, when I use the definitions in Section 18, I get no solutions to
Query2. I would like to know if I have made an error or misunderstood
Section 18.

Using the translation rules in Section 18.2. we get the following SPARQL
algebra expressions for the group graph patterns:

Query1:

E1 = LeftJoin(BGP(?s :p1 ?v1),
              BGP(?s :p2 ?v2),
              true)

Query2:

E2 = Join(LeftJoin(BGP(?s :p1 ?v1),
                   BGP(?s :p2 ?v2),
                   true),
          BGP(?s :p3 ?v2))

Thus, E2 contains E1 as a subexpression. When we are evaluating Query1, the
solution set of E1 should be {{?s -> :a, ?v1 -> :b, ?v2 -> :c}}. However,
when we are evaluating Query2, the solution set of E1 inside E2 should
contain {?s -> :a, ?v1 -> :b} (it could contain {?s -> :a, ?v1 -> :b, ?v2
-> :c} too). Thus, the evaluation of E1 should somehow make a difference
between these two cases. It seems to me that it does not: the evaluation of
LeftJoin as described in Section 18 does not care, whether the LeftJoin is
the outermost expression or inside another expression.

Below are the details of the evaluation:

Lets first evaluate Query1:

eval(D(G), LeftJoin(BGP(?s :p1 ?v1),
                    BGP(?s :p2 ?v2),
                    true))

[ using 'Definition: Evaluation of LeftJoin' in section 18.6 ]

= LeftJoin(eval(D(G), BGP(?s :p1 ?v1)),
           eval(D(G), BGP(?s :p2 ?v2),
           true))

[ using 'Definition: Evaluation of a Basic Graph Pattern' in Sections 18.6
and 18.3, and using the shorthand
  S1 = eval(D(G), BGP(?s :p1 ?v1)) = {{?s -> :a, ?v1 -> :b}}
  S2 = eval(D(G), BGP(?s :p2 ?v2)) = {{?s -> :a, ?v2 -> :c}} ]

= LeftJoin(S1, S2, true)

[ using 'Definition: LeftJoin' in section 18.5. ]

= Filter(true, Join(S1, S2)) union Diff(S1, S2, true)

[ using 'Definition: Filter' in section 18.5.; filter expression is 'true' ]

= Join(S1, S2) union Diff(S1, S2, true)

[ using 'Definition: Join' in secion 18.5. ]

= { merge(u1, u2) | u1 in S1 and u2 in S2 and u1 and u2 are compatible }
union Diff(S1, S2, true)

[ since the only solutions in S1 and S2 are compatible ]

= {{?s -> :a, ?v1 -> :b, ?v2 -> :c}} union Diff(S1, S2, true)

[ using 'Definition: Diff' in section 18.5., and noting that the only
solutions in S1 and S2 are compatible, and that true <> false ]

= {{?s -> :a, ?v1 -> :b, ?v2 -> :c}} union {}

= {{?s -> :a, ?v1 -> :b, ?v2 -> :c}}

Lets now evaluate Query2:

eval(D(G), Join(LeftJoin(BGP(?s :p1 ?v1),
                         BGP(?s :p2 ?v2),
                         true),
                BGP(?s :p3 ?v2)))

[ using 'Definition: Evaluation of Join' in section 18.6 ]

= Join(eval(D(G), LeftJoin(BGP(?s :p1 ?v1),
                           BGP(?s :p2 ?v2),
                           true)),
       eval(D(G), BGP(?s :p3 ?v2)))

[ we already evaluated the first parameter of Join and got the solution set
{{?s -> :a, ?v1 -> :b, ?v2 -> :c}} ]

= Join({{?s -> :a, ?v1 -> :b, ?v2 -> :c}},
       eval(D(G), BGP(?s :p3 ?v2)))

[ using 'Definition: Evaluation of a Basic Graph Pattern' in Sections 18.6
and 18.3 ]

= Join({{?s -> :a, ?v1 -> :b, ?v2 -> :c}},
       {{?s -> :a, ?v2 -> :d}})

[ using again 'Definition: Join' in secion 18.5.]

= { merge(u1, u2) | u1 in {{?s -> :a, ?v1 -> :b, ?v2 -> :c}} and u2 in {{?s
-> :a, ?v2 -> :d}} and u1 and u2 are compatible }

[ since the solutions {?s -> :a, ?v1 -> :b, ?v2 -> :c} and {?s -> :a, ?v2
-> :d} are not compatible ]

= {}




-- 
Esko Nuutila
Aalto University School of Science and Technology
Department of Computer Science and Engineering
P.O.Box 15400 (T-building, Konemiehentie 2, room B218)
FI-00076 AALTO, FINLAND
tel. +358 50 5750278 mailto: esko.nuutila@aalto.fi
Received on Wednesday, 24 July 2013 17:08:55 UTC