semantics of OPTIONAL/LeftJoin

Dear all,

I am just discussing with a colleague from TU (Sebastian Skritek (cc:ed) something which we 
believe might need an erratum in our Spec:

What Sebastian discovered seems to be a mismatch between the definition of the 
LeftJoin algebra operator  (http://www.w3.org/TR/sparql11-query/#defn_algLeftJoin)
and the "Written in full that is:” text below that Definition box in the spec.

If we look at the definition in the box it says:
---------
LeftJoin(O1, O2, expr) =
    Filter(expr, Join(O1, O2)) UNION Diff(O1, O2, expr)
---------

Apart from the cases where O2 is empty, or results in sub-mappings of mappings in O1, 
we cannot get a mapping from O1 alone as result, according to this definition.

That is, the only chance to get a mapping from O1 as result is that it occcurs in Diff(O1, O2, expr).

Now let us look at the definition of Diff (http://www.w3.org/TR/sparql11-query/#defn_algDiff)

Diff(O1, O2, expr) = { mu | mu in O1 such that *for all* mu' in O2, 
  *either* mu and mu' are not compatible 
  *or* mu and mu' are compatible and expr(merge(mu, mu')) has an effective boolean value of false }

So, this means that, for a mapping from O1 to be a result of  LeftJoin(O1, O2, expr) 
either there is a mapping mu’ in O2 that is a sub-mapping of mu in O1, or 
mu in O1 is such that there is NO mapping mu' in O2 that is compatible with mu where the filter does NOT evaluate to false.

Now this seems to be in contrast with the "written out" Definition that follows below the Definition:

------------------------------
LeftJoin(Ω1, Ω2, expr) =
   { merge(mu1, mu2) | mu1 in Omega1 and mu2 in Omega2, mu1 and mu2 are compatible and expr(merge(μ1, μ2)) is true }
UNION
   { μ1 | μ1 in Ω1, *forall* mu2 in Omega2, mu1 and mu2 are not compatible, or Omega2 is empty }
UNION
   { μ1 | μ1 in Ω1, *exists* mu2 in Omega2, mu1 and mu2 are compatible and expr(merge(mu1, mu2)) is false. }
———————————————

The problematic line here is the last one, as illustrated by the following example:

Data:

@prefix : <http://ex.org/> .

:a :age 20,30; :email "foo".
:b :age 30; :email "bar".


Query:

PREFIX : <http://ex.org/>
SELECT ?X ?M { ?X  :age ?A 
               OPTIONAL { ?X :email ?M 
                          FILTER (?A > 25 ) }
             }


According to the "official" definition of Diff, this should - in our opinion - be:

--------------
| X  | M     |
==============
| :b | "bar" |
| :a | "foo" |
--------------

whereas, according to the latter definition, this would result in:

--------------
| X  | M     |
==============
| :b | "bar" |
| :a | "foo" |
| :a |       |
--------------

IMO, the latter is wrong.


The proposed fix to align the  "written out" definition with the 
definition of Diff would be as follows:


------------------------------
LeftJoin(Ω1, Ω2, expr) =
   { merge(mu1, mu2) | mu1 in Omega1 and mu2 in Omega2, mu1 and mu2 are compatible and expr(merge(μ1, μ2)) is true }
UNION
   { μ1 | μ1 in Ω1, *forall* mu2 in Omega2, either mu1 and mu2 are not compatible 
                                            or mu1 and mu2 are compatible and expr(merge(mu1, mu2)) is false }
------------------------------

Note that further "or Omega2 is empty" in the second branch of the UNION is redundant, since the case where Omega2 is 
empty is covered already by "*forall*  mu2 in Omega2 [...]"


We haven't checked in detail yet whether this would affect test cases, but I would certainly 
suggest to add - in case - the example to test cases, at least in for any future version of SPARQL.
We attach the test file for your convenience.

best regards,
Axel & Sebastian

--
Prof. Dr. Axel Polleres
Institute for Information Business, WU Vienna
url: http://www.polleres.net/  twitter: @AxelPolleres

Received on Monday, 19 May 2014 09:24:51 UTC