Re: A comment about the semantics of the Minus operator from the SPARQL specification from Andy Seaborne on 2014-07-20 (public-rdf-dawg-comments@w3.org from July 2014)

From: Andy Seaborne <andy@apache.org>
Date: Sun, 20 Jul 2014 15:47:55 +0100
To: public-rdf-dawg-comments@w3.org
Message-ID: <53CBD69B.1060608@apache.org>
((Another unofficial reply))

Hi Boris,

Thank you for the comment and the clear details.

Did you look at the "diff" operator in the algebra (without the 
expression part)? (it was in SPARQL 1.0)

    (A OPT B) = (A JOIN B) UNION (A DIFF B)

MINUS was designed by considering use cases (it's a small world - at a 
face-to-face with one end of the video link in cs.ox.ac.uk!) and is a 
user written operation.  DIFF is more internal and I think has the 
feature you are looking for.  It does not have the domain codition.

For MINUS, if you remove the "dom(μ) and dom(μ') are disjoint" condition 
then it's not just "pattern MINUS' {}" that is empty.

    { ?X rdf:type :A . MINUS' { ?Y rdf:type :B } }

is empty because any solution of {?X rdf:type :A} is compatible with { 
?Y rdf:type :B } (no common variables to compare so it's compatible).

In fact, it would take only one solution mapping in the RHS to have no 
variables in common with LHS and the MINUS result is the empty multiset 
regardless of the rest of the RHS.

The analogy to SQL needs to treated with one element of care - in SQL, 
there are column compatibility rules where MINUS and UNION operate on 
column compatible results from inner SELECTs.

 Andy

PS see also:
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2014May/0000.html

On 19/07/14 11:37, Boris Motik wrote:
> Hello,
>
> Thanks for this answer. I understand the point you make; my point is
> that the choice of the semantics for Minus is suboptimal and that it may
> hinder efficient implementation.
>
> Regards,
>
> Boris
>
>
>
> -------- Original message --------
> From: Axel Polleres <axel@polleres.net>
> Date:
> To: Boris Motik <boris.motik@cs.ox.ac.uk>
> Cc: public-rdf-dawg-comments@w3.org
> Subject: Re: A comment about the semantics of the Minus operator from
> the SPARQL specification
>
>
> Dear Boris,
>
> (note that this is not an official group answer)... just quick:
>
>> (1)  { ?X rdf:type :A . OPTIONAL { ?Y rdf:type :B } }
>> (2)  { { ?X rdf:type :A . ?Y rdf:type :B } UNION { ?X rdf:type :A . MINUS { ?Y rdf:type :B } } }
>
> Indeed, the semantics of OPTIONAL is defined by the LeftJoin algebra
> operator, cf.
> http://www.w3.org/TR/sparql11-query/#defn_algLeftJoin
> rather than by the combination of UNION and MINUS... in this context,
> see also the section on the relation between NOT EXISTS and MINUS
> http://www.w3.org/TR/sparql11-query/#neg-notexists-minus
> ... they are not always equivalent either.
>
> HTH,
> Axel
>
>
> --
> Prof. Dr. Axel Polleres
> Institute for Information Business, WU Vienna
> url: http://www.polleres.net/  twitter: @AxelPolleres
>
> On 18 Jul 2014, at 17:07, Boris Motik <boris.motik@cs.ox.ac.uk> wrote:
>
>> Hello,
>>
>> I have a comment about the semantics of the Minus operator from the SPARQL specification. I stumbled upon this while trying to implement SPARQL 1.1 in the RDFox system that we’re developing at Oxford University. In Section 18.5 (http://www.w3.org/TR/sparql11-query/#sparqlAlgebra) of the SPARQL 1.1
> recommendation, I see that the Minus operator is defined as follows:
>>
>> Minus(Ω1, Ω2) = { μ | μ in Ω1 . ∀ μ' in Ω2, either μ and μ' are not compatible or dom(μ) and dom(μ') are disjoint }
>>
>> I see an explanation below the operator’s definition for the second part of the condition (i.e., "or dom(μ) and dom(μ') are disjoint”), but this seems to me as a suboptimal choice for at least two reasons.
>>
>>
>>
>> 1. The addition of this condition breaks means that the following two patterns don’t have the same result:
>>
>> (1)  { ?X rdf:type :A . OPTIONAL { ?Y rdf:type :B } }
>> (2)  { { ?X rdf:type :A . ?Y rdf:type :B } UNION { ?X rdf:type :A . MINUS { ?Y rdf:type :B } } }
>>
>> The equivalence (A OPT B) = (A JOIN B) UNION (A MINUS B) seems natural and it holds already in SQL, so it may be unexpected for SPARQL to break this. Furthermore, query planning often depends on this equivalence, so the modified semantics of MINUS prevents  the application of some well-known optimisation.
>>
>>
>>
>> 2. The present definition prevents an implementation approach that uses sideways information passing and iterators. To clarify this, consider the following pattern P, where SP1 and SP2 are some (unspecified) patterns:
>>
>> (3)  P = { SP1 MINUS { SP2 } }
>>
>> Now assume that we’ve got a function eval() that takes as input a pattern and that returns a set of mappings that constitute answers to the pattern. One way to implement eval(SP1 MINUS { SP2 }) is as follows:
>>
>> R := empty set
>> For each mapping μ in eval(SP1)
>>    Let SP2’ be the result of applying μ to SP2 (i.e., we replace in SP2 each variable x with μ(x))
>>    If eval(SP2’) is empty then
>>        Add μ to R
>> return R
>>
>> Evaluating the query in such a way might be better than using the usual bottom-up strategy. For example, if the size of eval(SP1) is small, whereas the size of eval(SP2) is very large but SP2 can be evaluated using indexes, then the above algorithm is much  more efficient than the bottom-up one as it iterates over a small set
> and doesn’t require materialisation of a potentially very large set.
> Another benefit of this implementation strategy is that, depending on
> the structure of SP1 and SP2, we may not need to materialise the answers
> to either SP1 or SP2: if we use an iterator-based implementation (as is
> common in database systems), we may be able to evaluate SP1 on-the-fly,
> and we may also be able to check whether eval(SP2’) is empty on the fly
> as well.
>>
>> Unfortunately, the algorithm I sketched is incorrect with respect to the semantics given in the document, and I see no way to make it correct: after applying μ to SP2, there is no way of checking the constraint "or dom(μ) and dom(μ') are disjoint” from the  definition of Minus. Thus, if one is to fully obey the semantics from
> the specification, it seems to me that one is forced to fully evaluate
> SP2 so that one can later check the domain disjointness condition.
>>
>>
>>
>> I understand that dropping the disjointness condition means that pattern { SP1 MINUS { } } has no answers regardless of the form of SP1. However, given the expected identities between JOIN, MINUS, and OPTIONAL, one might actually expect this; furthermore,  I doubt that there would be many cases in practice where this effect
> would be observable; and finally I believe that the resulting
> specification would lend itself to better implementation. I understand
> that it might be too late to change the specification now, and I am
> writing this e-mail for the following reasons:
>>
>> - I would like to log a technical comment that might be taken into account in a future edition of the specification.
>> - I am wondering whether WG members have anything to add to my comment.
>> - I am wondering whether it might be possible to add somewhere (e.g., in the errata document) a comment that would allow SPARQL vendors to use the modified semantics of MINUS, but with a lesser degree of compatibility with the specification.
>>
>> Regards,
>>
>> Boris
>>
>>
>
Received on Sunday, 20 July 2014 14:48:25 UTC