More on MINUS vs. UNSAID from Lee Feigenbaum on 2009-07-08 (public-rdf-dawg@w3.org from July to September 2009)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Wed, 08 Jul 2009 01:13:43 -0400
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4A542B07.5030002@thefigtrees.net>
On yesterday's call, we spent a lot of time trying to understand the 
differences between MINUS and UNSAID.

I'm using this email to try to bring things together for myself, with 
the help that one or more of the following will happen:

a) it will help someone else similarly
b) others will point out where i'm wrong
c) if i understand it i'll be able to help explain it to others
d) if i understand it i'll be able to formulate an informed opinion

I'll start with MINUS, since we actually spent more time on that in 
today's call. On the call, we came up with 3 potential definitions of 
MINUS, which I'll recap here.

In general, MINUS is a binary operation between solution sets. I use A 
to represent the left-hand side (LHS) solution set of the MINUS operator 
and B to represent the right-hand side solution set. In all cases, the 
result of MINUS is always a (improper) subset of A -- the three 
definitions differ in how to determine whether a particular solution 
within A is retained or not.

== MINUS-Set ==

This was -- possibly erroneously -- called the SQL definition of MINUS 
on the call. The idea is that it's a strict set difference operation -- 
a solution is removed from A only if that exact solution (same exact 
bindings, no more & no less) occurs within B. There may be a slightly 
looser version of this in which a solution is removed from A if that 
solution is a subset of a solution in B.

I'm guessing that the cardinality here is preserved - i.e., this would 
really be multiset difference.

There was some confusion on the call as to whether this is or is not 
what SQL does with MINUS (which is confused further, I think, by the 
fact that SQL acts over positional columns rather than named columns 
(variables)?). In either case, I don't think I heard *any* support for 
this definition of MINUS.

== MINUS-AntiJoin ==

This relies on the standard SPARQL join criteria to determine which 
solutions to remove - i.e. any solution in A that could successfully 
join with at least one solution B is instead removed from A.

SPARQL defines the notion of "compatible mappings" (see 
http://www.w3.org/TR/rdf-sparql-query/#BasicGraphPattern) -- two 
solutions s1 and s2 are compatible if for every variable they share (in 
their domain of bound variables), that variable is bound to the same 
value in both s1 and s2. Note that this means that two solutions that 
share no variables in common are (vacuously) compatible.

SPARQL joins are defined in terms of compatible mappings - the result of 
Join(A, B) is the merge (effectively union) of all pairs of compatible 
solutions from A and B.

So this definition of MINUS, then, says that A MINUS B removes from A 
all of the solutions that are compatible with at least one solution in B.

I'm not sure what MINUS does with the cardinality here...

This definition has one oddity that leads to the third definition. If 
(?a = "a") is a solution in A and there is a solution in B that doesn't 
bind ?a, then this solution is removed from A (because of the vacuous 
compatibility mentioned above). That is, MINUS-AntiJoin removes from A 
any solution s_A for which there is a solution in B that shares no 
variables in common s_A. This seems weird, but is a natural consequence 
of this definition.

I'm not sure if anyone on the call was advocating this definition.

== MINUS-AntiJoin+Restriction ==

This definition is like the previous, but adds the condition that a 
solution s_A is only removed from A if there is a solution s_ B in B 
such that s_A and s_B are compatible AND share at least one bound 
variable in common. That is, to remove a solution from A, 
MINUS-AntiJoin+Restriction requires there to be a solution in B that is 
non-vacuously compatible with it.

This seemed to be the definition that most of the proponents of MINUS on 
the call advocated. Other members of the group were uncomfortable with 
what seems to them to be an artificial restriction in the definition.


OK, with this spelled out, I wanted to look at Eric's treatment in 
http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JulSep/0022.html. 
   In particular, looking at this example in Eric's mail:

"""
Query M3:
?who foaf:givenname ?name
MINUS {
   ?who foaf:holdsAccount ?act
   OPTIONAL {
     ?act foaf:accountName ?name .
   }
}
Result M3:
?who   ?name    ?who   ?act   ?name    ?who   ?name
_:eric "eric"   _:eric <act1> "eric"   _:bobS "bob"
_:bobS "bob"  - _:bobS <act2> "bobS" = _:eve  "eve"
_:eve  "eve"    _:eve  <act3> 
"""

The only way that I can see that (?who=_:eve, ?name="eve") is retained 
in the result of the MINUS is if Eric is using the "looser" form of the 
MINUS-set definition that I give above. Eric, can you confirm this is 
the case? The two MINUS-AntiJoin* definitions would, I think, eliminate 
that solution because (?who=_:eve, ?name="eve") is compatible with 
(?who=_:eve, ?act=<act3>).



...I'm going to punt on UNSAID tonight because it's getting late and I 
don't have an easy way to explain it. Andy explains it as a !EXISTS 
filter (i.e., solve A and then for each solution in A, filter it against 
a !EXISTS filter) - but without totally understanding what !EXISTS 
means, I can't really see how that relates to these MINUS definitions.

Greg (& someone else?) explained UNSAID today as AntiOptional. I'm 
wondering if "AntiOptional" is actually the same as 
MINUS-AntiJoin+Restriction? Can anyone tell?

Lee
Received on Wednesday, 8 July 2009 05:14:42 UTC