Re: More on MINUS vs. UNSAID

* Lee Feigenbaum <lee@thefigtrees.net> [2009-07-08 01:13-0400]
> On yesterday's call, we spent a lot of time trying to understand the  
> differences between MINUS and UNSAID.
>
> I'm using this email to try to bring things together for myself, with  
> the help that one or more of the following will happen:
>
> a) it will help someone else similarly
> b) others will point out where i'm wrong
> c) if i understand it i'll be able to help explain it to others
> d) if i understand it i'll be able to formulate an informed opinion
>
> I'll start with MINUS, since we actually spent more time on that in  
> today's call. On the call, we came up with 3 potential definitions of  
> MINUS, which I'll recap here.
>
> In general, MINUS is a binary operation between solution sets. I use A  
> to represent the left-hand side (LHS) solution set of the MINUS operator  
> and B to represent the right-hand side solution set. In all cases, the  
> result of MINUS is always a (improper) subset of A -- the three  
> definitions differ in how to determine whether a particular solution  
> within A is retained or not.
>
> == MINUS-Set ==
>
> This was -- possibly erroneously -- called the SQL definition of MINUS  
> on the call. The idea is that it's a strict set difference operation --  
> a solution is removed from A only if that exact solution (same exact  
> bindings, no more & no less) occurs within B. There may be a slightly  
> looser version of this in which a solution is removed from A if that  
> solution is a subset of a solution in B.
>
> I'm guessing that the cardinality here is preserved - i.e., this would  
> really be multiset difference.

Let's assume that *our* MINUS is multiset, for consistency with our
UNION (which is analogous to SQL's UNION ALL). Most of SQL defaults
to preserving cardinality, apart from UNION and MINUS.

> There was some confusion on the call as to whether this is or is not  
> what SQL does with MINUS (which is confused further, I think, by the  
> fact that SQL acts over positional columns rather than named columns  
> (variables)?). In either case, I don't think I heard *any* support for  
> this definition of MINUS.

I do not support this definition.
More specifically, I support conceptual compatibility with SQL MINUS
modulo adapting it to SPARQL's preservation of cardinality and the
fact that in SPARQL, you can have no common variables in A and B.
Your MINUS-AntiJoin+Restriction choice is thus more applicable.


> == MINUS-AntiJoin ==
>
> This relies on the standard SPARQL join criteria to determine which  
> solutions to remove - i.e. any solution in A that could successfully  
> join with at least one solution B is instead removed from A.
>
> SPARQL defines the notion of "compatible mappings" (see  
> http://www.w3.org/TR/rdf-sparql-query/#BasicGraphPattern) -- two  
> solutions s1 and s2 are compatible if for every variable they share (in  
> their domain of bound variables), that variable is bound to the same  
> value in both s1 and s2. Note that this means that two solutions that  
> share no variables in common are (vacuously) compatible.
>
> SPARQL joins are defined in terms of compatible mappings - the result of  
> Join(A, B) is the merge (effectively union) of all pairs of compatible  
> solutions from A and B.
>
> So this definition of MINUS, then, says that A MINUS B removes from A  
> all of the solutions that are compatible with at least one solution in B.
>
> I'm not sure what MINUS does with the cardinality here...
>
> This definition has one oddity that leads to the third definition. If  
> (?a = "a") is a solution in A and there is a solution in B that doesn't  
> bind ?a, then this solution is removed from A (because of the vacuous  
> compatibility mentioned above). That is, MINUS-AntiJoin removes from A  
> any solution s_A for which there is a solution in B that shares no  
> variables in common s_A. This seems weird, but is a natural consequence  
> of this definition.
>
> I'm not sure if anyone on the call was advocating this definition.
>
> == MINUS-AntiJoin+Restriction ==
>
> This definition is like the previous, but adds the condition that a  
> solution s_A is only removed from A if there is a solution s_ B in B  
> such that s_A and s_B are compatible AND share at least one bound  
> variable in common. That is, to remove a solution from A,  
> MINUS-AntiJoin+Restriction requires there to be a solution in B that is  
> non-vacuously compatible with it.
>
> This seemed to be the definition that most of the proponents of MINUS on  
> the call advocated. Other members of the group were uncomfortable with  
> what seems to them to be an artificial restriction in the definition.
>
>
> OK, with this spelled out, I wanted to look at Eric's treatment in  
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JulSep/0022.html. 
>   In particular, looking at this example in Eric's mail:
>
> """
> Query M3:
> ?who foaf:givenname ?name
> MINUS {
>   ?who foaf:holdsAccount ?act
>   OPTIONAL {
>     ?act foaf:accountName ?name .
>   }
> }
> Result M3:
> ?who   ?name    ?who   ?act   ?name    ?who   ?name
> _:eric "eric"   _:eric <act1> "eric"   _:bobS "bob"
> _:bobS "bob"  - _:bobS <act2> "bobS" = _:eve  "eve"
> _:eve  "eve"    _:eve  <act3>	
> """
>
> The only way that I can see that (?who=_:eve, ?name="eve") is retained  
> in the result of the MINUS is if Eric is using the "looser" form of the  
> MINUS-set definition that I give above. Eric, can you confirm this is  
> the case? The two MINUS-AntiJoin* definitions would, I think, eliminate  
> that solution because (?who=_:eve, ?name="eve") is compatible with  
> (?who=_:eve, ?act=<act3>).

I think you've made a leap from "AND share at least one bound
variable" to "have any variables in common". As I understand both
MINUS-AntiJoin and MINUS-AntiJoin+Restriction, both restrict A if
there is a tuple in B with the same values for all the bindings.
Thus, a distinguishing test case for these would be:

?who foaf:givenname ?name
MINUS {
  ?who2 foaf:holdsAccount ?act
  OPTIONAL {
    ?act foaf:accountName ?name2 .
  }
}


> ...I'm going to punt on UNSAID tonight because it's getting late and I  
> don't have an easy way to explain it. Andy explains it as a !EXISTS  
> filter (i.e., solve A and then for each solution in A, filter it against  
> a !EXISTS filter) - but without totally understanding what !EXISTS  
> means, I can't really see how that relates to these MINUS definitions.
>
> Greg (& someone else?) explained UNSAID today as AntiOptional. I'm  
> wondering if "AntiOptional" is actually the same as  
> MINUS-AntiJoin+Restriction? Can anyone tell?
>
> Lee
>

-- 
-eric

office: +1.617.258.5741 32-G528, MIT, Cambridge, MA 02144 USA
mobile: +1.617.599.3509

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.

Received on Wednesday, 8 July 2009 16:59:14 UTC