- From: Lee Feigenbaum <lee@thefigtrees.net>
- Date: Wed, 08 Jul 2009 01:13:43 -0400
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
On yesterday's call, we spent a lot of time trying to understand the differences between MINUS and UNSAID. I'm using this email to try to bring things together for myself, with the help that one or more of the following will happen: a) it will help someone else similarly b) others will point out where i'm wrong c) if i understand it i'll be able to help explain it to others d) if i understand it i'll be able to formulate an informed opinion I'll start with MINUS, since we actually spent more time on that in today's call. On the call, we came up with 3 potential definitions of MINUS, which I'll recap here. In general, MINUS is a binary operation between solution sets. I use A to represent the left-hand side (LHS) solution set of the MINUS operator and B to represent the right-hand side solution set. In all cases, the result of MINUS is always a (improper) subset of A -- the three definitions differ in how to determine whether a particular solution within A is retained or not. == MINUS-Set == This was -- possibly erroneously -- called the SQL definition of MINUS on the call. The idea is that it's a strict set difference operation -- a solution is removed from A only if that exact solution (same exact bindings, no more & no less) occurs within B. There may be a slightly looser version of this in which a solution is removed from A if that solution is a subset of a solution in B. I'm guessing that the cardinality here is preserved - i.e., this would really be multiset difference. There was some confusion on the call as to whether this is or is not what SQL does with MINUS (which is confused further, I think, by the fact that SQL acts over positional columns rather than named columns (variables)?). In either case, I don't think I heard *any* support for this definition of MINUS. == MINUS-AntiJoin == This relies on the standard SPARQL join criteria to determine which solutions to remove - i.e. any solution in A that could successfully join with at least one solution B is instead removed from A. SPARQL defines the notion of "compatible mappings" (see http://www.w3.org/TR/rdf-sparql-query/#BasicGraphPattern) -- two solutions s1 and s2 are compatible if for every variable they share (in their domain of bound variables), that variable is bound to the same value in both s1 and s2. Note that this means that two solutions that share no variables in common are (vacuously) compatible. SPARQL joins are defined in terms of compatible mappings - the result of Join(A, B) is the merge (effectively union) of all pairs of compatible solutions from A and B. So this definition of MINUS, then, says that A MINUS B removes from A all of the solutions that are compatible with at least one solution in B. I'm not sure what MINUS does with the cardinality here... This definition has one oddity that leads to the third definition. If (?a = "a") is a solution in A and there is a solution in B that doesn't bind ?a, then this solution is removed from A (because of the vacuous compatibility mentioned above). That is, MINUS-AntiJoin removes from A any solution s_A for which there is a solution in B that shares no variables in common s_A. This seems weird, but is a natural consequence of this definition. I'm not sure if anyone on the call was advocating this definition. == MINUS-AntiJoin+Restriction == This definition is like the previous, but adds the condition that a solution s_A is only removed from A if there is a solution s_ B in B such that s_A and s_B are compatible AND share at least one bound variable in common. That is, to remove a solution from A, MINUS-AntiJoin+Restriction requires there to be a solution in B that is non-vacuously compatible with it. This seemed to be the definition that most of the proponents of MINUS on the call advocated. Other members of the group were uncomfortable with what seems to them to be an artificial restriction in the definition. OK, with this spelled out, I wanted to look at Eric's treatment in http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JulSep/0022.html. In particular, looking at this example in Eric's mail: """ Query M3: ?who foaf:givenname ?name MINUS { ?who foaf:holdsAccount ?act OPTIONAL { ?act foaf:accountName ?name . } } Result M3: ?who ?name ?who ?act ?name ?who ?name _:eric "eric" _:eric <act1> "eric" _:bobS "bob" _:bobS "bob" - _:bobS <act2> "bobS" = _:eve "eve" _:eve "eve" _:eve <act3> """ The only way that I can see that (?who=_:eve, ?name="eve") is retained in the result of the MINUS is if Eric is using the "looser" form of the MINUS-set definition that I give above. Eric, can you confirm this is the case? The two MINUS-AntiJoin* definitions would, I think, eliminate that solution because (?who=_:eve, ?name="eve") is compatible with (?who=_:eve, ?act=<act3>). ...I'm going to punt on UNSAID tonight because it's getting late and I don't have an easy way to explain it. Andy explains it as a !EXISTS filter (i.e., solve A and then for each solution in A, filter it against a !EXISTS filter) - but without totally understanding what !EXISTS means, I can't really see how that relates to these MINUS definitions. Greg (& someone else?) explained UNSAID today as AntiOptional. I'm wondering if "AntiOptional" is actually the same as MINUS-AntiJoin+Restriction? Can anyone tell? Lee
Received on Wednesday, 8 July 2009 05:14:42 UTC