- From: Lee Feigenbaum <lee@thefigtrees.net>
- Date: Wed, 08 Jul 2009 01:13:43 -0400
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
On yesterday's call, we spent a lot of time trying to understand the
differences between MINUS and UNSAID.
I'm using this email to try to bring things together for myself, with
the hope that one or more of the following will happen:
a) it will help someone else similarly
b) others will point out where I'm wrong
c) if I understand it I'll be able to help explain it to others
d) if I understand it I'll be able to formulate an informed opinion
I'll start with MINUS, since we actually spent more time on that on
the call. On the call, we came up with 3 potential definitions of
MINUS, which I'll recap here.
In general, MINUS is a binary operation between solution sets. I use A
to represent the left-hand side (LHS) solution set of the MINUS operator
and B to represent the right-hand side (RHS) solution set. In all cases,
the result of MINUS is a (possibly improper) subset of A -- the three
definitions differ in how to determine whether a particular solution
within A is retained or not.
== MINUS-Set ==
This was -- possibly erroneously -- called the SQL definition of MINUS
on the call. The idea is that it's a strict set difference operation --
a solution is removed from A only if that exact solution (same exact
bindings, no more & no less) occurs within B. There may be a slightly
looser version of this in which a solution is removed from A if that
solution is a subset of a solution in B.
I'm guessing that the cardinality here is preserved - i.e., this would
really be multiset difference.
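
To make the two variants concrete, here is a minimal Python sketch. It
assumes each solution is modeled as a dict from variable name to value and
a solution set as a list of such dicts. The function names are made up for
illustration, and the cardinality handling (one copy in B cancels one copy
in A) is only my guess at what multiset difference would mean here.

```python
from collections import Counter

def freeze(solution):
    """Turn a dict of bindings into a hashable form."""
    return frozenset(solution.items())

def minus_set(a, b):
    """Strict variant: remove a solution only if the exact same bindings occur in B.

    Assumption: multiset semantics, so one copy in B cancels one copy in A.
    """
    remaining = Counter(freeze(t) for t in b)
    result = []
    for s in a:
        key = freeze(s)
        if remaining[key] > 0:
            remaining[key] -= 1
        else:
            result.append(s)
    return result

def minus_set_loose(a, b):
    """Looser variant: remove s if its bindings are a subset of some solution in B."""
    return [s for s in a if not any(s.items() <= t.items() for t in b)]
```

With A = [{"x": "1"}] and B = [{"x": "1", "y": "2"}], the strict variant
keeps the solution (B binds more) while the looser variant removes it.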
There was some confusion on the call as to whether this is or is not
what SQL does with MINUS (which is confused further, I think, by the
fact that SQL acts over positional columns rather than named columns,
i.e. variables?). In either case, I don't think I heard *any* support
for this definition of MINUS.
== MINUS-AntiJoin ==
This relies on the standard SPARQL join criteria to determine which
solutions to remove - i.e. any solution in A that could successfully
join with at least one solution in B is instead removed from A.
SPARQL defines the notion of "compatible mappings" (see
http://www.w3.org/TR/rdf-sparql-query/#BasicGraphPattern) -- two
solutions s1 and s2 are compatible if for every variable they share (in
their domain of bound variables), that variable is bound to the same
value in both s1 and s2. Note that this means that two solutions that
share no variables in common are (vacuously) compatible.
SPARQL joins are defined in terms of compatible mappings - the result of
Join(A, B) is the merge (effectively union) of all pairs of compatible
solutions from A and B.
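
As a rough illustration of compatibility and join, here is a small Python
sketch. It assumes a solution is a dict from variable name to value (an
unbound variable is simply absent) and a solution set is a list of dicts.
The names compatible and join are mine, not from the spec.

```python
def compatible(s1, s2):
    """True if s1 and s2 agree on every variable that both of them bind.

    With no shared variables the check is vacuously true.
    """
    return all(s1[v] == s2[v] for v in s1.keys() & s2.keys())

def join(a, b):
    """Merge every compatible pair of solutions drawn from a and b."""
    return [{**s1, **s2} for s1 in a for s2 in b if compatible(s1, s2)]
```

For example, {"a": "1"} and {"b": "2"} share no variables, so they are
compatible and join to {"a": "1", "b": "2"}.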
So this definition of MINUS, then, says that A MINUS B removes from A
all of the solutions that are compatible with at least one solution in B.
I'm not sure what MINUS does with the cardinality here...
This definition has one oddity that leads to the third definition. If
(?a = "a") is a solution in A and there is a solution in B that doesn't
bind ?a, then this solution is removed from A (because of the vacuous
compatibility mentioned above). That is, MINUS-AntiJoin removes from A
any solution s_A for which there is a solution in B that shares no
variables in common with s_A. This seems weird, but is a natural
consequence of this definition.
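
A minimal Python sketch of this definition and its oddity, assuming
solutions are dicts from variable name to value. The function names are
hypothetical, and since I'm unsure about cardinality, the sketch simply
keeps every surviving copy from A.

```python
def compatible(s1, s2):
    """True if s1 and s2 agree on every variable that both of them bind."""
    return all(s1[v] == s2[v] for v in s1.keys() & s2.keys())

def minus_antijoin(a, b):
    """Keep only the solutions of a that are compatible with no solution in b."""
    return [s for s in a if not any(compatible(s, t) for t in b)]

# The oddity: B binds only ?b, so the two solutions share no variables,
# are vacuously compatible, and the solution from A is removed.
A = [{"a": "a"}]
B = [{"b": "x"}]
result = minus_antijoin(A, B)
```

Here result is the empty list, even though B says nothing about ?a.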
I'm not sure if anyone on the call was advocating this definition.
== MINUS-AntiJoin+Restriction ==
This definition is like the previous, but adds the condition that a
solution s_A is only removed from A if there is a solution s_B in B
such that s_A and s_B are compatible AND share at least one bound
variable in common. That is, to remove a solution from A,
MINUS-AntiJoin+Restriction requires there to be a solution in B that is
non-vacuously compatible with it.
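
Sketched in Python under the same assumptions as before (solutions as
dicts from variable name to value, hypothetical function names), the
restriction is one extra check on the set of shared variables:

```python
def removes(s, t):
    """True if t is non-vacuously compatible with s.

    The two solutions must share at least one bound variable and agree
    on all of the variables they share.
    """
    shared = s.keys() & t.keys()
    return bool(shared) and all(s[v] == t[v] for v in shared)

def minus_antijoin_restricted(a, b):
    """Keep the solutions of a that no solution in b non-vacuously matches."""
    return [s for s in a if not any(removes(s, t) for t in b)]
```

With A = [{"a": "a"}] and B = [{"b": "x"}], the solution is now retained,
because the only candidate in B shares no variable with it.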
This seemed to be the definition that most of the proponents of MINUS on
the call advocated. Other members of the group were uncomfortable with
what seemed to them to be an artificial restriction in the definition.
OK, with this spelled out, I wanted to look at Eric's treatment in
http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JulSep/0022.html.
In particular, looking at this example in Eric's mail:
"""
Query M3:
  ?who foaf:givenname ?name
  MINUS {
    ?who foaf:holdsAccount ?act
    OPTIONAL {
      ?act foaf:accountName ?name .
    }
  }
Result M3:
  ?who    ?name        ?who    ?act    ?name        ?who    ?name
  _:eric  "eric"       _:eric  <act1>  "eric"       _:bobS  "bob"
  _:bobS  "bob"     -  _:bobS  <act2>  "bobS"    =  _:eve   "eve"
  _:eve   "eve"        _:eve   <act3>
"""
The only way that I can see that (?who=_:eve, ?name="eve") is retained
in the result of the MINUS is if Eric is using the "looser" form of the
MINUS-Set definition that I gave above. Eric, can you confirm this is
the case? The two MINUS-AntiJoin* definitions would, I think, eliminate
that solution because (?who=_:eve, ?name="eve") is compatible with
(?who=_:eve, ?act=<act3>).
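
To check this reasoning, here is a small Python sketch that runs each
definition over the M3 solution sets from Eric's table. It assumes
solutions are dicts from variable name to value, with RDF terms written as
plain strings; the helper names are mine, and the unbound ?name for _:eve
on the right-hand side is modeled by leaving that key out.

```python
def compatible(s, t):
    """True if s and t agree on every variable that both of them bind."""
    return all(s[v] == t[v] for v in s.keys() & t.keys())

def shares_variable(s, t):
    """True if s and t bind at least one variable in common."""
    return bool(s.keys() & t.keys())

A = [
    {"who": "_:eric", "name": "eric"},
    {"who": "_:bobS", "name": "bob"},
    {"who": "_:eve", "name": "eve"},
]
B = [
    {"who": "_:eric", "act": "<act1>", "name": "eric"},
    {"who": "_:bobS", "act": "<act2>", "name": "bobS"},
    {"who": "_:eve", "act": "<act3>"},  # ?name unbound: OPTIONAL did not match
]

# MINUS-Set, strict: remove only exact matches.
strict = [s for s in A if s not in B]
# MINUS-Set, looser: remove s if its bindings are a subset of a solution in B.
loose = [s for s in A if not any(s.items() <= t.items() for t in B)]
# MINUS-AntiJoin: remove s if it is compatible with any solution in B.
antijoin = [s for s in A if not any(compatible(s, t) for t in B)]
# MINUS-AntiJoin+Restriction: additionally require a shared bound variable.
restricted = [
    s for s in A
    if not any(compatible(s, t) and shares_variable(s, t) for t in B)
]

def who(solutions):
    """List the ?who binding of each solution."""
    return [s["who"] for s in solutions]

print(who(strict))      # ['_:eric', '_:bobS', '_:eve']
print(who(loose))       # ['_:bobS', '_:eve']
print(who(antijoin))    # ['_:bobS']
print(who(restricted))  # ['_:bobS']
```

Under these assumptions, only the looser MINUS-Set variant reproduces
Eric's Result M3; both AntiJoin variants drop the _:eve solution.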
...I'm going to punt on UNSAID tonight because it's getting late and I
don't have an easy way to explain it. Andy explains it as a !EXISTS
filter (i.e., solve A and then for each solution in A, filter it against
a !EXISTS filter) - but without totally understanding what !EXISTS
means, I can't really see how that relates to these MINUS definitions.
Greg (& someone else?) explained UNSAID on the call as AntiOptional. I'm
wondering if "AntiOptional" is actually the same as
MINUS-AntiJoin+Restriction? Can anyone tell?
Lee
Received on Wednesday, 8 July 2009 05:14:42 UTC