- From: Andy Seaborne <andy.seaborne@talis.com>
- Date: Wed, 03 Feb 2010 16:00:45 +0000
- To: Steve Harris <steve.harris@garlik.com>
- CC: Lee Feigenbaum <lee@thefigtrees.net>, SPARQL Working Group <public-rdf-dawg@w3.org>
On 03/02/2010 3:38 PM, Steve Harris wrote: > On 3 Feb 2010, at 14:53, Andy Seaborne wrote: >> >> On 03/02/2010 1:42 PM, Lee Feigenbaum wrote: >>> >>>> ISSUE-29 >>>> Should negation be done via a binary operator on subqueries, a binary >>>> operator within graph patterns, or a filter+subquery? ?? >>>> This is MINUS / NOT EXISTS issue. >>> >>> This might be a good place to start with a test case or three. Andy, do >>> you happen to have any cases that illustrate the difference here? >>> >>> (I know that personally, my preference here is pretty much "no >>> preference".) >> >> You gathered: >> http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JulSep/0030.html >> >> >> http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JulSep/0137.html >> >> shows that one can be used to implement the other in the majority of >> cases (it's the nested optionals case we had in SPARQL 1.0 in a >> different guise) but we don't have a definition of MINUS to accurately >> compare. What's the cardinality in the results? >> >> So it's not about implementation as I understand the problem space today. >> >> I find it clearer to read and explain the EXISTS form - someone >> (Simon?) said he'd asked the same of an SQL expert and EXISTS is >> better understood by application writers than MINUS there. > > Wasn't that more to do with the syntax, than the semantics? I don't > think anyone is suggesting a different syntax (from EXISTS) for minus. Not as I remember the discussion - it was about the effect of the operators (i.e. semantics). > >> The additional condition in MINUS ("dom(μ) and dom(μ') are disjoint") >> concerns me because I don't know what's being "MINUSed" - it's not a >> regular set/multiset operation and certainly not like SQL MINUS which >> is set difference. > > Cannot the definitions be made compatible? Either by making the corner > cases where results differ errors, or by tweaking the definition of the > minus operator? > > I'm strongly opposed to having something that looks like a binding set > operator really be a macro for a bizarre kind of filter, or whatever it is. NOT EXISTS is very close to OPTIONAL/filter(!bound). It differs in that OPTIONAL/!bound can introduce artificial duplicates. And additional unnecessary work. > Example of why it matters: > > Take: A NOT EXISTS { B } > > If NOT EXISTS is a binding set operator, I can bind A and B, and apply > the logical operation across them (whatever it is) using whatever > strategy is appropriate. Local, distributed, whatever. > > With the Filter definition I have to bind A, and iterate through the > rows, checking the pattern in B rejecting rows that match in B. On > clustered and distributed systems (which obviously I care about) this is > very inefficient. > > Now, in some cases it will be possible to tell by inspection that the > Filter defn. has an equivalent set operation, and perform that set > operation instead as an optimisation, but it seems like bad decision to > require that inspection for no obvious reason. It's most cases that are equivalent (for example, all BGP+Filter in RHS can be done) which si why I said it's not an implementation issue as I see it. I can see how to do the NOT EXISTS with parallelism based on streaming in a cluster where I can't for MINUS because MINUS says "for all μ' in Ω2". That's a characteristic of my implementation - I don't have fully materialized intermediate results at all times (although I hope to find time to make it more sophisticated and switch strategies based on expected intermediate counts). Andy > > - Steve >
Received on Wednesday, 3 February 2010 16:01:14 UTC