Re: ISSUE-29: MINUS / NOT EXISTS

On 3 Feb 2010, at 14:53, Andy Seaborne wrote:
>
> On 03/02/2010 1:42 PM, Lee Feigenbaum wrote:
>>
>>> ISSUE-29
>>> Should negation be done via a binary operator on subqueries, a  
>>> binary
>>> operator within graph patterns, or a filter+subquery?      ??
>>>  This is MINUS / NOT EXISTS issue.
>>
>> This might be a good place to start with a test case or three.  
>> Andy, do
>> you happen to have any cases that illustrate the difference here?
>>
>> (I know that personally, my preference here is pretty much "no
>> preference".)
>
> You gathered:
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JulSep/0030.html
>
>
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2009JulSep/0137.html
>
> shows that one can be used to implement the other in the majority of  
> cases (it's the nested optionals case we had in SPARQL 1.0 in a  
> different guise) but we don't have a definition of MINUS to  
> accurately compare.  What's the cardinality in the results?
>
> So it's not about implementation as I understand the problem space  
> today.
>
> I find it clearer to read and explain the EXISTS form - someone  
> (Simon?) said he'd asked the same of an SQL expert and EXISTS is  
> better understood by application writers than MINUS there.

Wasn't that more to do with the syntax, than the semantics? I don't  
think anyone is suggesting a different syntax (from EXISTS) for minus.

> The additional condition in MINUS ("dom(μ) and dom(μ') are  
> disjoint") concerns me because I don't know what's being "MINUSed" -  
> it's not a regular set/multiset operation and certainly not like SQL  
> MINUS which is set difference.

Cannot the definitions be made compatible? Either by making the corner  
cases where results differ errors, or by tweaking the definition of  
the minus operator?

I'm strongly opposed to having something that looks like a binding set  
operator really be a macro for a bizarre kind of filter, or whatever  
it is.

Example of why it matters:

Take: A NOT EXISTS { B }

If NOT EXISTS is a binding set operator, I can bind A and B, and apply  
the logical operation across them (whatever it is) using whatever  
strategy is appropriate. Local, distributed, whatever.

With the Filter definition I have to bind A, and iterate through the  
rows, checking the pattern in B rejecting rows that match in B. On  
clustered and distributed systems (which obviously I care about) this  
is very inefficient.

Now, in some cases it will be possible to tell by inspection that the  
Filter defn. has an equivalent set operation, and perform that set  
operation instead as an optimisation, but it seems like bad decision  
to require that inspection for no obvious reason.

- Steve

-- 
Steve Harris, Garlik Limited
2 Sheen Road, Richmond, TW9 1AE, UK
+44 20 8973 2465  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10  
9AD

Received on Wednesday, 3 February 2010 15:39:06 UTC