W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > January to March 2010

Re: ISSUE-29: MINUS / NOT EXISTS

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 3 Feb 2010 16:13:35 +0000
Cc: Lee Feigenbaum <lee@thefigtrees.net>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <5575AA7F-2AF2-4B6A-ACFA-B34592511DD6@garlik.com>
To: Andy Seaborne <andy.seaborne@talis.com>
On 3 Feb 2010, at 16:00, Andy Seaborne wrote:
...
>> Example of why it matters:
>>
>> Take: A NOT EXISTS { B }
>>
>> If NOT EXISTS is a binding set operator, I can bind A and B, and  
>> apply
>> the logical operation across them (whatever it is) using whatever
>> strategy is appropriate. Local, distributed, whatever.
>>
>> With the Filter definition I have to bind A, and iterate through the
>> rows, checking the pattern in B rejecting rows that match in B. On
>> clustered and distributed systems (which obviously I care about)  
>> this is
>> very inefficient.
> >
>> Now, in some cases it will be possible to tell by inspection that the
>> Filter defn. has an equivalent set operation, and perform that set
>> operation instead as an optimisation, but it seems like bad  
>> decision to
>> require that inspection for no obvious reason.
>
> It's most cases that are equivalent (for example, all BGP+Filter in  
> RHS can be done) which si why I said it's not an implementation  
> issue as I see it.

So, why not limit the RHS to BGP + Filter? I guess UNION is marginally  
useful, but I'd expect some user surprise as to the results if unbound  
values appear in the RHS, so it's questionable.

It is an implementation issue for me. I'd have to implement it both as  
a logic operator, and as an iterating FILTER operation, which will  
only be used rarely.

> I can see how to do the NOT EXISTS with parallelism based on  
> streaming in a cluster where I can't for MINUS because MINUS says  
> "for all μ' in Ω2". That's a characteristic of my implementation -  
> I don't have fully materialized intermediate results at all times  
> (although I hope to find time to make it more sophisticated and  
> switch strategies based on expected intermediate counts).

For me, that's the problem, not the solution. It depends on what  
you're trying to achieve I guess.

- Steve

-- 
Steve Harris, Garlik Limited
2 Sheen Road, Richmond, TW9 1AE, UK
+44 20 8973 2465  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10  
9AD
Received on Wednesday, 3 February 2010 16:14:05 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:15:41 GMT