- From: Orri Erling <erling@xs4all.nl>
- Date: Thu, 12 Nov 2009 20:08:36 +0100
- To: "'Andy Seaborne'" <andy.seaborne@talis.com>
- Cc: "'SPARQL Working Group'" <public-rdf-dawg@w3.org>
Andy

Yes, this was meant for the list.

As you say, scalar and exists/not exists subqueries are not absolutely necessary for expressivity but are still things that SQL has taught people to expect.

A sort of corner case is a scalar subquery with a limit 1 combined with order by. Such a query cannot be expressed as a derived table (subq in from) because of the different scope rules. If the semantics are that all subqueries are evaluable separately and then joinable after this, the meaning of order by + limit is clear enough. But a scalar subquery is evaluable at whatever point all the variables that it references from the outside are bound. Thus the meaning of limit 1 is in relation to the solutions generated for each set of bindings of variables shared between the scalar subquery and the rest of the query.

This is more a curiosity than a real item of importance. Still, for the cases where a scalar or exists/not exists subquery is expressible as a derived table, this would at least be a syntactic nicety even though it requires some scope rules of its own.

Regards
Orri

-----Original Message-----
From: Andy Seaborne [mailto:afs@talisplatform.com] On Behalf Of Andy Seaborne
Sent: Thursday, November 12, 2009 11:26 AM
To: Orri Erling
Subject: Re: Views on the outcomes of F2F

Did you mean to send this to the list? Its content suggests it should go there. (we all press reply on this list!)

------
Commentary : this is offlist and my initial thoughts. The process impact is something I would need to consider some more to be more confident that it's the best approach to bring features to the users.

I agree about existence in filters but in your example it can be done by

SELECT ?celeb, count (*)
where {
  ?claimant foaf:knows ?celeb .
  NOT EXISTS{?celeb foaf:knows ?claimant}
}
group by ?celeb order by desc 2 limit 10

existence in FILTER is needed if there is another condition and it's not conjunctive. e.g.
another condition on ?claimant:

SELECT ?celeb, count (*)
where {
  ?claimant foaf:knows ?celeb .
  FILTER(NOT EXISTS{?celeb foaf:knows ?claimant} || ?claimant = <#me>)
}
group by ?celeb order by desc 2 limit 10

As to scalar queries, I agree in principle, and would suggest that the use of algebra substitute() in FPWD gives the right and expected semantics (same as SQL as I understand them). It's not conceptually very hard.

I would note that with named variables in SPARQL this can be done by separating out the subquery, with care, and using a named variable to capture the scalar value.

# Quick example ....
{
  { SELECT ?claimant, count(?celeb) AS ?C
    { ?claimant foaf:knows ?celeb . }
  }
  # Look for low-contact number claimants
  FILTER( NOT EXISTS{?celeb foaf:knows ?claimant} || ?C > 50 )
}

although it's a little more verbose.

My reservation is the time it will take to get the WG to agree on this and it's not in the charter, which is a criterion that the chairs and some WG members worry about. The time matter will disrupt other key features.

I might be able to support a phased approach. Some features now, more coming because it's important not to create a long delay to the next set of SPARQL features if looking from a broader perspective than just this WG.

Andy

On 12/11/2009 08:57, Orri Erling wrote:
> Hi
>
> I think we should include existence subqueries in filters. Likewise
> for scalar subqueries although these are not as necessary. SQL has
> had both for all time and has been implemented countless times so
> there is no particular difficulty in any of this. The scope rules of SQL
> are also clear and intuitive and applicable, even though different from the
> scope rules of a derived table (subquery in from clause).
>
> Consider:
>
> select ?celeb, count (*)
> where {
>   ?claimant foaf:knows ?celeb .
>   filter (!bif:exists ((select (1) where { ?celeb foaf:knows ?claimant })))
> } group by ?celeb order by desc 2 limit 10
>
> select ?celeb, count (*)
> where {
>   ?claimant foaf:knows ?celeb .
>   filter (bif:exists ((select (1) where { ?celeb foaf:knows ?claimant })))
> } group by ?celeb order by desc 2 limit 10
>
> The first takes the persons whom most claim to know without being known in
> return.
> The second takes the people with the most reciprocally acknowledged knows
> relations.
> Since the graph is not specified, these are evaluated over the union all of
> all graphs.
>
> The first case (antijoin) can be expressed with the optional + bound
> idiom. For usable performance this idiom must be recognized by the
> processor and compiled as antijoin. How many do this? We don't but
> of course could. This is somewhat futile though since there is a more
> natural expression.
>
> The second case (semijoin) can be expressed as
>
> select ?celeb, count (*)
> where {
>   ?claimant foaf:knows ?celeb .
>   {select distinct ?celeb2 ?claimant2 where { ?celeb2 foaf:knows ?claimant2 }}
>   filter (?celeb = ?celeb2 && ?claimant2 = ?claimant)
> } group by ?celeb order by desc 2 limit 10
>
> Of course then the implementation is expected to figure out that a
> semijoin is being meant and then compile it as such. With us, the
> version with exists runs about 10x faster than the one with the
> distinct subquery but they produce the same result. We do not
> recognize the distinct subquery to be a semijoin because it would not
> occur to us, or to other SQL-minded people, to write such a thing.
>
> Also, the SQL quantified subqueries can be expressed as exists and not
> exists. Having to write exists/not exists in such awkward ways makes these
> even less readable when expressed in this way.
>
> We should note that the audience potentially adopting SPARQL know SQL and
> look to SPARQL for cases where heterogeneity of the data makes SQL
> impractical.
>
> Regards
> Orri
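
For readers unfamiliar with the "optional + bound" idiom mentioned in the quoted message, a minimal sketch in plain SPARQL follows. It lists the (?claimant, ?celeb) pairs where the acquaintance is not reciprocated, leaving out the count / group by used in the thread. The variable ?back and the standard FOAF namespace declaration are assumptions made for illustration; neither appears in the original messages.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Antijoin via OPTIONAL + !bound (sketch; ?back is an illustrative helper variable)
SELECT ?claimant ?celeb
WHERE {
  ?claimant foaf:knows ?celeb .
  # Try to find the reverse edge; ?back is bound only if ?celeb knows ?claimant
  OPTIONAL {
    ?celeb foaf:knows ?back .
    FILTER (?back = ?claimant)
  }
  # Keep only the rows where no reverse edge was found
  FILTER (!bound(?back))
}
```

The FILTER inside the OPTIONAL restricts the optional match to the reverse edge, and the outer !bound test keeps only the solutions for which that match failed. This is the pattern a processor would have to recognize in order to compile it as an antijoin.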
Received on Thursday, 12 November 2009 19:09:25 UTC