RE: Views on the outcomes of F2F from Orri Erling on 2009-11-12 (public-rdf-dawg@w3.org from October to December 2009)

From: Orri Erling <erling@xs4all.nl>
Date: Thu, 12 Nov 2009 20:08:36 +0100
To: "'Andy Seaborne'" <andy.seaborne@talis.com>
Cc: "'SPARQL Working Group'" <public-rdf-dawg@w3.org>
Message-Id: <200911121908.nACJ8orm036233@smtp-vbr3.xs4all.nl>
Andy

Yes, this was meant for the list.  As you say, scalar and exists/not exists
subqueries are not absolutely necessary for the expressivity but are still
things that SQL has taught people to expect.

A sort of corner case is a scalar subquery with a limit 1 combined with
order by.  Such a query cannot be expressed as a derived table (subq in from)
because of the different scope rules.  If the semantics are that all
subqueries are evaluable separately and then joinable after this, the
meaning of order by + limit is clear enough.  But a scalar subquery is
evaluable at whatever point all the variables that it references from the
outside are bound.  Thus the meaning of limit 1 is in relation to the
solutions generated for each set of bindings of variables shared between the
scalar subquery and the rest of the query.  
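
To make the scope difference concrete, here is a minimal Python sketch over
made-up data.  Everything in it (the `knows` edges, the `score` table and both
function names) is a hypothetical illustration, not taken from any SPARQL
engine.  It contrasts evaluating an order by + limit 1 subquery once and then
joining, with evaluating it per binding of the shared variable.

```python
# Hypothetical toy data: (person, friend) edges and a score per friend.
knows = [("alice", "bob"), ("alice", "carol"), ("dave", "bob"), ("dave", "erin")]
score = {"bob": 5, "carol": 9, "erin": 2}


def derived_table_top_friend(knows, score):
    """Subquery evaluated on its own, then joined: LIMIT 1 applies globally."""
    # Inner query: all (person, friend) rows ordered by friend score, LIMIT 1.
    inner = sorted(knows, key=lambda row: score[row[1]], reverse=True)[:1]
    # Join with the outer pattern on the shared variable `person`.
    persons = sorted({p for p, _ in knows})
    return {p: f for p in persons for (q, f) in inner if p == q}


def correlated_top_friend(knows, score):
    """Scalar subquery: LIMIT 1 applies per binding of the outer `person`."""
    result = {}
    for person in sorted({p for p, _ in knows}):
        # Inner query re-evaluated with `person` already bound.
        candidates = [f for p, f in knows if p == person]
        candidates.sort(key=lambda f: score[f], reverse=True)
        result[person] = candidates[0] if candidates else None
    return result


print(derived_table_top_friend(knows, score))  # only one person survives the join
print(correlated_top_friend(knows, score))     # one friend for every person
```

In the first function the single row kept by the limit decides which outer
bindings survive the join; in the second every outer binding gets its own
top row, which is the reading of limit 1 described above.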

This is more a curiosity than a real item of importance.  

Still, for the cases where a scalar or exists/not exists subquery is
expressible as a derived table, this would at least be a syntactic nicety
even though it requires some scope rules of its own.
 





Regards

Orri


-----Original Message-----
From: Andy Seaborne [mailto:afs@talisplatform.com] On Behalf Of Andy Seaborne
Sent: Thursday, November 12, 2009 11:26 AM
To: Orri Erling
Subject: Re: Views on the outcomes of F2F


Did you mean to send this to the list?

Its content suggests it should go there. (We all press reply on this list!)

------

Commentary : this is offlist and my initial thoughts.  The process 
impact is something I would need to consider some more to be more 
confident that it's the best approach to bring features to the users.


I agree about existence in filters
but in your example it can be done by

SELECT ?celeb, count (*)
where {
      ?claimant foaf:knows ?celeb .
      NOT EXISTS{?celeb foaf:knows ?claimant}
     } group by ?celeb order by desc 2 limit 10


existence in FILTER is needed if there is another condition and it's not 
conjunctive.
eg. another condition on ?claimant:

SELECT ?celeb, count (*)
where {
    ?claimant foaf:knows ?celeb .
    FILTER(NOT EXISTS{?celeb foaf:knows ?claimant} || ?claimant = <#me>)
   }
   group by ?celeb order by desc 2 limit 10
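
For illustration, the difference between the two queries above can be
mimicked in plain Python over a tiny hand-made graph.  This is a sketch only:
the edges in `KNOWS`, the `ME` constant and the function names are
assumptions, and the order by / limit step is left out.

```python
from collections import Counter

# Hypothetical foaf:knows edges as (subject, object) pairs.
KNOWS = {
    ("me", "star"),
    ("star", "me"),     # star knows me back
    ("fan1", "star"),
    ("fan2", "star"),
    ("star", "fan2"),   # star knows fan2 back
    ("fan1", "idol"),
    ("idol", "fan1"),   # idol knows fan1 back
}
ME = "me"  # stands in for <#me>


def unreciprocated_counts(knows):
    """NOT EXISTS as a pattern: keep rows with no reverse edge (antijoin)."""
    return Counter(celeb for claimant, celeb in knows
                   if (celeb, claimant) not in knows)


def unreciprocated_or_me_counts(knows, me):
    """FILTER(NOT EXISTS {...} || ?claimant = me): the test is per row."""
    return Counter(celeb for claimant, celeb in knows
                   if (celeb, claimant) not in knows or claimant == me)


print(unreciprocated_counts(KNOWS))
print(unreciprocated_or_me_counts(KNOWS, ME))
```

The row for ("me", "star") is dropped by the plain antijoin because the
reverse edge exists, but it is kept by the disjunctive filter, which is why
the existence test has to be usable inside FILTER in that case.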


As to scalar queries, I agree in principle, and would suggest that the 
use of algebra substitute() in FPWD gives the right and expected 
semantics (same as SQL as I understand them).  It's not conceptually very 
hard.

I would note that with named variables in SPARQL this can be done by 
separating out the subquery, with care, and using a named variable to 
capture the scalar value.

# Quick example ....
{
   { SELECT ?claimant, count(?celeb) AS ?C
     { ?claimant foaf:knows ?celeb . }
   }
   # Look for low-contact number claimants
   FILTER( NOT EXISTS{?celeb foaf:knows ?claimant}  || ?C > 50 )
}

although it's a little more verbose.


My reservation is the time it will take to get the WG to agree on this 
and it's not in the charter, which is a criterion that the chairs and 
some WG members worry about.  The time matter will disrupt other key 
features.

I might be able to support a phased approach.  Some features now, more 
coming because it's important not to create a long delay to the next set 
of SPARQL features if looking from a broader perspective than just this WG.

	Andy



On 12/11/2009 08:57, Orri Erling wrote:
> Hi
>
>
> I think we should include existence subqueries in filters.  Likewise
> for scalar subqueries although these are not as necessary.  SQL has
> had both for all time and has been implemented countless times so
> there is no particular difficulty in any of this.  The scope rules of SQL
> are also clear and intuitive and applicable, even though different from
> the scope rules of a derived table (subquery in from clause).
>
>
> Consider:
>
>
> select ?celeb, count (*)
> where {
>      ?claimant foaf:knows ?celeb .
>      filter (!bif:exists ((select (1) where { ?celeb foaf:knows ?claimant })))
>    } group by ?celeb order by desc 2 limit 10
>
>
>
>
> select ?celeb, count (*)
> where {
>      ?claimant foaf:knows ?celeb .
>      filter (bif:exists ((select (1) where { ?celeb foaf:knows ?claimant })))
>    } group by ?celeb order by desc 2 limit 10
>
>
> The first takes the persons whom most claim to know without being known
> in return.
> The second takes the people with the most reciprocally acknowledged knows
> relations.
> Since the graph is not specified, these are evaluated over the union of
> all graphs.
>
> The first case (antijoin) can be expressed with the optional + bound
> idiom.  For usable performance this idiom must be recognized by the
> processor and compiled as antijoin.  How many do this?  We don't but
> of course could.  This is somewhat futile though since there is a more
> natural expression.
>
> The second case (semijoin) can be expressed as
>
> select ?celeb, count (*)
> where {
>      ?claimant foaf:knows ?celeb .
>      {select distinct ?celeb2 ?claimant2 where { ?celeb2 foaf:knows ?claimant2 }}
>      filter (?celeb = ?celeb2 && ?claimant2 = ?claimant)
>    } group by ?celeb order by desc 2 limit 10
>
>
> Of course then the implementation is expected to figure out that a
> semijoin is being meant and then compile it as such.  With us, the
> version with exists runs about 10x faster than the one with the
> distinct subquery but they produce the same result.  We do not
> recognize the distinct subquery to be a semijoin because it would not
> occur to us, or to other SQL-minded people to write such a thing.
>
> Also, the SQL quantified subqueries can be expressed as exists and not
> exists.  Having to write exists/not exists in such awkward ways makes
> these even less readable.
>
> We should note that the audience potentially adopting SPARQL know SQL and
> look to SPARQL for cases where heterogeneity of the data makes SQL
> impractical.
>
>
> Regards
> Orri
Received on Thursday, 12 November 2009 19:09:25 UTC