Re: DISTINCT with aggregates from Andy Seaborne on 2009-11-13 (public-rdf-dawg@w3.org from October to December 2009)

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Fri, 13 Nov 2009 15:10:14 +0000
To: Lee Feigenbaum <lee@thefigtrees.net>
CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <4AFD76D6.40104@talis.com>

On 13/11/2009 14:21, Lee Feigenbaum wrote:
<snip/>

> Thanks, Andy. I hadn't understood previously that this is a
> grammar/syntax issue. In Open Anzo, I check this in code while parsing
> and throw an exception if DISTINCT is used with a non-aggregate function.

ARQ does it at parse time, without knowing whether the function is an 
aggregate, but it is doing some context-sensitive parsing.  A Java flag 
records whether aggregates are allowed via this path to the expression 
rules.

It means the parser can reuse the whole expression rules and still flag 
parser errors on aggregates out of place.  This isn't expressible in BNF 
without duplicating the two expression production trees and they are 
quite long.
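
To make the flag idea concrete, here is a rough sketch in Python. It is 
not ARQ's code: the toy grammar and the names ExprParser, 
allow_aggregates and parse are all made up for illustration. One set of 
expression rules is shared, and the flag decides whether an aggregate 
name is a parse error on this path.

```python
import re

class ParseError(Exception):
    pass

# Hypothetical aggregate names for the sketch; not taken from any spec text.
AGGREGATES = {"count", "sum", "min", "max", "avg"}
TOKEN = re.compile(r"\s*(\d+|\?\w+|[A-Za-z_]\w*|[()*>,])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ParseError(f"unexpected character at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class ExprParser:
    def __init__(self, tokens, allow_aggregates):
        self.tokens = tokens
        self.pos = 0
        # The context flag: set by whichever rule enters the expression rules.
        self.allow_aggregates = allow_aggregates

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ParseError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expression(self):
        # expr := primary ('>' primary)?
        left = self.primary()
        if self.peek() == ">":
            self.take(">")
            return (">", left, self.primary())
        return left

    def primary(self):
        # primary := number | variable | name '(' ('*' | expr) ')'
        tok = self.take()
        if tok in {"(", ")", "*", ">", ","}:
            raise ParseError(f"unexpected {tok!r}")
        if self.peek() == "(":
            name = tok.lower()
            # Same rule for every call; only the flag makes it an error.
            if name in AGGREGATES and not self.allow_aggregates:
                raise ParseError(f"aggregate {tok} not allowed here")
            self.take("(")
            arg = self.take("*") if self.peek() == "*" else self.expression()
            self.take(")")
            return ("call", name, arg)
        return tok

def parse(text, allow_aggregates):
    parser = ExprParser(tokenize(text), allow_aggregates)
    tree = parser.expression()
    if parser.peek() is not None:
        raise ParseError("trailing input")
    return tree
```

A HAVING rule would call parse(text, True) and a FILTER rule 
parse(text, False), so "count(*) > 0" parses on the first path and 
raises ParseError on the second.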

We could do it that way in the spec as well.

There are only certain fixed places (a partial argument for HAVING over 
FILTER - not that the position of the FILTER does not tell you whether 
aggregates are legal, but it indicates to the query writer whether they 
are).

   SELECT (count(*))
   { ... }
   GROUP BY ...
   HAVING(count(*) > 0)

   SELECT ?x
   { ... }
   GROUP BY ...
   HAVING(count(*) > 0)

[I recall mention of discussing this at F2F2 but I can't find it at the 
moment so maybe there wasn't.]

Because you may wish to have a test on the aggregate value but not 
expose the aggregate value, resorting to a named var in the HAVING means 
you need two SELECTs.

> I guess this is sort of related to the question of a whether we want a
> keyword to introduce custom aggregate functions: how important is it to
> minimize the number of invalid queries that are syntactically valid?
>
> I believe in SPARQL 1.0 the only such query involves bnode labels
> spanning BGPs?

Yes, I think so.

In 1.1 we have some more cases: aggregates are context-sensitive, and 
there is also the scope of AS variables that are also mentioned inside 
the graph pattern of their SELECT (depending on the decision).

     Andy

>
> lee

Received on Friday, 13 November 2009 15:10:32 UTC