- From: Lee Feigenbaum <lee@thefigtrees.net>
- Date: Fri, 13 Nov 2009 09:21:49 -0500
- To: Andy Seaborne <andy.seaborne@talis.com>
- CC: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Andy Seaborne wrote: > > > On 13/11/2009 04:41, Lee Feigenbaum wrote: >> Andy Seaborne wrote: >>> > ** DISTINCT in aggregate functions >>> > >>> > Consensus on allowing DISTINCT with multiple arguments to aggregate >>> > functions. DISTINCT in this case passes just the DISTINCT tuples >>> > into the aggregate function (for each group). >>> >>> I'm unclear why it should be allowed in SUM or AVG. Is there a use case? >> >> I asked at the F2F if there were use cases for SUM(DISTINCT ...) and was >> told there were, but don't remember anyone going into the details. >> >> Motivation was overall consistency and consistency with SQL. I also >> described Open Anzo's implementation which does this (allow DISTINCT on >> any aggregate function's arguments, including custom aggregates). What's >> the benefit of not allowing it? >> >>> We are already handling * differently by aggregate and DISNTINCT seems >>> to only really man anything there. Are there specific motivating use >>> cases? >> >> DISTINCT has meaning in plenty of aggregates (e.g. GROUP_CONCAT which >> may or may not end up as a built-in aggregate - still an open issue). >> >>> Is DISTINCT allowed in custom aggregates ? If so, they have different >>> syntax. >> >> The F2F consensus was that DISTINCT is indeed allowed in custom >> aggregates. >> >>> I propose that DISTINCT is not allowed for custom aggregates. An >>> aggregate can choose to do that operation as part of it's definition >>> but DISTINCT and not-DISTINCT forms are two different URI to name the >>> aggregate. >> >> I am personally not in favor of this proposal. What's the benefit of >> preventing DISTINCT? > > >> If I needed to have two different URIs for DISTINCT >> and non-DISTINCT versions of custom aggregates, I'd end up inventing a >> new mechanism to describe aggregates to link together distinct and >> non-distinct versions via something in the service description so that I >> can use a user interface to choose an aggregate and whether it acts over >> distinct tuples or not. >> >> But mainly, I don't understand the motivation for prohibiting DISTINCT >> in these cases, given that it has a very clear semantics (distinct'ing >> the solution sets passed into the aggregate) and use cases (such as >> group_concat). > > It's a practical consideration of syntax: maybe someone has a proposal > here but it seems to me that if custom aggregate syntax is function > syntax, there is nowhere to put the word "DISTINCT". But if function > syntax changes to allow DISTINCT, it allows it in all places. > > I can't see how a new custom aggregate production in the grammar based > on lookahead for DISTINCT in the expression clause would work because it > needs to be valid for absence of DISTINCT as well and that is > syntactically ambiguous as it is a custom function call. > > Custom syntax makes it easier, but not nice looking IMHO: > > AGG(URI, [DISTINCT,] exprList) > > and we also have the related matter of whether we have to have two sets > of expression rules, one which allows aggregates and one that doesn't. Thanks, Andy. I hadn't understood previously that this is a grammar/syntax issue. In Open Anzo, I check this in code while parsing and throw an exception if DISTINCT is used with a non-aggregate function. I guess this is sort of related to the question of a whether we want a keyword to introduce custom aggregate functions: how important is it to minimize the number of invalid queries that are syntactically valid? I believe in SPARQL 1.0 the only such query involves bnode labels spanning BGPs? lee > Andy > >> >> Lee >> >> ______________________________________________________________________ >> This email has been scanned by the MessageLabs Email Security System. >> For more information please visit http://www.messagelabs.com/email >> ______________________________________________________________________ >
Received on Friday, 13 November 2009 14:22:33 UTC