Re: on the alternative aggregate proposal (ED-1)... from Lee Feigenbaum on 2010-02-16 (public-rdf-dawg@w3.org from January to March 2010)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Tue, 16 Feb 2010 11:41:49 -0500
To: Axel Polleres <axel.polleres@deri.org>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4B7ACACD.8040500@thefigtrees.net>
If I understand the proposal correctly, it only ever adds bindings to a 
result set. It never collapses result sets the way standard SQL 
aggregate semantics do. This would mean that getting the COUNT of 1 
million triples would still return a result set with 1 million rows. If 
I understand this correctly, it's a non-starter for most use cases of 
aggregates that I have. I'd far rather issue multiple queries that 
return a small number of (aggregated) responses.
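
To make the difference concrete, here is a rough sketch in Python. It is not part of the proposal, and it only reflects my reading of it: the sample rows and the function names collapse_count and annotate_count are made up for illustration.

```python
from collections import Counter

# Hypothetical solution rows: (author, book) pairs matched by a WHERE clause.
rows = [
    ("alice", "book1"),
    ("alice", "book2"),
    ("bob", "book3"),
]


def collapse_count(rows):
    """SQL-style GROUP BY: one output row per group."""
    counts = Counter(author for author, _ in rows)
    return sorted(counts.items())


def annotate_count(rows):
    """Additive reading: every input row is kept and gains a count binding."""
    counts = Counter(author for author, _ in rows)
    return [(author, book, counts[author]) for author, book in rows]


collapsed = collapse_count(rows)   # one row per author
annotated = annotate_count(rows)   # as many rows as the input
print(len(collapsed), len(annotated))
```

Under the additive reading the result never gets smaller than the input, which is the 1-million-row COUNT problem above.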

Lee

On 2/16/2010 9:51 AM, Axel Polleres wrote:
> browsing through Emanuele's proposal... Please forgive that I just quickly wrote this up without a lot of
> in-depth thinking yet... just to kick-off discussion...
>
>
> Firstly, I have some open questions on the proposal that we might want to ask them...
>
> 1)
>
> If I get it right, AGGREGATE { var FUNCTION vars }
>
> i) first groups with respect to the variables appearing in vars and then
> ii) evaluates the aggregate on those groups ...
>
> That may make sense for count, but how does that work for
> min/max, i.e., where is the projection?
>
>     hmmm... actually it seems the grammar given for FUNCTION is wrong...
>     it should be
>
>      Function | Function '(' var ')'
>    or
>      Function | Function '(' vars ')'
>
> we may want to ask back for clarification here...
>
> ... ok, but let's assume that this means that they have just
>
> SELECT ... var
> where
> AGGREGATE ( var function vars ) FILTER filter
>
> =more-or-less=
>
> SELECT function as var
> where
> GROUP BY vars
> HAVING filter
>
> with a slightly different implicit grouping than we have at the moment?
> Their claimed advantage seems to be that it allows different aggregations to be done *at once*,
> which seems to have some merit, since at the moment we can probably only do this with some cumbersome subqueries.
>
> 2) I don't entirely get their examples though... e.g.
>
>
> SELECT ?name ?surname ?book ?numberOfBooks ?averageNumberOfBooks
> WHERE {
>      ?auth :name ?name .
>      ?auth :surname ?surname .
>      ?auth :wrote ?book .
>      ?auth :affiliated ?organization .
> }
> AGGREGATE { (?numberOfBooks, COUNT, {?auth} ) FILTER (?numberOfBooks > 5) }
> AGGREGATE { (?affiliationBooks, SUM(?numberOfBooks), {?organization} )
> FILTER (?affiliationBooks > 50) }
>
> here, ?affiliationBooks violates the constraint they state earlier:
>
> "In the C-SPARQL language all the variables used in AGGREGATE clauses
> must appear also in the SELECT clause, since aggregation happens after
> standard SPARQL query evaluation. In SPARQL 1.1 the constraint is not
> specified."
>
> I assume they probably meant the constraint to apply only to the variables
> mentioned in the function and group parts? We may want to ask for clarification on that as well.
>
> Also, I'd like to see the result table for this.
>
>
> 3) Their attempt at a SPARQL 1.1 formulation of this one
> seems to have some errors...
>
> SELECT ?topic ?numberOfSwissAuthors ?numberOfItalianAuthors
> WHERE {
>      ?auth :name ?name .
>      ?auth :wrote ?book .
>      ?book :topic ?topic .
>      ?auth :hasNationality ?nat .
> }
> AGGREGATE { FILTER(?nat = 'IT') (?numberOfItalianAuthors, COUNT, {?topic} ) }
> AGGREGATE { FILTER(?nat = 'CH') (?numberOfSwissAuthors, COUNT, {?topic} )
> FILTER(?numberOfItalianAuthors > ?numberOfSwissAuthors) }
>
>
> However, in total, I think their aggregation proposal could have some merit,
> since it seems to allow aggregation with fewer subqueries; at
> least this seems to be their main argument. Without an actual proof, I am not yet
> convinced by their claim that they can express all of our
> aggregation/grouping.
>
>
>
> Axel
>
Received on Tuesday, 16 February 2010 16:42:27 UTC