Re: on the alternative aggregate proposal (ED-1)...

If I understand the proposal correctly, it only ever adds bindings to a 
result set. It never collapses result sets the way standard SQL 
aggregate semantics do. This would mean that getting the COUNT of 1 
million triples would still return a result set with 1 million rows. If 
I understand this correctly, it's a non-starter for most use cases of 
aggregates that I have. I'd far rather issue multiple queries that 
return a small number of (aggregated) responses.
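
To make the concern concrete, here is a rough Python sketch of the two
semantics as I read them. The sample data and the function names
(collapse_count, annotate_count) are made up for illustration; they are
not taken from the proposal, and the second function reflects only my
reading of it.

```python
from collections import Counter

# Hypothetical solution set: one binding per matched (author, book) pair.
solutions = [
    {"auth": "a1", "book": "b1"},
    {"auth": "a1", "book": "b2"},
    {"auth": "a2", "book": "b3"},
]

def collapse_count(rows, key):
    """SQL-style GROUP BY + COUNT: one output row per group."""
    counts = Counter(row[key] for row in rows)
    return [{key: k, "n": n} for k, n in counts.items()]

def annotate_count(rows, key):
    """Assumed reading of the proposal: every row survives and gains a binding."""
    counts = Counter(row[key] for row in rows)
    return [{**row, "n": counts[row[key]]} for row in rows]

collapsed = collapse_count(solutions, "auth")
annotated = annotate_count(solutions, "auth")
print(len(collapsed), len(annotated))
```

Under the second reading the result never gets smaller than the input,
which is exactly the 1-million-rows problem described above.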


On 2/16/2010 9:51 AM, Axel Polleres wrote:
> browsing through Emanuele's proposal... Please forgive that I just quickly wrote this up without a lot of
> in-depth thinking yet... just to kick-off discussion...
> Firstly, I have some open questions on the proposal we might want to ask them...
> 1)
> If I get it right, AGGREGATE ( var FUNCTION vars )
> i) projects first groups wrt variables appearing in vars and then
> ii)  evaluates the aggregate on those groups ...
> That may make sense for count, but how does that work for
> min/max, i.e. where is the projection ?
>     hmmm... actually it seems the grammar given for FUNCTION is wrong...
>     it should be
>      Function | Function '(' var ')'
>    or
>      Function | Function '(' vars ')'
> we may want to ask back for clarification here...
> ... ok, but let's assume that this means that they have just
> SELECT ... var
> where
> AGGREGATE ( var function vars ) FILTER filter
> =more-or-less=
> SELECT function as var
> where
> GROUP BY vars
> HAVING filter
> with a slightly different implicit grouping than we have at the moment?
> Their claimed advantage seems to be that they allow doing different aggregations *at once*,
> which seems to have some merits, since we can probably only do this with some cumbersome subqueries at the moment.
> 2) I don't entirely get their examples though... e.g.
> SELECT ?name ?surname ?book ?numberOfBooks ?averageNumberOfBooks
> WHERE {
>      ?auth :name ?name .
>      ?auth :surname ?surname .
>      ?auth :wrote ?book .
>      ?auth :affiliated ?organization .
> }
> AGGREGATE { (?numberOfBooks, COUNT, {?auth} ) FILTER (?numberOfBooks>  5) }
> AGGREGATE { (?affiliationBooks, SUM(?numberOfBooks), {?organization} )
> FILTER (?affiliationBooks>  50)}
> here, ?affiliationBooks violates the constraint they pose before:
> "In the C-SPARQL language all the variables used in AGGREGATE clauses
> must appear also in the SELECT clause, since aggregation happens after
> standard SPARQL query evaluation. In SPARQL 1.1 the constraint is not
> specified."
> I assume they probably meant the constraint to apply only to the variables
> mentioned in the function and group part? We may want to ask back for clarification on that as well.
> Also, I'd like to see the result table for this.
> 3) It seems that their SPARQL1.1 formulation attempt of this one
> has some errors...
> SELECT ?topic ?numberOfSwissAuthors ?numberOfItalianAuthors
> WHERE {
>      ?auth :name ?name .
>      ?auth :wrote ?book .
>      ?book :topic ?topic .
>      ?auth :hasNationality ?nat .
> }
> AGGREGATE { FILTER(?nat = 'IT') (?numberOfItalianAuthors, COUNT,
> {?topic} ) }
> AGGREGATE { FILTER(?nat = 'CH') (?numberOfSwissAuthors, COUNT, {?topic}
> ) FILTER(?numberOfItalianAuthors>?numberOfSwissAuthors)}
> However, overall, I think their aggregation proposal could have some merits,
> seemingly allowing aggregation with fewer subqueries; at
> least this seems to be their main argument. I am not yet
> convinced about their argument that they can express all of our
> aggregation/grouping without an actual proof.
> Axel

Received on Tuesday, 16 February 2010 16:42:27 UTC