- From: Lee Feigenbaum <lee@thefigtrees.net>
- Date: Tue, 16 Feb 2010 11:41:49 -0500
- To: Axel Polleres <axel.polleres@deri.org>
- CC: SPARQL Working Group <public-rdf-dawg@w3.org>
If I understand the proposal correctly, it only ever adds bindings to a
result set. It never collapses result sets the way standard SQL
aggregate semantics do. This would mean that getting the COUNT of 1
million triples would still return a result set with 1 million rows. If
I understand this correctly, it's a non-starter for most use cases of
aggregates that I have. I'd far rather issue multiple queries that
return a small number of (aggregated) responses.
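
To make the difference concrete, here is a rough sketch in Python (not
SPARQL or C-SPARQL). The rows, the function names, and the
"numberOfBooks" key are all made up for illustration, and the
"extending" function reflects only my reading of the proposal, not its
actual definition:

```python
from collections import Counter

# Hypothetical solution rows, as a WHERE clause over (?auth, ?book) might yield.
rows = [
    {"auth": "a1", "book": "b1"},
    {"auth": "a1", "book": "b2"},
    {"auth": "a2", "book": "b3"},
]


def aggregate_extending(rows, key):
    """Assumed reading of the proposal: each row is kept and gains a count binding."""
    counts = Counter(row[key] for row in rows)
    return [{**row, "numberOfBooks": counts[row[key]]} for row in rows]


def aggregate_collapsing(rows, key):
    """SQL-style GROUP BY: one output row per distinct value of the group key."""
    counts = Counter(row[key] for row in rows)
    return [{key: value, "numberOfBooks": n} for value, n in counts.items()]


extended = aggregate_extending(rows, "auth")
collapsed = aggregate_collapsing(rows, "auth")

# The extending form returns as many rows as the input; the collapsing
# form returns one row per group.
print(len(extended), len(collapsed))
```

With N input rows the first function always returns N rows, which is
the behaviour I am worried about; the second returns one row per group.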
Lee
On 2/16/2010 9:51 AM, Axel Polleres wrote:
> browsing through Emanuele's proposal... Please forgive that I just quickly wrote this up without a lot of
> in-depth thinking yet... just to kick-off discussion...
>
>
> Firstly, I have some open questions on the proposal that we might want to ask them...
>
> 1)
>
> If I get it right, AGGREGATE ( var FUNCTION vars )
>
> i) projects first groups wrt variables appearing in vars and then
> ii) evaluates the aggregate on those groups ...
>
> That may make sense for count, but how does that work for
> min/max, i.e. where is the projection ?
>
> hmmm... actually it seems the grammar given for FUNCTION is wrong...
> it should be
>
> Function | Function '(' var ')'
> or
> Function | Function '(' vars ')'
>
> we may want to ask back for clarification here...
>
> ... ok, but let's assume that this means that they have just
>
> SELECT ... var
> where
> AGGREGATE ( var function vars ) FILTER filter
>
> =more-or-less=
>
> SELECT function as var
> where
> GROUP BY vars
> HAVING filter
>
> with a slightly different implicit grouping than we have at the moment?
> Their claimed advantage seems to be that they allow doing different aggregations *at once*,
> which seems to have some merit, since we can probably only do this with some cumbersome subqueries at the moment.
>
> 2) I don't entirely get their examples though... e.g.
>
>
> SELECT ?name ?surname ?book ?numberOfBooks ?averageNumberOfBooks
> WHERE {
> ?auth :name ?name .
> ?auth :surname ?surname .
> ?auth :wrote ?book .
> ?auth :affiliated ?organization .
> }
> AGGREGATE { (?numberOfBooks, COUNT, {?auth} ) FILTER (?numberOfBooks> 5) }
> AGGREGATE { (?affiliationBooks, SUM(?numberOfBooks), {?organization} )
> FILTER (?affiliationBooks> 50)}
>
> here, ?affiliationBooks violates the constraint they posed before:
>
> "In the C-SPARQL language all the variables used in AGGREGATE clauses
> must appear also in the SELECT clause, since aggregation happens after
> standard SPARQL query evaluation. In SPARQL 1.1 the constraint is not
> specified."
>
> I assume they probably meant the constraint to apply only to the variables
> mentioned in the function and group part? We may want to ask back for clarification on that as well.
>
> Also, I'd like to see the result table for this.
>
>
> 3) It seems that their attempt at a SPARQL 1.1 formulation of this one
> has some errors...
>
> SELECT ?topic ?numberOfSwissAuthors ?numberOfItalianAuthors
> WHERE {
> ?auth :name ?name .
> ?auth :wrote ?book .
> ?book :topic ?topic .
> ?auth :hasNationality ?nat .
> }
> AGGREGATE { FILTER(?nat = 'IT') (?numberOfItalianAuthors, COUNT,
> {?topic} ) }
> AGGREGATE { FILTER(?nat = 'CH') (?numberOfSwissAuthors, COUNT, {?topic}
> ) FILTER(?numberOfItalianAuthors>?numberOfSwissAuthors)}
>
>
> However, in total, I think their aggregation proposal could have some merits:
> it seemingly allows aggregation with fewer subqueries, at
> least this seems to be their main argument. Without an actual proof,
> I am not yet convinced by their claim that they can express all of our
> aggregation/grouping.
>
>
>
> Axel
>
Received on Tuesday, 16 February 2010 16:42:27 UTC