- From: Lee Feigenbaum <lee@thefigtrees.net>
- Date: Tue, 16 Feb 2010 11:41:49 -0500
- To: Axel Polleres <axel.polleres@deri.org>
- CC: SPARQL Working Group <public-rdf-dawg@w3.org>
If I understand the proposal correctly, it only ever adds bindings to a result set. It does not ever collapse result sets the way standard SQL aggregate semantics do. This would mean that getting the COUNT of 1 million triples would still return a result set with 1 million rows.

If I understand this correctly, it's a non-starter for most use cases of aggregates that I have. I'd far rather issue multiple queries that return a small number of (aggregated) responses.

Lee

On 2/16/2010 9:51 AM, Axel Polleres wrote:
> browsing through Emanuele's proposal... Please forgive that I just quickly wrote this up without a lot of
> in-depth thinking yet... just to kick-off discussion...
>
>
> Firstly, I have some open questions on the proposal we might want to ask them...
>
> 1)
>
> If I get it right, AGGREGATE {var FUNCTION vars )
>
> i) projects first groups wrt variables appearing in vars and then
> ii) evaluates the aggregate on those groups ...
>
> That may make sense for count, but how does that work for
> min/max, i.e. where is the projection ?
>
> hmmm... actually it seems the grammar given for FUNCTION is wrong...
> it should be
>
> Function | Function '(' var ')'
> or
> Function | Function '(' vars ')'
>
> we may want to ask back for clarification here...
>
> ... ok, but let's assume that this means that they have just
>
> SELECT ... var
> where
> AGGREGATE ( var function vars ) FILTER filter
>
> =more-or-less=
>
> SELECT function as var
> where
> GROUP BY vars
> HAVING filter
>
> with a slightly different implicit grouping than we have at the moment?
> Their claimed advantage seems to be that they allow to do different aggregations *at once*
> which seems to have some merits, since we can probably only do this with some cumbersome subqueries at the moment.
>
> 2) I don't entirely get their examples though... e.g.
>
>
> SELECT ?name ?surname ?book ?numberOfBooks ?averageNumberOfBooks
> WHERE {
>   ?auth :name ?name .
>   ?auth :surname ?surname .
>   ?auth :wrote ?book .
>   ?auth :affiliated ?organization .
> }
> AGGREGATE { (?numberOfBooks, COUNT, {?auth} ) FILTER (?numberOfBooks> 5) }
> AGGREGATE { (?affiliationBooks, SUM(?numberOfBooks), {?organization} )
> FILTER (?affiliationBooks> 50)}
>
> here, ?affiliationBooks violates the constraint they pose before:
>
> "In the C-SPARQL language all the variables used in AGGREGATE clauses
> must appear also in the SELECT clause, since aggregation happens after
> standard SPARQL query evaluation. In SPARQL 1.1 the constraint is not
> specified."
>
> I assume they probably meant to say the constraint only to apply to the variables
> mentioned in the function and group part? we may want to ask that back for clarification as well
>
> Also, I'd like to see the result table for this.
>
>
> 3) It seems that their SPARQL1.1 formulation attempt of this one
> has some errors...
>
> SELECT ?topic ?numberOfSwissAuthors ?numberOfItalianAuthors
> WHERE {
>   ?auth :name ?name .
>   ?auth :wrote ?book .
>   ?book :topic ?topic .
>   ?auth :hasNationality ?nat .
> }
> AGGREGATE { FILTER(?nat = 'IT') (?numberOfItalianAuthors, COUNT,
> {?topic} ) }
> AGGREGATE { FILTER(?nat = 'CH') (?numberOfSwissAuthors, COUNT, {?topic}
> ) FILTER(?numberOfItalianAuthors>?numberOfSwissAuthors)}
>
>
> However, in total, I think their aggregation proposal could have some merits,
> seemingly to allow aggregation with fewer subqueries necessary, at
> least this seems to be the main point of argumentation. I am not yet
> convinced about their argument that they can express all of our
> aggregation/grouping without an actual proof.
>
>
>
> Axel
>
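
The distinction drawn in the reply at the top of this message (adding bindings versus collapsing the result set) can be illustrated with a small Python sketch. This is not taken from the proposal: the function names `collapse_count` and `annotate_count` and the toy rows are hypothetical, and `annotate_count` only models the assumed reading in which AGGREGATE keeps every solution and attaches the aggregate value to it.

```python
from collections import Counter

# Toy solution sequence: one row per (author, book) match. Hypothetical data.
rows = [
    {"auth": "a1", "book": "b1"},
    {"auth": "a1", "book": "b2"},
    {"auth": "a2", "book": "b3"},
]

def collapse_count(rows, key):
    """SQL-style GROUP BY semantics: one output row per group."""
    counts = Counter(row[key] for row in rows)
    return [{key: k, "count": n} for k, n in counts.items()]

def annotate_count(rows, key):
    """Assumed 'adds bindings' reading: every input row is kept
    and extended with the count of its group."""
    counts = Counter(row[key] for row in rows)
    return [{**row, "count": counts[row[key]]} for row in rows]

collapsed = collapse_count(rows, "auth")
annotated = annotate_count(rows, "auth")
print(len(collapsed), len(annotated))
```

Under the second reading the output always has as many rows as the input, which is the 1-million-row COUNT case described above; under the first it has one row per group.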
Received on Tuesday, 16 February 2010 16:42:27 UTC