on the alternative aggregate proposal (ED-1)... from Axel Polleres on 2010-02-16 (public-rdf-dawg@w3.org from January to March 2010)

From: Axel Polleres <axel.polleres@deri.org>
Date: Tue, 16 Feb 2010 14:51:31 +0000
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <BA6BBCA2-22E0-40FF-BB45-D360470CB9A1@deri.org>

Browsing through Emanuele's proposal... Please forgive that I wrote this up quickly, without
a lot of in-depth thinking yet, just to kick off discussion...


First, I have some open questions on the proposal that we might want to ask them...

1)

If I get it right, AGGREGATE { (var, FUNCTION, vars) }

i) first projects and groups with respect to the variables appearing in vars, and then
ii) evaluates the aggregate on those groups ...

That may make sense for count, but how does that work for
min/max, i.e., where is the projection?

   hmmm... actually it seems the grammar given for FUNCTION is wrong... 
   it should be 
    
    Function | Function '(' var ')'
  or 
    Function | Function '(' vars ')'

we may want to ask back for clarification here...

... ok, but let's assume that this means that they have just
 
SELECT ... var
where
AGGREGATE ( var function vars ) FILTER filter

=more-or-less= 

SELECT function as var
where
GROUP BY vars
HAVING filter
 
with a slightly different implicit grouping than we have at the moment?
Their claimed advantage seems to be that they allow different aggregations to be done *at once*,
which seems to have some merit, since we can probably only do this with some cumbersome subqueries at the moment.
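
To make the correspondence sketched above concrete, here is a minimal
illustration in SPARQL 1.1 syntax. It assumes the reading given above (group
by vars, then filter on the aggregate); the example.org namespace and the
:wrote property are hypothetical names chosen only for illustration.

```sparql
PREFIX : <http://example.org/>   # hypothetical namespace

# Rough GROUP BY / HAVING counterpart of a single clause of the form
#   AGGREGATE { (?n, COUNT, {?auth}) FILTER (?n > 5) }
SELECT ?auth (COUNT(?book) AS ?n)
WHERE {
    ?auth :wrote ?book .
}
GROUP BY ?auth              # "vars" becomes the grouping key
HAVING (COUNT(?book) > 5)   # the trailing FILTER becomes HAVING
```

Note that this returns one row per group, which may or may not match the
implicit grouping intended by the proposal.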

2) I don't entirely get their examples though... e.g.


SELECT ?name ?surname ?book ?numberOfBooks ?averageNumberOfBooks
WHERE {
    ?auth :name ?name .
    ?auth :surname ?surname .
    ?auth :wrote ?book .
    ?auth :affiliated ?organization .
}
AGGREGATE { (?numberOfBooks, COUNT, {?auth} ) FILTER (?numberOfBooks > 5) }
AGGREGATE { (?affiliationBooks, SUM(?numberOfBooks), {?organization} )
            FILTER (?affiliationBooks > 50)}

Here, ?affiliationBooks violates the constraint they state earlier (it is used in an
AGGREGATE clause but does not appear in the SELECT clause):

"In the C-SPARQL language all the variables used in AGGREGATE clauses
must appear also in the SELECT clause, since aggregation happens after
standard SPARQL query evaluation. In SPARQL 1.1 the constraint is not
specified."

I assume they probably meant the constraint to apply only to the variables
mentioned in the function and group parts? We may want to ask back for clarification on that as well.

Also, I'd like to see the result table for this.
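
For comparison, a sketch of how the two-level aggregation in the example above
might be written with a subquery in SPARQL 1.1 syntax. This is one possible
reading, not the proposal's own formulation: the example.org namespace is
hypothetical, ?name and ?surname are left out, and the inner query groups by
both ?auth and ?organization (an assumption, so that ?organization stays
available to the outer grouping).

```sparql
PREFIX : <http://example.org/>   # hypothetical namespace

SELECT ?organization (SUM(?numberOfBooks) AS ?affiliationBooks)
WHERE {
    {
        # Inner level: books per author, keeping only authors with more than 5.
        SELECT ?auth ?organization (COUNT(?book) AS ?numberOfBooks)
        WHERE {
            ?auth :wrote ?book .
            ?auth :affiliated ?organization .
        }
        GROUP BY ?auth ?organization
        HAVING (COUNT(?book) > 5)
    }
}
# Outer level: sum the per-author counts per organization.
GROUP BY ?organization
HAVING (SUM(?numberOfBooks) > 50)
```

Unlike the AGGREGATE version, this yields one row per organization rather than
one row per book, which is part of why a result table for the original example
would help.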


3) It seems that their attempted SPARQL 1.1 formulation of this one
has some errors...

SELECT ?topic ?numberOfSwissAuthors ?numberOfItalianAuthors
WHERE {
    ?auth :name ?name .
    ?auth :wrote ?book .
    ?book :topic ?topic .
    ?auth :hasNationality ?nat .
}
AGGREGATE { FILTER(?nat = 'IT') (?numberOfItalianAuthors, COUNT, {?topic} ) }
AGGREGATE { FILTER(?nat = 'CH') (?numberOfSwissAuthors, COUNT, {?topic} )
            FILTER(?numberOfItalianAuthors>?numberOfSwissAuthors)}
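
As a point of reference, a sketch of how this query might be expressed with
two subqueries in SPARQL 1.1 syntax. It is an illustration under assumptions,
not the formulation given in the proposal: the example.org namespace is
hypothetical, and counting distinct authors (rather than solutions) is a
guess at the intended meaning of COUNT here.

```sparql
PREFIX : <http://example.org/>   # hypothetical namespace

SELECT ?topic ?numberOfSwissAuthors ?numberOfItalianAuthors
WHERE {
    {
        # Italian authors per topic
        SELECT ?topic (COUNT(DISTINCT ?auth) AS ?numberOfItalianAuthors)
        WHERE {
            ?auth :wrote ?book .
            ?book :topic ?topic .
            ?auth :hasNationality ?nat .
            FILTER (?nat = 'IT')
        }
        GROUP BY ?topic
    }
    {
        # Swiss authors per topic
        SELECT ?topic (COUNT(DISTINCT ?auth) AS ?numberOfSwissAuthors)
        WHERE {
            ?auth :wrote ?book .
            ?book :topic ?topic .
            ?auth :hasNationality ?nat .
            FILTER (?nat = 'CH')
        }
        GROUP BY ?topic
    }
    # The two subquery results join on ?topic before this comparison.
    FILTER (?numberOfItalianAuthors > ?numberOfSwissAuthors)
}
```

Because the two subqueries are joined on ?topic, topics with no Swiss authors
at all drop out of the result; whether the AGGREGATE version behaves the same
way is another point worth asking about.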


Overall, however, I think their aggregation proposal could have some merit:
it seems to allow aggregation with fewer subqueries, or at
least this seems to be their main argument. Without an actual proof,
I am not yet convinced by their claim that they can express all of our
aggregation/grouping.



Axel

Received on Tuesday, 16 February 2010 14:52:06 UTC