Comments on aggregation in SPARQL 1.1 from Graham Klyne on 2010-09-28 (public-rdf-dawg-comments@w3.org from September 2010)

From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Date: Tue, 28 Sep 2010 14:41:05 +0100
To: public-rdf-dawg-comments@w3.org
Message-ID: <4CA1F071.2090107@zoo.ox.ac.uk>

With reference to: http://www.w3.org/TR/2010/WD-sparql11-query-20100601/


1. Editorial

I'm finding the section on aggregates quite hard to follow.

More examples, especially for GROUP_CONCAT, might make it easier
to understand the link between the algebra and its practical consequences
in SPARQL queries.

Also, I can't find any indication of the interaction between GROUP BY and ORDER
BY, where the ORDER BY specifies a variable not named in the GROUP BY: is this
allowed? If so, what effect does it have (e.g. on GROUP_CONCAT)?
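
To make the ordering question concrete, here is a rough sketch in Python (not
SPARQL, and not drawn from the draft).  It models just one possible reading, in
which ordering is applied to the solutions before they are grouped.  The
function name group_concat, the variable names and the sample data are all made
up for illustration.

```python
from collections import defaultdict

def group_concat(rows, group_key, value_key, order_key=None, separator=" "):
    """Hypothetical model of GROUP BY plus GROUP_CONCAT over solution rows.

    rows is a list of dicts (variable name -> value).  If order_key is given,
    the rows are sorted by it BEFORE grouping; this is an assumption, being
    only one possible reading of how ORDER BY might interact with grouping.
    """
    if order_key is not None:
        rows = sorted(rows, key=lambda row: row[order_key])
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(str(row[value_key]))
    return {key: separator.join(values) for key, values in groups.items()}

# Made-up solutions: "doc" is the grouping variable, "rank" is not.
solutions = [
    {"doc": "d1", "term": "beta", "rank": 2},
    {"doc": "d1", "term": "alpha", "rank": 1},
    {"doc": "d2", "term": "gamma", "rank": 1},
]

unordered = group_concat(solutions, "doc", "term")
ordered = group_concat(solutions, "doc", "term", order_key="rank")
print(unordered)
print(ordered)
```

Under this reading the concatenated value for d1 changes with the ordering
variable even though that variable is not part of the grouping, which is the
kind of effect the question is about.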


2. A Nice-to-have feature

For my particular application, something I'd *really* like to see is a form of
aggregation that can be used to create new graph nodes in CONSTRUCT queries for 
combinations of aggregated variable bindings.

By way of example, my first thought along these lines was a GROUP_SET
aggregator which, when the corresponding value is used as an object in a
CONSTRUCT query, causes the corresponding property to be repeated in the
resulting graph for each value in the set.  But I can see that might muss up the
algebraic structures of the underlying query framework.
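
As a rough illustration of the intended effect only, the following Python
sketch expands a set-valued binding into repeated triples.  GROUP_SET is not an
existing SPARQL feature, and the function expand_template, the ex: names and
the sample values are hypothetical.

```python
def expand_template(subject, predicate, obj):
    """Hypothetical expansion of one CONSTRUCT template triple.

    If obj is a set (standing in for a GROUP_SET result), emit one triple
    per member; otherwise emit a single triple as usual.
    """
    if isinstance(obj, (set, frozenset)):
        # Sorted only to make the output order reproducible.
        return [(subject, predicate, member) for member in sorted(obj)]
    return [(subject, predicate, obj)]

# A made-up solution in which the object is a set of aggregated values.
triples = expand_template("ex:doc1", "ex:term", {"alpha", "beta"})
for triple in triples:
    print(triple)
```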

Another thought was a GROUP_HASH aggregator which generates a new node id for
each combination of aggregated results, which can subsequently be used in a
DESCRIBE clause creating new graph nodes.  Or even just a function to turn a
string containing a GROUP_CONCAT result into a valid node id.
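
A minimal sketch of the node-id idea in Python, under stated assumptions: the
combination is treated as an unordered set of values, SHA-1 is used purely as
an example hash, and the function name group_hash and the base URI are
hypothetical rather than any proposed syntax.

```python
import hashlib

def group_hash(values, base="http://example.org/group/"):
    """Derive a repeatable node id from a combination of aggregated values.

    Assumption: the combination is order-insensitive and ignores duplicates,
    so the values are de-duplicated and sorted before hashing.
    """
    # Join on a control character unlikely to occur in the values themselves.
    canonical = "\x1f".join(sorted(set(str(value) for value in values)))
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return base + digest

node_a = group_hash(["alpha", "beta"])
node_b = group_hash(["beta", "alpha", "alpha"])
node_c = group_hash(["alpha", "gamma"])
print(node_a == node_b)
print(node_a == node_c)
```

The same combination of values always yields the same node id, so results that
share a combination would end up attached to the same new graph node.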

More information on the motivation for this suggestion can be found at:
http://code.google.com/p/milarq/wiki/OutstandingIssues#Double-counting

#g

Received on Tuesday, 28 September 2010 13:41:47 UTC