GROUP_CONCAT and ordering

Hi all (Cc Ruslan),

Ruslan and I are currently working on conformance testing for Sesame's 
implementation of SPARQL 1.1 query, and there is a case where I am not 
100% sure what the expected behavior is. This case involves a query that 
uses a GROUP_CONCAT aggregate, a grouping, and an ORDER BY solution 
modifier. Before I start pestering the SPARQL working group I'd like to 
hear some other developers' thoughts.

All SPARQL aggregate operators are defined to work on multisets. This 
means that by default, the input of an aggregate does not have any 
prescribed order. For most aggregates this is irrelevant anyway, but for 
GROUP_CONCAT it does make a difference. Consider the following example data:

:org1 :affiliates :p1, :p2 .
:org2 :affiliates :p3, :p4 .

:p1 :name "John" .
:p2 :name "Paul" .
:p3 :name "Ringo" .
:p4 :name "George" .
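Here is a quick illustration of why order is irrelevant for most aggregates but not for GROUP_CONCAT. It takes :org1's two names from the data above and feeds every permutation of them to three toy aggregate functions. The helper names are hypothetical stand-ins, and plain Python string comparison substitutes for SPARQL's ordering of literals.

```python
from itertools import permutations

# :org1's ?name values, treated as a multiset: every permutation is an
# equally valid order in which an engine could feed them to an aggregate.
NAMES = ["John", "Paul"]

def sparql_count(values):
    return len(values)

def sparql_max(values):
    # Plain string comparison stands in for SPARQL's ordering of literals.
    return max(values)

def sparql_group_concat(values, separator=" "):
    # SPARQL 1.1 uses a single space when no SEPARATOR is given.
    return separator.join(values)

orders = list(permutations(NAMES))
counts = {sparql_count(p) for p in orders}
maxima = {sparql_max(p) for p in orders}
concats = {sparql_group_concat(p) for p in orders}

print(counts, maxima)     # {2} {'Paul'}
print(sorted(concats))    # ['John Paul', 'Paul John']
```

COUNT and MAX give one answer whatever the order. GROUP_CONCAT gives one answer per order.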

I want to produce a query that gives me concatenated names per 
organisation. Each concatenated string should have the names in 
alphabetical order.

My initial thought was that this query would do the trick:

SELECT ?org (GROUP_CONCAT(?name) as ?names)
WHERE {?org :affiliates ?p. ?p :name ?name }
GROUP BY ?org
ORDER BY ASC(str(?name))

Expected result:

?org 	?names
--------------
:org1	"John Paul"
:org2	"George Ringo"

However, looking at the SPARQL 1.1 query spec, I think this is not a 
guaranteed result: as far as I can tell the solution modifier ORDER BY 
should be applied to the solution sequence _after_ grouping and 
aggregation, so it cannot influence the order of the input for the 
GROUP_CONCAT. This would mean that for the above query, the result could 
equally well be:

?org 	?names
--------------
:org1	"Paul John"
:org2	"George Ringo"

or indeed any other permutation of name concatenations.
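The evaluation order described above can be modelled in a few lines of Python. This is a toy model under stated assumptions, not how Sesame or any other engine actually works:

* The solution list is what the WHERE pattern yields for the example data, and prefixed names are plain strings.
* The engine's freedom to present a group's members in any order is simulated with a seeded shuffle.
* The final sort stands in for the ORDER BY. Whatever its key, it can only permute whole aggregated rows.

```python
import random
from collections import defaultdict

# Solutions of the WHERE pattern for the example data, as (?org, ?name).
SOLUTIONS = [
    (":org1", "John"), (":org1", "Paul"),
    (":org2", "Ringo"), (":org2", "George"),
]

def evaluate(solutions, rng):
    """Toy model: GROUP BY ?org + GROUP_CONCAT(?name), then ORDER BY.

    Each group is a multiset, so its members may reach GROUP_CONCAT in
    any order; that freedom is simulated by shuffling with `rng`.
    """
    groups = defaultdict(list)
    for org, name in solutions:
        groups[org].append(name)

    rows = []
    for org, members in groups.items():
        members = members[:]          # copy before shuffling
        rng.shuffle(members)          # no prescribed member order
        rows.append((org, " ".join(members)))

    # ORDER BY runs last, on the aggregated rows. Each row now holds only
    # ?org and ?names, so sorting can reorder rows but never the names
    # inside a ?names string. Sorting on ?org here is just for display.
    rows.sort(key=lambda row: row[0])
    return rows

# Different (equally conformant) member orders give different answers.
outcomes = {tuple(evaluate(SOLUTIONS, random.Random(seed))) for seed in range(50)}
for outcome in sorted(outcomes):
    print(outcome)
```

Every printed outcome has the same two rows in the same order, but the names inside each ?names string vary from run to run.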

I have thought about using a subquery to solve the problem, but since 
SPARQL explicitly defines the input of an aggregate operator as a 
_multiset_, I am not even sure that would work: as far as I can tell, a 
SPARQL engine has no obligation to preserve input order when evaluating 
aggregate operators.
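The subquery concern can be made concrete with a toy Python model of two hypothetical engines. Each evaluates the outer GROUP BY ?org / GROUP_CONCAT(?name) over the already-sorted output of an inner sub-SELECT. The engine behaviours and helper names are invented for illustration. If group members form a multiset, as the argument above assumes, both engines behave legitimately, yet only one of them produces alphabetical strings.

```python
from collections import defaultdict

# Output of a hypothetical inner sub-SELECT ordered by ASC(str(?name)).
ORDERED = sorted(
    [(":org1", "John"), (":org1", "Paul"), (":org2", "Ringo"), (":org2", "George")],
    key=lambda sol: sol[1],
)

def group_preserving(solutions):
    # Hypothetical engine A: appends members in arrival order, so the
    # inner ORDER BY happens to survive into the concatenation.
    groups = defaultdict(list)
    for org, name in solutions:
        groups[org].append(name)
    return {org: " ".join(ms) for org, ms in groups.items()}

def group_reversing(solutions):
    # Hypothetical engine B: builds each group by prepending members.
    # If a group is just a multiset, this is an equally valid choice.
    groups = defaultdict(list)
    for org, name in solutions:
        groups[org].insert(0, name)
    return {org: " ".join(ms) for org, ms in groups.items()}

print(group_preserving(ORDERED))  # {':org2': 'George Ringo', ':org1': 'John Paul'}
print(group_reversing(ORDERED))   # {':org2': 'Ringo George', ':org1': 'Paul John'}
```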

Two questions:

1. Is the above correct?
2. Is there any other way in SPARQL 1.1 to enforce an ordering on the 
input of a GROUP_CONCAT?

In relation to question 2, I note that MySQL's group_concat function 
(on which, I assume, the SPARQL operator has been based; it is a MySQL 
extension rather than standard SQL) accepts an ordering clause as part 
of its argument list. See 
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat.
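For comparison, in MySQL one would write something like `GROUP_CONCAT(name ORDER BY name ASC SEPARATOR ' ')`. MySQL's default separator is a comma, so the space has to be requested explicitly. The minimal Python sketch below shows what such an ordered concatenation computes. Its `group_concat_ordered` helper is hypothetical and sorts each group's values before joining them, so the result no longer depends on the order in which members arrive. It only illustrates the semantics and is not a SPARQL 1.1 feature.

```python
from collections import defaultdict

def group_concat_ordered(solutions, key, separator=" ", reverse=False):
    """Group (group_key, value) pairs and join each group's values after
    sorting them by `key`, in the spirit of MySQL's
    GROUP_CONCAT(expr ORDER BY ... SEPARATOR ...).
    """
    groups = defaultdict(list)
    for group_key, value in solutions:
        groups[group_key].append(value)
    return {
        g: separator.join(sorted(values, key=key, reverse=reverse))
        for g, values in groups.items()
    }

# Members deliberately arrive out of alphabetical order.
solutions = [(":org1", "Paul"), (":org2", "Ringo"), (":org1", "John"), (":org2", "George")]
print(group_concat_ordered(solutions, key=str))
# {':org1': 'John Paul', ':org2': 'George Ringo'}
```

Because the sort happens inside the aggregate, shuffling `solutions` does not change the output.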


Chocolate egg for your thoughts,

Jeen

Received on Friday, 22 April 2011 05:19:07 UTC