Re: GROUP_CONCAT and ordering

On 2011-04-22, at 06:18, Jeen Broekstra wrote:

> 
> Hi all (Cc Ruslan),
> 
> Ruslan and I are currently working on conformance testing for Sesame's implementation of SPARQL 1.1 Query, and there is a case where I am not 100% sure what the expected behavior is. It involves a query that uses a GROUP_CONCAT aggregate, a GROUP BY and an ORDER BY solution modifier. Before I start pestering the SPARQL Working Group, I'd like to hear some other developers' thoughts.
> 
> All SPARQL aggregate operators are defined to work on multisets. This means that, by default, the input of an aggregate has no prescribed order. For most aggregates (COUNT, SUM, MIN, MAX and so on) this is irrelevant, because their result does not depend on order, but for GROUP_CONCAT it does make a difference. Consider the following example data:
> 
> :org1 :affiliates :p1, :p2 .
> :org2 :affiliates :p3, :p4 .
> 
> :p1 :name "John" .
> :p2 :name "Paul" .
> :p3 :name "Ringo" .
> :p4 :name "George" .
> 
> I want to produce a query that gives me concatenated names per organisation. Each concatenated string should have the names in alphabetical order.
> 
> My initial thought was that this query would do the trick:
> 
> SELECT ?org (GROUP_CONCAT(?name) as ?names)
> WHERE {?org :affiliates ?p. ?p :name ?name }
> GROUP BY ?org
> ORDER BY ASC(str(?name))
> 
> Expected result:
> 
> ?org 	?names
> --------------
> :org1	"John Paul"
> :org2	"George Ringo"
> 
> However, looking at the SPARQL 1.1 Query spec, I think this result is not guaranteed: as far as I can tell, the ORDER BY solution modifier is applied to the solution sequence _after_ grouping and aggregation, so it cannot influence the order in which values are fed to GROUP_CONCAT. This would mean that for the above query, the result could equally well be:

That's correct.

> ?org 	?names
> --------------
> :org1	"Paul John"
> :org2	"George Ringo"
> 
> or indeed any other permutation of the names within each group.
> 
> I have thought about using a subquery that orders the names first, but since SPARQL defines the input of an aggregate operator explicitly as a _multiset_, I am not even sure that would work: as far as I can tell, a SPARQL engine has no obligation to preserve input order when evaluating aggregate operators.
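The evaluation order described above can be modelled with a minimal Python sketch. It is an illustration under stated assumptions, not Sesame's code or the spec's algebra: the WHERE-clause solutions are hard-coded as (org, name) pairs, each group's values are shuffled to stand in for the unordered multiset an aggregate receives, and all names (`SOLUTIONS`, `group_by_org`, `group_concat`, `order_by`, `evaluate`) are hypothetical.

```python
import random
from collections import defaultdict

# Hard-coded solutions of {?org :affiliates ?p . ?p :name ?name}
# for the example data, as (org, name) pairs (an assumption of this sketch).
SOLUTIONS = [
    ("org1", "John"), ("org1", "Paul"),
    ("org2", "Ringo"), ("org2", "George"),
]


def group_by_org(solutions):
    """GROUP BY ?org: collect each group's ?name values.

    The values form a multiset, so the sketch shuffles them to model an
    engine that promises nothing about the order it visits them in.
    """
    groups = defaultdict(list)
    for org, name in solutions:
        groups[org].append(name)
    for names in groups.values():
        random.shuffle(names)
    return groups


def group_concat(values, separator=" "):
    """GROUP_CONCAT with a single space as separator."""
    return separator.join(values)


def order_by(rows, key):
    """ORDER BY: a solution modifier that reorders whole result rows."""
    return sorted(rows, key=key)


def evaluate(solutions):
    # Aggregation runs first, once per group.
    rows = [(org, group_concat(names))
            for org, names in group_by_org(solutions).items()]
    # Each row now holds only (?org, ?names); sorting rows cannot change
    # the order of the names already joined inside ?names.
    return order_by(rows, key=lambda row: row[0])


if __name__ == "__main__":
    for _ in range(3):
        print(evaluate(SOLUTIONS))
```

Calling `evaluate(SOLUTIONS)` repeatedly can yield either "John Paul" or "Paul John" for org1: the final sort only moves whole rows around, so an ORDER BY placed after the GROUP BY has no way to fix the concatenation order.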
> 
> Two questions:
> 
> 1. Is the above correct?
> 2. Is there any other way in SPARQL 1.1 to enforce an ordering on the input to GROUP_CONCAT?
> 
> In relation to question 2, I note that MySQL's GROUP_CONCAT function (on which, I assume, the SPARQL operator has been based) accepts an ordering clause inside its own argument list, e.g. GROUP_CONCAT(name ORDER BY name SEPARATOR ' '). GROUP_CONCAT itself is a MySQL extension rather than part of standard SQL. See http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat.
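For comparison, the effect of an ordering clause inside the aggregate can be sketched in Python. `ordered_group_concat` is a hypothetical helper, not a Sesame, SPARQL or MySQL API: it sorts each group's values by their string form (mirroring ASC(str(?name))) before joining them with a single space.

```python
from collections import defaultdict


def ordered_group_concat(solutions, key=str, separator=" "):
    """Hypothetical GROUP_CONCAT with an ordering clause, in the spirit of
    MySQL's GROUP_CONCAT(name ORDER BY name SEPARATOR ' ').

    solutions: iterable of (group_key, value) pairs.
    Returns a dict mapping each group key to its concatenated string.
    """
    groups = defaultdict(list)
    for group, value in solutions:
        groups[group].append(value)
    # Sorting happens per group, inside the aggregate, so the result no
    # longer depends on the order in which the values were visited.
    return {group: separator.join(str(v) for v in sorted(values, key=key))
            for group, values in groups.items()}


if __name__ == "__main__":
    data = [("org1", "Paul"), ("org1", "John"),
            ("org2", "Ringo"), ("org2", "George")]
    print(ordered_group_concat(data))
    # → {'org1': 'John Paul', 'org2': 'George Ringo'}
```

Because the sort sits inside the aggregate rather than in a later ORDER BY, shuffling the input pairs does not change the output.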

Yup, that's where the inspiration came from.

- Steve

Received on Friday, 22 April 2011 18:02:05 UTC