Separator string in GROUP_CONCAT()

From: Steve Harris <steve.harris@garlik.com>
Date: Fri, 5 Mar 2010 11:40:45 +0000
Message-Id: <F42E23E7-31BA-4D5A-A5ED-9BFF8573BCD6@garlik.com>
To: "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>

Hi all,

Problem:

There's no way to specify a separator string in the draft GROUP_CONCAT  
aggregate. I have a vague memory that we'd discussed this briefly  
somewhere, F2F2, or on a call maybe, but it's pretty hazy. This was  
brought up in Rob Vesse's recent comment.

Proposal 1:

Leave it as it is. Users cannot specify the separator character; it's
fixed in the spec.

Upside, very simple. Downside, might limit usefulness.

Probably should make sure there's an escaping function in SPARQL 1.1  
that's compatible with the character.
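
To illustrate why an escaping function matters with a fixed separator, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the separator being a single space, the backslash escape scheme, and the names escape() and split_escaped() are hypothetical, not anything from the draft.

```python
SEP = " "    # assumption: the fixed separator; the draft value is not stated here
ESC = "\\"   # assumption: a backslash escape scheme, chosen only for illustration


def escape(value, sep=SEP, esc=ESC):
    """Escape the escape character first, then the separator."""
    return value.replace(esc, esc + esc).replace(sep, esc + sep)


def split_escaped(joined, sep=SEP, esc=ESC):
    """Split on unescaped separators and undo the escaping."""
    parts, current, i = [], [], 0
    while i < len(joined):
        ch = joined[i]
        if ch == esc and i + 1 < len(joined):
            # Escaped character: take the next character literally.
            current.append(joined[i + 1])
            i += 2
        elif ch == sep:
            # Unescaped separator: close the current value.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


values = ["New York", "a\\b", "plain"]
joined = SEP.join(escape(v) for v in values)
recovered = split_escaped(joined)
```

Without the escape step, "New York" would be indistinguishable from two separate values once concatenated, which is the limitation this proposal would have to live with.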

Proposal 2:

If the GROUP_CONCAT expression list has more than one element, then  
the lexically last one is removed and used as the separator before  
being passed to the Aggregation() algebra function, e.g.
GROUP_CONCAT(?x, ?y, "|")

Upside, keeps the grammar simple. Downside, makes the algebra around
GROUP_CONCAT weird, and might be surprising as the multi-expression
behaviour will be different to other aggregates.

e.g. in GROUP_CONCAT(?x, ?y), ?y will be an argument to the underlying
function, not an expression. We would probably have to pick a value of ?y
at random, a la SAMPLE(), as we don't require that "arguments" to
aggregates are scalar.
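
A rough Python sketch of the rewrite described in this proposal, to show where the surprise comes from. The representation is assumed purely for illustration: expressions are strings (variables start with "?", anything else is a quoted literal), solutions are dicts, the default separator is a space, and values from all rows are flattened in order. None of the function names come from the draft.

```python
import random

DEFAULT_SEPARATOR = " "  # assumption: the fixed separator is not stated in this email


def split_separator(exprs):
    """With more than one expression, the lexically last one becomes the separator."""
    if len(exprs) > 1:
        return exprs[:-1], exprs[-1]
    return exprs, None


def evaluate(expr, row):
    """Variables start with '?'; anything else is treated as a quoted literal."""
    if expr.startswith("?"):
        return row.get(expr)
    return expr.strip('"')


def group_concat(rows, exprs, rng=random):
    value_exprs, sep_expr = split_separator(exprs)
    if sep_expr is None:
        sep = DEFAULT_SEPARATOR
    else:
        # The separator expression may differ per solution, so one value
        # is picked arbitrarily, in the spirit of SAMPLE().
        candidates = [v for v in (evaluate(sep_expr, r) for r in rows) if v is not None]
        sep = rng.choice(candidates) if candidates else DEFAULT_SEPARATOR
    values = [
        v
        for row in rows
        for v in (evaluate(e, row) for e in value_exprs)
        if v is not None
    ]
    return sep.join(values)


rows = [{"?x": "a", "?y": "1"}, {"?x": "b", "?y": "2"}]
with_literal = group_concat(rows, ["?x", "?y", '"|"'])
with_variable = group_concat(rows, ["?x", "?y"])
```

In this sketch, dropping the "|" literal silently turns ?y from a concatenated value into the separator, and the result then depends on which binding of ?y happens to be picked.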

Proposal 3:

Use MySQL syntax to specify it, i.e. GROUP_CONCAT(?x, ?y SEPARATOR "|").

Upside, the same as MySQL (where GROUP_CONCAT comes from), avoids  
weirding algebra. Downside, makes the grammar more complex.
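
As a sketch of what the extra grammar buys, the following Python fragment parses the Proposal 3 form with a regular expression. It is an illustration under assumptions, not a proposed grammar production: the pattern handles only simple comma-separated expressions and a double-quoted separator, the default separator is assumed to be a space, and parse_group_concat() is a hypothetical helper.

```python
import re

# Assumption: a deliberately simplified pattern, not the SPARQL grammar.
CALL = re.compile(
    r'^GROUP_CONCAT\(\s*(?P<exprs>.*?)'
    r'(?:\s+SEPARATOR\s+"(?P<sep>[^"]*)")?\s*\)$'
)


def parse_group_concat(text, default_sep=" "):
    """Return (expression list, separator) for a GROUP_CONCAT call."""
    match = CALL.match(text.strip())
    if match is None:
        raise ValueError("not a GROUP_CONCAT call: %r" % text)
    exprs = [e.strip() for e in match.group("exprs").split(",") if e.strip()]
    sep = match.group("sep")
    if sep is None:
        # No SEPARATOR clause: fall back to the assumed default.
        sep = default_sep
    return exprs, sep


explicit = parse_group_concat('GROUP_CONCAT(?x, ?y SEPARATOR "|")')
implicit = parse_group_concat('GROUP_CONCAT(?x, ?y)')
```

Because the separator sits behind its own keyword, ?y stays an ordinary expression in both calls, so the algebra sees the same expression list either way.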

Proposal 4:

Like 3, but with some other explicit syntax, e.g.
GROUP_CONCAT(?x, ?y) [SEPARATOR "|"]

Upside, avoids weirding algebra. Downside, we have to think of our own  
syntax, no familiarity for MySQL users and probably makes the grammar  
more complex.

---

My opinion:

I'd take 3, or 1 happily, but I think 4 is a bit arbitrary, and 2 is  
really nasty.

There's also other useful syntax around GROUP_CONCAT, e.g. ORDER BY,  
so I expect a future SPARQL will end up with something like 3 or 4  
anyway.

- Steve

-- 
Steve Harris, Garlik Limited
2 Sheen Road, Richmond, TW9 1AE, UK
+44 20 8973 2465  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Friday, 5 March 2010 11:41:16 GMT
