Re: Separator string in GROUP_CONCAT()

On 3/5/2010 6:40 AM, Steve Harris wrote:
> Hi all,
>
> Problem:
>
> There's no way to specify a separator string in the draft GROUP_CONCAT
> aggregate. I have a vague memory that we'd discussed this briefly
> somewhere, F2F2, or on a call maybe, but it's pretty hazy. This was
> brought up in Rob Vesse's recent comment.
>
> Proposal 1:
>
> Leave it as it is. Users cannot specify the separator character, it's
> fixed in the spec.
>
> Upside, very simple. Downside, might limit usefulness.
>
> Probably should make sure there's an escaping function in SPARQL 1.1
> that's compatible with the character.
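A minimal sketch of why that escaping function matters under Proposal 1. It assumes a hypothetical fixed single-character separator (a space here; the draft's actual character isn't given in this thread) and a backslash escaping scheme that is purely my own assumption, not anything in the draft. Without escaping, different groups can produce identical output.

```python
from typing import Iterable

# Hypothetical fixed separator: a placeholder, not the draft's actual choice.
SEP = " "

def escape(value: str, sep: str = SEP) -> str:
    """Assumed scheme: backslash-escape the escape char first, then the separator."""
    assert len(sep) == 1, "this sketch only handles single-character separators"
    return value.replace("\\", "\\\\").replace(sep, "\\" + sep)

def fixed_group_concat(values: Iterable[str], sep: str = SEP) -> str:
    """GROUP_CONCAT with a separator fixed by the spec (Proposal 1)."""
    return sep.join(values)

def split_escaped(text: str, sep: str = SEP) -> list[str]:
    """Recover the original values from a concatenation of escaped values."""
    out: list[str] = []
    cur: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            cur.append(text[i + 1])  # escaped char is taken literally
            i += 2
        elif ch == sep:
            out.append("".join(cur))  # unescaped separator ends a value
            cur = []
            i += 1
        else:
            cur.append(ch)
            i += 1
    out.append("".join(cur))
    return out

# Without escaping, two different groups collapse to the same string...
assert fixed_group_concat(["a b", "c"]) == fixed_group_concat(["a", "b c"])
# ...with escaping, the values round-trip.
vals = ["a b", "c"]
assert split_escaped(fixed_group_concat(escape(v) for v in vals)) == vals
```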
>
> Proposal 2:
>
> If the GROUP_CONCAT expression list has more than one element, then the
> lexically last one is removed and used as the separator before being
> passed to the Aggregation() algebra function. e.g. GROUP_CONCAT(?x, ?y,
> "|")
>
> Upside, keeps the grammar simple. Downside, makes the algebra around
> GROUP_CONCAT weird, and might be surprising as the multi-expression
> behaviour will be different to other aggregates.
>
> e.g. in GROUP_CONCAT(?x, ?y), ?y will be an argument to the underlying
> function, not an expression. We would probably have to pick a value of ?y
> at random, a la SAMPLE(), as we don't require that "arguments" to
> aggregates are scalar.
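A toy sketch of the Proposal 2 rule, to make the surprise concrete. Assumptions, none of them from the draft: each solution is a dict from variable name to lexical value, the remaining value expressions are concatenated per solution in order, the default separator is a placeholder space, and the evaluator only understands variables and quoted literals.

```python
import random
from typing import Mapping, Sequence

Row = Mapping[str, str]  # one solution: variable name -> lexical value

def _eval(expr: str, row: Row) -> str:
    """Toy evaluator: a quoted literal like '"|"' or a variable like '?x'."""
    if len(expr) >= 2 and expr.startswith('"') and expr.endswith('"'):
        return expr[1:-1]
    return row[expr]

def group_concat_p2(exprs: Sequence[str], group: Sequence[Row], default_sep: str = " ") -> str:
    """Proposal 2: with more than one argument, the last one is the separator."""
    if not group:
        return ""
    value_exprs = list(exprs[:-1]) if len(exprs) > 1 else list(exprs)
    sep_expr = exprs[-1] if len(exprs) > 1 else None
    if sep_expr is None:
        sep = default_sep
    else:
        # The separator is an expression, so it may differ per solution;
        # like SAMPLE(), pick one solution's value arbitrarily.
        sep = _eval(sep_expr, random.choice(group))
    return sep.join(_eval(e, row) for row in group for e in value_exprs)
```

With solutions {?x=a, ?y=1} and {?x=b, ?y=2}, GROUP_CONCAT(?x, ?y, "|") gives "a|1|b|2", but GROUP_CONCAT(?x, ?y) silently turns ?y into the separator and yields either "a1b" or "a2b".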
>
> Proposal 3:
>
> Use MySQL syntax to specify it, i.e. GROUP_CONCAT(?x, ?y SEPARATOR "|").
>
> Upside, the same as MySQL (where GROUP_CONCAT comes from), avoids
> weirding algebra. Downside, makes the grammar more complex.
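For contrast, a toy sketch of Proposal 3. The regex is a hypothetical stand-in for a real grammar rule (no nesting, DISTINCT, or escaped quotes), the default separator is a placeholder space, and value expressions are limited to variables. The point is that the separator comes from the syntax, so it is a constant and ?y stays an ordinary value expression.

```python
import re
from typing import Mapping, Sequence

Row = Mapping[str, str]  # one solution: variable name -> lexical value

# Toy pattern for e.g. GROUP_CONCAT(?x, ?y SEPARATOR "|"); not a real SPARQL grammar.
_GC = re.compile(r'^GROUP_CONCAT\((?P<exprs>.*?)(?:\s+SEPARATOR\s+"(?P<sep>[^"]*)")?\)$')

def parse_group_concat(text: str, default_sep: str = " ") -> tuple[list[str], str]:
    """Split a MySQL-style call into (value expressions, separator)."""
    m = _GC.match(text.strip())
    if m is None:
        raise ValueError(f"not a GROUP_CONCAT call: {text!r}")
    exprs = [e.strip() for e in m.group("exprs").split(",") if e.strip()]
    sep = m.group("sep")
    return exprs, default_sep if sep is None else sep

def group_concat_p3(text: str, group: Sequence[Row]) -> str:
    exprs, sep = parse_group_concat(text)
    # The separator is a literal from the syntax, never evaluated per solution.
    return sep.join(row[e] for row in group for e in exprs)
```

The extra work lands in the parser rather than the algebra, which matches the trade-off described above.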
>
> Proposal 4:
>
> Like 3, but with some other explicit syntax. e.g. GROUP_CONCAT(?x,
> ?y)[SEPARATOR "|"]
>
> Upside, avoids weirding algebra. Downside, we have to think of our own
> syntax, no familiarity for MySQL users and probably makes the grammar
> more complex.
>
> ---
>
> My opinion:
>
> I'd take 3, or 1 happily, but I think 4 is a bit arbitrary, and 2 is
> really nasty.

As Andy said, thanks for this, Steve.

I agree with your preferences. In Glitter, I implement SEPARATOR as in 
MySQL (option 3). That said...

> There's also other useful syntax around GROUP_CONCAT, e.g. ORDER BY, so
> I expect a future SPARQL will end up with something like 3 or 4 anyway.

...if we go with Option 1 now, we'll likely get some complaints from the 
community, but we'll also give implementers a chance to experiment and 
find the best approach to this?

I think we should go with Option 3 if we feel that consistency with 
MySQL is valuable. If we don't feel that way, I think we should go with 
Option 1 AND, in that case, we should consider whether we want to use a 
name other than GROUP_CONCAT.

Lee

Received on Sunday, 7 March 2010 13:23:37 UTC