Re: Separator string in GROUP_CONCAT()

On 7 Mar 2010, at 13:22, Lee Feigenbaum <lee@thefigtrees.net> wrote:

> On 3/5/2010 6:40 AM, Steve Harris wrote:
>> Hi all,
>>
>> Problem:
>>
>> There's no way to specify a separator string in the draft GROUP_CONCAT
>> aggregate. I have a vague memory that we'd discussed this briefly
>> somewhere, F2F2, or on a call maybe, but it's pretty hazy. This was
>> brought up in Rob Vesse's recent comment.
>>
>> Proposal 1:
>>
>> Leave it as it is. Users cannot specify the separator character; it's
>> fixed in the spec.
>>
>> Upside, very simple. Downside, might limit usefulness.
>>
>> Probably should make sure there's an escaping function in SPARQL 1.1
>> that's compatible with the character.
>>
>> Proposal 2:
>>
>> If the GROUP_CONCAT expression list has more than one element, then the
>> lexically last one is removed and used as the separator before being
>> passed to the Aggregation() algebra function, e.g.
>> GROUP_CONCAT(?x, ?y, "|")
>>
>> Upside, keeps the grammar simple. Downside, makes the algebra around
>> GROUP_CONCAT weird, and might be surprising as the multi-expression
>> behaviour will be different to other aggregates.
>>
>> e.g. in GROUP_CONCAT(?x, ?y) ?y will be an argument to the underlying
>> function, not an expression. Would probably have to pick a value of ?y
>> at random, a la SAMPLE(), as we don't require that "arguments" to
>> aggregates are scalar.
>>
>> Proposal 3:
>>
>> Use MySQL syntax to specify it, i.e. GROUP_CONCAT(?x, ?y SEPARATOR "|").
>>
>> Upside, the same as MySQL (where GROUP_CONCAT comes from), avoids
>> weirding algebra. Downside, makes the grammar more complex.
>>
>> Proposal 4:
>>
>> Like 3, but with some other explicit syntax, e.g.
>> GROUP_CONCAT(?x, ?y)[SEPARATOR "|"]
>>
>> Upside, avoids weirding algebra. Downside, we have to think of our own
>> syntax, no familiarity for MySQL users and probably makes the grammar
>> more complex.
>>
>> ---
>>
>> My opinion:
>>
>> I'd take 3, or 1 happily, but I think 4 is a bit arbitrary, and 2 is
>> really nasty.
>
> As Andy said, thanks for this, Steve.
>
> I agree with your preferences. In Glitter, I implement SEPARATOR as in
> MySQL (option 3). That said...
>
>> There's also other useful syntax around GROUP_CONCAT, e.g. ORDER BY, so
>> I expect a future SPARQL will end up with something like 3 or 4 anyway.
>
> ...if we go with Option 1 now, we'll likely get some complaints from  
> the community, but we'll also give implementers a chance to play  
> with the best approach to this?

That can be a bit dangerous, as you can end up with syntactically legal
but semantically different approaches, e.g. Option 2 looks like an
expression list. I'd hope no one would go with that, but at least Rob
was tempted.
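
To make the risk concrete, here is a rough Python sketch of two readings
of the same call, GROUP_CONCAT(?x, ?y, "|"). It is illustrative only and
not taken from any implementation: the function names, the default
separator of a single space, and the row-by-row concatenation order are
all assumptions, since the draft does not pin these down.

```python
def evaluate(term, row):
    """Resolve a term: '?name' reads a variable, anything else is a literal."""
    return row[term[1:]] if term.startswith("?") else term


def as_expression_list(rows, terms, default_sep=" "):
    """Reading A (assumed): every term is an expression to concatenate."""
    parts = [str(evaluate(t, row)) for row in rows for t in terms]
    return default_sep.join(parts)


def last_as_separator(rows, terms, default_sep=" "):
    """Reading B (Option 2): the lexically last term becomes the separator."""
    if not rows:
        return ""
    if len(terms) > 1:
        *value_terms, sep_term = terms
        # The separator must be a single value, so take it from one row,
        # much as SAMPLE() would.
        sep = str(evaluate(sep_term, rows[0]))
    else:
        value_terms, sep = terms, default_sep
    parts = [str(evaluate(t, row)) for row in rows for t in value_terms]
    return sep.join(parts)


# Hypothetical group of two solutions.
rows = [{"x": "a", "y": "b"}, {"x": "c", "y": "d"}]
terms = ["?x", "?y", "|"]

print(as_expression_list(rows, terms))  # a b | c d |
print(last_as_separator(rows, terms))   # a|b|c|d
```

Both readings accept the same query text, yet they return different
strings, which is the interoperability problem with leaving this open.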

> I think we should go with Option 3 if we feel that consistency with
> MySQL is valuable. If we don't feel that way, I think we should go
> with Option 1 AND, in that case, we should consider whether we want
> to use a name other than GROUP_CONCAT.

Agreed. I think I'd end up implementing MySQL-style G_C regardless, as
it's so familiar to users.

- Steve

Received on Sunday, 7 March 2010 14:47:45 UTC