Re: Separator string in GROUP_CONCAT() from Andy Seaborne on 2010-03-08 (public-rdf-dawg@w3.org from January to March 2010)

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Mon, 08 Mar 2010 10:46:23 +0000
To: Steve Harris <steve.harris@garlik.com>
CC: "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>
Message-ID: <4B94D57F.5070902@talis.com>

On 07/03/2010 7:48 PM, Steve Harris wrote:
> On 7 Mar 2010, at 17:34, Andy Seaborne wrote:
>>

>> Some possible characters are:
>>
>> INVISIBLE SEPARATOR 2063
>> group separator 001D
>> record separator 001E
>> unit separator 001F
>> sequence concatenation 2040
>>
>> x1D/x1E/x1F all look sensible possible choices.
>
> Agreed, 0x1d-0x1f are the ASCII-inherited control characters of course.
> 0x1d appeals, being the group separator :)

If we go this way, then we will need to check the detailed documentation 
of each.


>> Proposal 4 captures this.
>>
>> ** Proposal 4a
>>
>> GROUP_CONCAT[","](?name)
>>
>> as a general aggregate syntax, all aggregates can take an [] argument
>> list, including custom aggregates.
>
> I find this syntax very appealing, might be because it's reminiscent of
> TeX though!
>
> The ability to apply it to custom aggregates as well is good. I can't
> quite imagine a clean ORDER BY syntax using this though, could be quoted
> as a string I suppose?
>
> GROUP_CONCAT[",", "DESC(strlen(?x))"](?x, ?y)

PostgreSQL recommends a subquery for the nearest equivalent:

SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;

so GROUP_CONCAT over a pre setup intemediate result gives the full 
capability of ORDER BY LIMIt etc without

SELECT GROUP_CONCAT(?x)
{
    SELECT ?x
    {
       ....
    }
    ORDER BY
    LIMIT
}

We would need to tighten up the spec text aroudn subqueries to make this 
perfect

** Suggestion: This exactly one subSELECT query pattern of an outer 
SELECT preserves order.

> Would the [] arguments take only constants? If not there are issues
> around grouping and values, or the [] arguments have to be passed by
> reference too.

I was assuming they would be expressions with variables limited to group 
key variables or outer variables (i.e. the ones that are the same for 
the whole of each partition) and can be eval'ed before the partitioning 
starts.

>
> A minor variation would be to use named arguments:
>
> GROUP_CONCAT[separator="|", orderby="DESC(strlen(?x))", limit=10](?x)
>
> A bit verbose though.

Have you been Ruby programming recently? :-)

>
>> We keep the DISTINCT special syntax but it is really another way to
>> write aggregate modifiers.
>
> Yes, it could be equivalent to [distinct=true] in some way, if we went
> down that route, though DISTINCT does apply to the expression list in a
> consistent way, so I don't think there's any real benefit.

Yes - the DISTINCT case is so well know we ought to at least copy the 
syntax as a special case.

>> I put it in this order because you can think of it as
>>
>> (GROUP_CONCAT[","])(?name)
>>
>> where
>>
>> (GROUP_CONCAT[","])
>>
>> yields the specific aggregator for GROUP_CONCAT using ",".
>
> As a sort of currying?

Or Schönfinkelisation [1]

It's like it, with the restriction that all the GROUP_CONCAT aggregate 
argument must be defined.  Can't have:

((GROUP_CONCAT[","])[limit=10)(?name)

to apply the first, get a new, partially ground aggregator and then 
ground it further.

[1]http://en.wikipedia.org/wiki/Currying

	Andy

Received on Monday, 8 March 2010 10:46:56 UTC