- From: Steve Harris <steve.harris@garlik.com>
- Date: Mon, 8 Mar 2010 14:44:41 +0000
- To: Andy Seaborne <andy.seaborne@talis.com>
- Cc: "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>
On 8 Mar 2010, at 10:46, Andy Seaborne wrote:
>
> On 07/03/2010 7:48 PM, Steve Harris wrote:
>> On 7 Mar 2010, at 17:34, Andy Seaborne wrote:
>>>
>
>>> Some possible characters are:
>>>
>>> INVISIBLE SEPARATOR 2063
>>> group separator 001D
>>> record separator 001E
>>> unit separator 001F
>>> sequence concatenation 2040
>>>
>>> x1D/x1E/x1F all look sensible possible choices.
>>
>> Agreed, 0x1d-0x1f are the ASCII-inherited control characters of
>> course.
>> 0x1d appeals, being the group separator :)
>
> If we go this way, then we will need to check the detailed
> documentation of each.
Yup. I believe that they're hierarchical, 0x1f being the deepest/
finest level.
http://en.wikipedia.org/wiki/C0_and_C1_control_codes appears to back
this up.
I have a nagging concern that people will just assume that the
separator character won't appear in text, and won't bother to escape,
which of course RDF doesn't guarantee.
>>> Proposal 4 captures this.
>>>
>>> ** Proposal 4a
>>>
>>> GROUP_CONCAT[","](?name)
>>>
>>> as a general aggregate syntax, all aggregates can take an []
>>> argument
>>> list, including custom aggregates.
>>
>> I find this syntax very appealing, might be because it's
>> reminiscent of
>> TeX though!
>>
>> The ability to apply it to custom aggregates as well is good. I can't
>> quite imagine a clean ORDER BY syntax using this though, could be
>> quoted
>> as a string I suppose?
>>
>> GROUP_CONCAT[",", "DESC(strlen(?x))"](?x, ?y)
>
> PostgreSQL recommends a subquery for the nearest equivalent:
>
> SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
>
> so GROUP_CONCAT over a pre setup intemediate result gives the full
> capability of ORDER BY LIMIt etc without
>
> SELECT GROUP_CONCAT(?x)
> {
> SELECT ?x
> {
> ....
> }
> ORDER BY
> LIMIT
> }
>
> We would need to tighten up the spec text aroudn subqueries to make
> this perfect
>
> ** Suggestion: This exactly one subSELECT query pattern of an outer
> SELECT preserves order.
I'll have to ruminate on that, it seems a little strange at first
glance, but is a pretty likely behaviour of many systems anyway.
Probably worth a separate discussion.
>> Would the [] arguments take only constants? If not there are issues
>> around grouping and values, or the [] arguments have to be passed by
>> reference too.
>
> I was assuming they would be expressions with variables limited to
> group key variables or outer variables (i.e. the ones that are the
> same for the whole of each partition) and can be eval'ed before the
> partitioning starts.
Or, presumably the result of an aggregate, e.g. SAMPLE()? I.e. the
same rules as projected variables.
>> A minor variation would be to use named arguments:
>>
>> GROUP_CONCAT[separator="|", orderby="DESC(strlen(?x))", limit=10](?x)
>>
>> A bit verbose though.
>
> Have you been Ruby programming recently? :-)
Heh, no, I haven't gone over to the light side. People do this in Perl
too you know :)
>>> We keep the DISTINCT special syntax but it is really another way to
>>> write aggregate modifiers.
>>
>> Yes, it could be equivalent to [distinct=true] in some way, if we
>> went
>> down that route, though DISTINCT does apply to the expression list
>> in a
>> consistent way, so I don't think there's any real benefit.
>
> Yes - the DISTINCT case is so well know we ought to at least copy
> the syntax as a special case.
Right.
>>> I put it in this order because you can think of it as
>>>
>>> (GROUP_CONCAT[","])(?name)
>>>
>>> where
>>>
>>> (GROUP_CONCAT[","])
>>>
>>> yields the specific aggregator for GROUP_CONCAT using ",".
>>
>> As a sort of currying?
>
> Or Schönfinkelisation [1]
>
> It's like it, with the restriction that all the GROUP_CONCAT
> aggregate argument must be defined. Can't have:
>
> ((GROUP_CONCAT[","])[limit=10)(?name)
>
> to apply the first, get a new, partially ground aggregator and then
> ground it further.
Fair enough.
For the record, my (marginal) preference for named arguments is for
forward compatibility. If we allow/encourage [] as an extension
mechanism to built in functions (eg. limit in GROUP_CONCAT), then
relying on argument position is a bit risky, as different
implementations will put the arguments in different orders.
It's still potentially ambiguous because "limit" could be a char limit
(as in MySQL) or a result limit, or a set size limit.
[<uri>=<value>, ...] just isn't that tempting though :)
- Steve
--
Steve Harris, Garlik Limited
2 Sheen Road, Richmond, TW9 1AE, UK
+44 20 8973 2465 http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10
9AD
Received on Monday, 8 March 2010 14:45:11 UTC