- From: Steve Harris <steve.harris@garlik.com>
- Date: Mon, 8 Mar 2010 14:44:41 +0000
- To: Andy Seaborne <andy.seaborne@talis.com>
- Cc: "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>
On 8 Mar 2010, at 10:46, Andy Seaborne wrote:
>
> On 07/03/2010 7:48 PM, Steve Harris wrote:
>> On 7 Mar 2010, at 17:34, Andy Seaborne wrote:
>>>
>>> Some possible characters are:
>>>
>>> INVISIBLE SEPARATOR 2063
>>> group separator 001D
>>> record separator 001E
>>> unit separator 001F
>>> sequence concatenation 2040
>>>
>>> x1D/x1E/x1F all look sensible possible choices.
>>
>> Agreed, 0x1d-0x1f are the ASCII-inherited control characters of course.
>> 0x1d appeals, being the group separator :)
>
> If we go this way, then we will need to check the detailed
> documentation of each.

Yup. I believe that they're hierarchical, 0x1f being the deepest/finest level. http://en.wikipedia.org/wiki/C0_and_C1_control_codes appears to back this up.

I have a nagging concern that people will just assume that the separator character won't appear in text, and won't bother to escape, which of course RDF doesn't guarantee.

>>> Proposal 4 captures this.
>>>
>>> ** Proposal 4a
>>>
>>> GROUP_CONCAT[","](?name)
>>>
>>> as a general aggregate syntax, all aggregates can take an [] argument
>>> list, including custom aggregates.
>>
>> I find this syntax very appealing, might be because it's reminiscent of
>> TeX though!
>>
>> The ability to apply it to custom aggregates as well is good. I can't
>> quite imagine a clean ORDER BY syntax using this though, could be quoted
>> as a string I suppose?
>>
>> GROUP_CONCAT[",", "DESC(strlen(?x))"](?x, ?y)
>
> PostgreSQL recommends a subquery for the nearest equivalent:
>
> SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
>
> so GROUP_CONCAT over a pre setup intermediate result gives the full
> capability of ORDER BY LIMIT etc without
>
> SELECT GROUP_CONCAT(?x)
> {
>   SELECT ?x
>   {
>     ....
>   }
>   ORDER BY
>   LIMIT
> }
>
> We would need to tighten up the spec text around subqueries to make
> this perfect
>
> ** Suggestion: This exactly one subSELECT query pattern of an outer
> SELECT preserves order.

I'll have to ruminate on that. It seems a little strange at first glance, but it is a pretty likely behaviour of many systems anyway. Probably worth a separate discussion.

>> Would the [] arguments take only constants? If not there are issues
>> around grouping and values, or the [] arguments have to be passed by
>> reference too.
>
> I was assuming they would be expressions with variables limited to
> group key variables or outer variables (i.e. the ones that are the
> same for the whole of each partition) and can be eval'ed before the
> partitioning starts.

Or, presumably the result of an aggregate, e.g. SAMPLE()? I.e. the same rules as projected variables.

>> A minor variation would be to use named arguments:
>>
>> GROUP_CONCAT[separator="|", orderby="DESC(strlen(?x))", limit=10](?x)
>>
>> A bit verbose though.
>
> Have you been Ruby programming recently? :-)

Heh, no, I haven't gone over to the light side. People do this in Perl too you know :)

>>> We keep the DISTINCT special syntax but it is really another way to
>>> write aggregate modifiers.
>>
>> Yes, it could be equivalent to [distinct=true] in some way, if we went
>> down that route, though DISTINCT does apply to the expression list in a
>> consistent way, so I don't think there's any real benefit.
>
> Yes - the DISTINCT case is so well known we ought to at least copy
> the syntax as a special case.

Right.

>>> I put it in this order because you can think of it as
>>>
>>> (GROUP_CONCAT[","])(?name)
>>>
>>> where
>>>
>>> (GROUP_CONCAT[","])
>>>
>>> yields the specific aggregator for GROUP_CONCAT using ",".
>>
>> As a sort of currying?
>
> Or Schönfinkelisation [1]
>
> It's like it, with the restriction that all the GROUP_CONCAT
> aggregate arguments must be defined. Can't have:
>
> ((GROUP_CONCAT[","])[limit=10])(?name)
>
> to apply the first, get a new, partially ground aggregator and then
> ground it further.

Fair enough.

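To make the (GROUP_CONCAT[","])(?name) reading concrete, here is a rough sketch in Python rather than SPARQL. Everything in it is an illustrative assumption: group_concat, separator and limit are hypothetical names, and "limit" is taken to mean a result-count limit, which is only one of the possible readings. The modifiers are fixed once, before any group is seen, and the resulting aggregator is then applied to each partition.

```python
from functools import partial


def group_concat(values, *, separator=" ", limit=None):
    """Hypothetical stand-in for the aggregate: join the values of one group.

    The keyword-only parameters play the role of the [] modifiers.
    """
    items = [str(v) for v in values]
    if limit is not None:
        items = items[:limit]  # assumption: "limit" counts results, not characters
    return separator.join(items)


# Fix the modifiers up front, yielding the specific aggregator,
# in the spirit of (GROUP_CONCAT[","]).
comma_concat = partial(group_concat, separator=",")

# Apply the same aggregator to every partition (group key -> values).
groups = {"g1": ["alice", "bob"], "g2": ["carol"]}
result = {key: comma_concat(names) for key, names in groups.items()}
print(result)
```

Note that functools.partial would happily allow a second round of partial application, which is exactly what the restriction above rules out, so the analogy only goes so far.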
For the record, my (marginal) preference for named arguments is for forward compatibility. If we allow/encourage [] as an extension mechanism to built-in functions (e.g. limit in GROUP_CONCAT), then relying on argument position is a bit risky, as different implementations will put the arguments in different orders.

It's still potentially ambiguous because "limit" could be a char limit (as in MySQL) or a result limit, or a set size limit. [<uri>=<value>, ...] just isn't that tempting though :)

- Steve

-- 
Steve Harris, Garlik Limited
2 Sheen Road, Richmond, TW9 1AE, UK
+44 20 8973 2465 http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
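P.S. On the separator concern raised earlier in this message: a minimal sketch, again in Python and purely illustrative, of why escaping matters when a control character such as U+001F is used as the separator. The backslash escape convention and the function names are assumptions made for the example, not anything proposed for the spec.

```python
SEP = "\x1f"  # UNIT SEPARATOR, one of the candidate characters
ESC = "\\"    # assumption: backslash as the escape character


def escape(value):
    """Escape the escape character first, then the separator."""
    return value.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)


def join_values(values):
    """Concatenate values so that they can be split apart again safely."""
    return SEP.join(escape(v) for v in values)


def split_values(joined):
    """Inverse of join_values for non-empty lists: honour escapes while splitting."""
    out, cur, i = [], [], 0
    while i < len(joined):
        ch = joined[i]
        if ch == ESC and i + 1 < len(joined):
            cur.append(joined[i + 1])  # take the escaped character literally
            i += 2
        elif ch == SEP:
            out.append("".join(cur))   # an unescaped separator ends the value
            cur = []
            i += 1
        else:
            cur.append(ch)
            i += 1
    out.append("".join(cur))
    return out


values = ["a\x1fb", "c\\d", "e"]          # the first literal contains the separator
naive = SEP.join(values).split(SEP)       # no escaping: the first value is torn apart
round_trip = split_values(join_values(values))
print(len(naive), round_trip == values)
```

Without the escaping step the three literals come back as four pieces, which is the failure mode I'd expect if people assume the separator can never occur in the data.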
Received on Monday, 8 March 2010 14:45:11 UTC