- From: Andy Seaborne <andy.seaborne@talis.com>
- Date: Fri, 19 Mar 2010 09:48:19 +0000
- To: Steve Harris <steve.harris@garlik.com>
- CC: "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>

Option 1: -1

Our custom aggregate expressions are supposed to be like function calls (with the addition of [] and DISTINCT), so they already support multiple expressions. It would have been AGG(,) or some other different-from-function-call syntax, to limit custom aggregate syntax to one argument in the grammar rules.

Option 2: -1

I don't read the current doc as saying clearly that multiple expressions are supported. The nearest text seems to be (and we already know this needs reworking):

"""
Aggregation applies a function func to a multiset of expressions.
"""

I read that as the multiset of expressions is from one expression per row in the partition. It may have been the intention to read it as collapsing expressions across rows and across partition elements but I wasn't reading it that way.

Option 3: +1

-------
Let Ω be a partition.

ExprMultiSet(Ω) =
    { eval(exprlist,μ) | μ in Ω such that eval(exprlist,μ) is defined }
  UNION
    { e | μ in Ω such that eval(μ(expr)) is undefined }

where
  exprlist = (expr1, expr2, ...)
  eval(exprtuple,μ) = (eval(expr1,μ), eval(expr2,μ), ...)
    and is undefined if any eval(exprN,μ) is undefined
  where "e" is some symbol that is distinct from all RDF terms.

card[x in Ω]:
  if DISTINCT:
    card[x] = 1 if there exists x in ExprMultiSet(Ω)
    card[x] = 0 otherwise
  else
    card[x] = count of μ in Ω such that x = eval(exprlist,μ)
--------

Alternative: put "e" in the list for any bad evaluations, and remove the UNION.

SUM, COUNT, MIN, MAX, AVG - single expression (DISTINCT? ?x)
COUNT(*), COUNT(DISTINCT *)
SAMPLE, GROUP_CONCAT -- multiple argument expressions.
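As a rough illustration of the ExprMultiSet and card definitions above, here is a minimal Python sketch. It is not taken from the working draft: expressions are modelled as plain Python callables over a dict-shaped solution, the names `ERROR`, `eval_exprlist` and `expr_multiset` are hypothetical, and counting one `ERROR` entry per failing row in the non-DISTINCT case is an assumption, since the text above does not spell out the cardinality of "e".

```python
from collections import Counter

# Hypothetical stand-in for the symbol "e", distinct from every RDF term.
ERROR = object()


def eval_exprlist(exprlist, mu):
    """Evaluate every expression against solution mu.

    Returns a tuple of values, or None if any single expression is
    undefined (modelled here as raising KeyError/TypeError/ValueError).
    """
    values = []
    for expr in exprlist:
        try:
            values.append(expr(mu))
        except (KeyError, TypeError, ValueError):
            return None
    return tuple(values)


def expr_multiset(exprlist, partition, distinct=False):
    """Map each evaluated tuple (or ERROR) to its cardinality."""
    counts = Counter()
    for mu in partition:
        result = eval_exprlist(exprlist, mu)
        # Assumption: each failing row contributes one ERROR entry.
        counts[ERROR if result is None else result] += 1
    if distinct:
        # DISTINCT: every element that occurs has cardinality 1.
        return Counter({x: 1 for x in counts})
    return counts


# Usage: two expressions, ?x and ?y; the last row leaves ?y unbound.
partition = [{"x": 1, "y": 2}, {"x": 2, "y": 3}, {"x": 1, "y": 2}, {"x": 5}]
exprlist = [lambda mu: mu["x"], lambda mu: mu["y"]]
multiset = expr_multiset(exprlist, partition)
distinct_multiset = expr_multiset(exprlist, partition, distinct=True)
```

Under these assumptions each element of the multiset is a whole tuple, so the pairing of ?x with ?y from the same row is preserved.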
COUNT with more than one argument seems to be a MySQL-ism and according to the document (5.1) only these three forms exist:

COUNT(expr)
COUNT(*)
COUNT(DISTINCT expr,[expr...])

and not COUNT(DISTINCT *)

	Andy

On 17/03/2010 5:44 PM, Steve Harris wrote:
> Hi all,
>
> The Problem:
>
> Some SQL implementations (at least Sybase, Postgres, Oracle) support
> multi-expression aggregates, but not with the multiset semantics as in
> the current working draft.
>
> An example from Postgres is the CORR(a, b) aggregate, which can be used
> like:
>
> w x y
> 1 1 2
> 1 2 3
> 1 3 4
> 2 1 1
> 2 2 2
>
> SELECT w, CORR(x, y) AS z FROM A GROUP BY w;
>
> Following current SPARQL draft the equivalent:
>
> SELECT ?w (CORR(?x, ?y) AS ?z) WHERE { ?w :x ?x ; :y ?y } GROUP BY ?w)
>
> would evaluate as
>
> [Res A]
> w z
> 1 CORR({1, 2, 2, 3, 3, 4})
> 2 CORR({1, 1, 2, 2})
>
> But Postgres etc. users will be expecting
>
> [Res B]
> w z
> 1 CORR({(1, 2), (2, 3), (3, 4)})
> 2 CORR({(1, 1), (2, 2)})
>
> ----
>
> So, there are 3 proposals that make sense to me:
>
> Option 1:
>
> Ban multi expression aggregates, leave decision to future working group.
>
> Advantage: easy, can get consensus on what is best to do in future.
> Common situation in SQL engines (MS SQL Server, MySQL, SQLite, ...).
> Disadvantage: no way to implement stats functions aggregates (for e.g.)
> within standard. COUNT(?x, ?y) equivalent becomes more verbose.
>
> Option 2:
>
> Stick with WD semantics, multi expression aggregates expand to a set of
> values, as Res A above.
>
> Advantage: makes things like COUNT(?x, ?y) easy, algebra is simple
> Disadvantage: rules out things like CORR, unless we specify expression
> ordering is preserved, even if we do that the semantics of them will be
> a little strange. What does CORR(?a, ?b , ?c) do?
>
> Option 3:
>
> Define (multi expression?) aggregates as producing a multiset of lists,
> as Res B above.
>
> Advantage: makes it easy to define stats aggregates in the future (I'm
> not proposing we do them in this round, it's a bit too much to bite off
> IMHO).
> Disadvantage: makes defn. of COUNT() etc. a bit more complex. Makes
> algebra a bit more complex. Questions around whether COUNT({(1, 2), (3,
> 4)}) = COUNT({1, 2, 3, 4}) etc.
>
> ----
>
> My preference is probably Option 3, but I could live with Option 1.
> Option 2 is OK, just we have to accept that stats aggregates in the
> future will be a bit messy.
>
> - Steve
>
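
To make the difference between [Res A] and [Res B] in the quoted message concrete, the following Python sketch groups the example table both ways. It only illustrates the two readings; the function names `group_flat` and `group_tuples` are hypothetical and it does not reproduce any engine's actual evaluation. The flattened lists are sorted only to match the order printed in [Res A], since a multiset has no order.

```python
from collections import defaultdict

# The example table from the quoted message, as (w, x, y) rows.
ROWS = [(1, 1, 2), (1, 2, 3), (1, 3, 4), (2, 1, 1), (2, 2, 2)]


def group_flat(rows):
    """Option 2 / [Res A]: every expression value lands in one flat multiset."""
    groups = defaultdict(list)
    for w, x, y in rows:
        groups[w].extend([x, y])
    # Sorted purely for display; the pairing of x with y is already lost.
    return {w: sorted(values) for w, values in groups.items()}


def group_tuples(rows):
    """Option 3 / [Res B]: each row contributes one (x, y) tuple."""
    groups = defaultdict(list)
    for w, x, y in rows:
        groups[w].append((x, y))
    return dict(groups)


res_a = group_flat(ROWS)
res_b = group_tuples(ROWS)
print(res_a)
print(res_b)
```

Only the tuple form keeps the information that a pairwise aggregate such as CORR needs, namely which ?x value came from the same row as which ?y value.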

Received on Friday, 19 March 2010 09:49:04 UTC