- From: Steve Harris <steve.harris@garlik.com>
- Date: Wed, 17 Mar 2010 17:44:37 +0000
- To: "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>
Hi all,

The Problem:

Some SQL implementations (at least Sybase, Postgres, Oracle) support
multi-expression aggregates, but not with the multiset semantics as in
the current working draft.

An example from Postgres is the CORR(a, b) aggregate, which can be used
like:

    w  x  y
    1  1  2
    1  2  3
    1  3  4
    2  1  1
    2  2  2

    SELECT w, CORR(x, y) AS z FROM A GROUP BY w;

Following the current SPARQL draft, the equivalent:

    SELECT ?w (CORR(?x, ?y) AS ?z)
    WHERE { ?w :x ?x ; :y ?y }
    GROUP BY ?w

would evaluate as

    [Res A]
    w  z
    1  CORR({1, 2, 2, 3, 3, 4})
    2  CORR({1, 1, 2, 2})

But Postgres etc. users will be expecting

    [Res B]
    w  z
    1  CORR({(1, 2), (2, 3), (3, 4)})
    2  CORR({(1, 1), (2, 2)})

----

So, there are 3 proposals that make sense to me:

Option 1: Ban multi-expression aggregates, leave the decision to a
future working group.

Advantage: easy, can get consensus on what is best to do in future.
Common situation in SQL engines (MS SQL Server, MySQL, SQLite, ...).

Disadvantage: no way to implement stats function aggregates (for
example) within the standard. The COUNT(?x, ?y) equivalent becomes more
verbose.

Option 2: Stick with the WD semantics: multi-expression aggregates
expand to a set of values, as in Res A above.

Advantage: makes things like COUNT(?x, ?y) easy, algebra is simple.

Disadvantage: rules out things like CORR, unless we specify that
expression ordering is preserved. Even if we do that, the semantics of
them will be a little strange. What does CORR(?a, ?b, ?c) do?

Option 3: Define (multi-expression?) aggregates as producing a multiset
of lists, as in Res B above.

Advantage: makes it easy to define stats aggregates in the future (I'm
not proposing we do them in this round, it's a bit too much to bite off
IMHO).

Disadvantage: makes the definition of COUNT() etc. a bit more complex.
Makes the algebra a bit more complex. Questions around whether
COUNT({(1, 2), (3, 4)}) = COUNT({1, 2, 3, 4}) etc.

----

My preference is probably Option 3, but I could live with Option 1.
Option 2 is OK, we just have to accept that stats aggregates in the
future will be a bit messy.

- Steve

-- 
Steve Harris, Garlik Limited
2 Sheen Road, Richmond, TW9 1AE, UK
+44 20 8973 2465
http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
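The difference between Res A and Res B in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything taken from the working draft: the helper names (`group_flat`, `group_tuples`, `corr`) are hypothetical, the rows are the example table A, and `corr` is a plain Pearson correlation standing in for whatever CORR an engine actually implements.

```python
from collections import defaultdict
from math import sqrt

# Rows of the example table A as (w, x, y).
ROWS = [(1, 1, 2), (1, 2, 3), (1, 3, 4), (2, 1, 1), (2, 2, 2)]


def group_flat(rows):
    """Res A (hypothetical helper): every expression value in a group
    lands in one flat multiset, so the x/y pairing is lost."""
    groups = defaultdict(list)
    for w, x, y in rows:
        groups[w].extend([x, y])
    # Sorted only to match the way the multisets are written in the email.
    return {w: sorted(values) for w, values in groups.items()}


def group_tuples(rows):
    """Res B (hypothetical helper): each solution contributes one (x, y)
    list, so the group is a multiset of lists."""
    groups = defaultdict(list)
    for w, x, y in rows:
        groups[w].append((x, y))
    return dict(groups)


def corr(pairs):
    """Pearson correlation over (x, y) pairs; None when it is undefined.
    A stand-in for CORR, assumed here for illustration only."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


res_a = group_flat(ROWS)
res_b = group_tuples(ROWS)

print(res_a)  # flat multisets: CORR has no pairs to work with
print(res_b)  # multisets of (x, y) lists: CORR is well defined
print({w: corr(pairs) for w, pairs in res_b.items()})
print({w: (len(res_a[w]), len(res_b[w])) for w in res_b})  # COUNT differs
```

With the Res B shape a two-argument aggregate can be applied directly to each group, while the Res A shape only supports it if argument order is somehow recoverable. The last line also shows the COUNT question raised under Option 3: counting the flat multiset and counting the lists give different numbers for the same group.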
Received on Wednesday, 17 March 2010 17:45:07 UTC