- From: Steve Harris <steve.harris@garlik.com>
- Date: Wed, 17 Mar 2010 17:44:37 +0000
- To: "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>
Hi all,
The Problem:
Some SQL implementations (at least Sybase, Postgres, Oracle) support
multi-expression aggregates, but not with the multiset semantics as in
the current working draft.
An example from Postgres is the CORR(a, b) aggregate, which can be
used like:
w  x  y
1  1  2
1  2  3
1  3  4
2  1  1
2  2  2
SELECT w, CORR(x, y) AS z FROM A GROUP BY w;
Following the current SPARQL draft, the equivalent query:
SELECT ?w (CORR(?x, ?y) AS ?z) WHERE { ?w :x ?x ; :y ?y } GROUP BY ?w
would evaluate as
[Res A]
w z
1 CORR({1, 2, 2, 3, 3, 4})
2 CORR({1, 1, 2, 2})
But Postgres etc. users will be expecting
[Res B]
w z
1 CORR({(1, 2), (2, 3), (3, 4)})
2 CORR({(1, 1), (2, 2)})
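
To make the difference between Res A and Res B concrete, here is a rough
Python sketch over the rows of table A. The function names (group_flat,
group_tuples, pearson) are made up for illustration and are not taken from
the working draft or from any SQL engine.

```python
from collections import defaultdict
from math import sqrt

# Rows of table A from the example above: (w, x, y)
ROWS = [(1, 1, 2), (1, 2, 3), (1, 3, 4), (2, 1, 1), (2, 2, 2)]

def group_flat(rows):
    """Res A: each group collapses to one multiset of scalar values."""
    groups = defaultdict(list)
    for w, x, y in rows:
        groups[w].extend([x, y])
    # Sorted only for display; a multiset has no inherent order.
    return {w: sorted(vals) for w, vals in groups.items()}

def group_tuples(rows):
    """Res B: each group keeps a multiset of (x, y) lists."""
    groups = defaultdict(list)
    for w, x, y in rows:
        groups[w].append((x, y))
    return dict(groups)

def pearson(pairs):
    """Pearson correlation over (x, y) pairs; needs the pairing intact."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)

print(group_flat(ROWS))    # the pairing of x with y is gone
print(group_tuples(ROWS))  # the pairing survives
print({w: pearson(p) for w, p in group_tuples(ROWS).items()})
```

Only the Res B shape gives a CORR-style function something it can work
with; the flattened form no longer says which x went with which y.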
----
So, there are 3 proposals that make sense to me:
Option 1:
Ban multi expression aggregates, leave decision to future working group.
Advantage: easy, can get consensus on what is best to do in future.
Common situation in SQL engines (MS SQL Server, MySQL, SQLite, ...).
Disadvantage: no way to implement statistical aggregate functions
(for example) within the standard. The equivalent of COUNT(?x, ?y)
becomes more verbose.
Option 2:
Stick with WD semantics, multi expression aggregates expand to a set
of values, as Res A above.
Advantage: makes things like COUNT(?x, ?y) easy, algebra is simple
Disadvantage: rules out things like CORR, unless we specify that
expression ordering is preserved. Even if we do that, their semantics
will be a little strange. What does CORR(?a, ?b, ?c) do?
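
One possible reading of "expression ordering is preserved" under Option 2,
sketched in Python: the expansion emits the values of each solution in
argument order, so an aggregate that knows its arity can cut the sequence
back into lists. The helpers flatten and rechunk are hypothetical names,
and this is only one interpretation, not something the draft defines.

```python
def flatten(pairs):
    """Option 2 expansion: emit the values of each solution in order."""
    return [value for pair in pairs for value in pair]

def rechunk(flat, arity):
    """Cut a flattened sequence back into lists of the given arity.

    This only recovers the original lists if the expansion order
    was preserved.
    """
    if len(flat) % arity != 0:
        raise ValueError("flattened length is not a multiple of the arity")
    return [tuple(flat[i:i + arity]) for i in range(0, len(flat), arity)]

pairs = [(1, 2), (2, 3), (3, 4)]
flat = flatten(pairs)

# Ordering preserved: the original pairs come back.
print(rechunk(flat, 2))

# Same multiset of values in a different order: different pairs.
scrambled = [1, 3, 2, 4, 2, 3]
print(rechunk(scrambled, 2))
```

With a plain multiset there is nothing to stop an implementation handing
over the scrambled order, which is why the ordering would have to be
specified.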
Option 3:
Define (multi expression?) aggregates as producing a multiset of
lists, as Res B above.
Advantage: makes it easy to define stats aggregates in the future (I'm
not proposing we do them in this round, it's a bit too much to bite
off IMHO).
Disadvantage: makes defn. of COUNT() etc. a bit more complex. Makes
algebra a bit more complex. Questions around whether COUNT({(1, 2),
(3, 4)}) = COUNT({1, 2, 3, 4}) etc.
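
The COUNT question can be shown with two candidate definitions over a
multiset of lists. This is a sketch only; count_lists and count_values are
illustrative names, and neither is being proposed as the definition.

```python
from collections import Counter

def count_lists(multiset):
    """COUNT as the number of lists (solutions) in the multiset."""
    return sum(multiset.values())

def count_values(multiset):
    """COUNT as the number of scalar values after flattening each list."""
    return sum(len(item) * n for item, n in multiset.items())

# The multiset {(1, 2), (3, 4)}, stored as item -> multiplicity.
group = Counter([(1, 2), (3, 4)])

print(count_lists(group))   # 2
print(count_values(group))  # 4
```

Which of the two COUNT should mean under Option 3 is exactly the open
question.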
----
My preference is probably Option 3, but I could live with Option 1.
Option 2 is OK; we just have to accept that stats aggregates in the
future will be a bit messy.
- Steve
--
Steve Harris, Garlik Limited
2 Sheen Road, Richmond, TW9 1AE, UK
+44 20 8973 2465 http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10
9AD
Received on Wednesday, 17 March 2010 17:45:07 UTC