RE: Initial draft of Design:Aggregate from Seaborne, Andy on 2009-08-10 (public-rdf-dawg@w3.org from July to September 2009)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 10 Aug 2009 13:10:02 +0000
To: Chimezie Ogbuji <ogbujic@ccf.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <B6CF1054FDC8B845BF93A6645D19BEA3693C1E2A8C@GVW1118EXC.americas.hpqcorp.net>



> -----Original Message-----
> From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
> On Behalf Of Chimezie Ogbuji
> Sent: 08 August 2009 04:04
> To: SPARQL Working Group
> Subject: Initial draft of Design:Aggregate
> 
> I've finished a first draft of the Design:Aggregate wiki.  Some work on the
> Mapping to AST to algebra and test case sections is needed eventually, but I
> thought what I have so far should be enough for discussion on Tuesday.
> 
> I have a sketch for the aggregate operator semantics attached (I wasn't able
> to figure out the math syntax for our Mediawiki instance - any information
> on this would be useful)
> 
> http://www.w3.org/2009/sparql/wiki/Design:Aggregate


Hi Chime,

Yes - including LaTeX is hard. I think you can cut-and-paste the symbols into the wiki text because it's all UTF-8 HTML but that's a bit fragile and only marginally better.

I tried to write out some formal definitions to check my understanding: is this the sort of thing you had in mind?


Single-valued function: returns a solution projected down to the named variables only:
  key(varlist, mu) = { (v,x) | (v,x) in mu, v in varlist }
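A minimal Python sketch of key(varlist, mu), just to make the definition executable. It assumes a solution mapping is a dict from variable name to value, a representation chosen for illustration and not anything the wiki page defines. A variable in varlist that mu leaves unbound simply does not appear in the key.

```python
from typing import Any, Dict, Iterable

Solution = Dict[str, Any]  # assumption: a solution mapping mu as a dict var -> value

def key(varlist: Iterable[str], mu: Solution) -> Solution:
    """key(varlist, mu) = { (v,x) | (v,x) in mu, v in varlist }"""
    wanted = set(varlist)
    # keep only the bindings whose variable is named in varlist;
    # a listed variable that mu leaves unbound is simply absent
    return {v: x for v, x in mu.items() if v in wanted}

mu = {"x": 1, "y": "a", "z": 3.5}
print(key(["x", "y"], mu))  # {'x': 1, 'y': 'a'}
print(key(["x", "w"], mu))  # {'x': 1}
```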

and the set of all keys:
  key(varlist, Omega) = { k | mu in Omega, k=key(varlist,mu) }
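The set of keys can be sketched the same way. Python sets need hashable members, so this version represents each key as a frozenset of (variable, value) pairs and assumes the values themselves are hashable; both are assumptions of the sketch, not part of the definition. Omega, a multiset, is a plain list here, and duplicate keys collapse in the resulting set.

```python
from typing import Any, Dict, FrozenSet, List, Set, Tuple

Solution = Dict[str, Any]
Key = FrozenSet[Tuple[str, Any]]  # hashable form of a projected solution

def key(varlist: List[str], mu: Solution) -> Key:
    wanted = set(varlist)
    return frozenset((v, x) for v, x in mu.items() if v in wanted)

def keys(varlist: List[str], omega: List[Solution]) -> Set[Key]:
    """key(varlist, Omega) = { k | mu in Omega, k = key(varlist, mu) }"""
    # Omega may repeat solutions; the result is a set, so repeats collapse
    return {key(varlist, mu) for mu in omega}

omega = [
    {"g": "a", "v": 1},
    {"g": "a", "v": 2},
    {"g": "b", "v": 5},
]
print(len(keys(["g"], omega)))  # 2
```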

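The partition of the multiset Omega groups the solutions by key:
  Partition(varlist, Omega) = { (k, X) | k in key(varlist, Omega), X = { mu | mu in Omega, key(varlist, mu) = k } }
where each X is a sub-multiset of Omega, so duplicate solutions are kept.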
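Grouping the solutions that share a key yields the (k, X) pairs that the Aggregation definition consumes. A minimal sketch, reusing the same assumed representation (dict solutions, frozenset keys, a list standing in for the multiset); the function name `partition` is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, FrozenSet, List, Tuple

Solution = Dict[str, Any]
Key = FrozenSet[Tuple[str, Any]]

def key(varlist: List[str], mu: Solution) -> Key:
    wanted = set(varlist)
    return frozenset((v, x) for v, x in mu.items() if v in wanted)

def partition(varlist: List[str], omega: List[Solution]) -> Dict[Key, List[Solution]]:
    # group every solution under its key; each list is a sub-multiset X,
    # so duplicate solutions in Omega stay duplicated inside X
    groups: Dict[Key, List[Solution]] = defaultdict(list)
    for mu in omega:
        groups[key(varlist, mu)].append(mu)
    return dict(groups)

omega = [{"g": "a", "v": 1}, {"g": "a", "v": 1}, {"g": "b", "v": 5}]
for k, xs in partition(["g"], omega).items():
    print(dict(k), len(xs))
# {'g': 'a'} 2
# {'g': 'b'} 1
```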

Let agg(VarList, SubOmega) be the aggregation function applied to a sub-multiset SubOmega of solutions, taking its arguments from the variables in VarList.

Aggregation(VarList, FuncAndVars, Omega, Mu) =
   { merge(k, (Vout, agg(Vin, X))) | (k, X) in Partition(VarList, Omega), FuncAndVars = (f, Vin, Vout), agg = f }
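Putting the pieces together, a hedged sketch of Aggregation for a single (f, Vin, Vout) triple. The AGGREGATES registry and the COUNT/SUM implementations are hypothetical, chosen only to have something concrete to run. The Mu parameter does not appear in the body of the definition, so the sketch omits it, and it also enforces the Vout/key restriction that the aside raises. The result is a list, i.e. a multiset of solutions.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, FrozenSet, List, Tuple

Solution = Dict[str, Any]
Key = FrozenSet[Tuple[str, Any]]
AggFn = Callable[[List[str], List[Solution]], Any]

def key(varlist: List[str], mu: Solution) -> Key:
    wanted = set(varlist)
    return frozenset((v, x) for v, x in mu.items() if v in wanted)

def partition(varlist: List[str], omega: List[Solution]) -> Dict[Key, List[Solution]]:
    groups: Dict[Key, List[Solution]] = defaultdict(list)
    for mu in omega:
        groups[key(varlist, mu)].append(mu)
    return dict(groups)

# Hypothetical aggregate functions: each receives Vin and a sub-multiset X.
def agg_count(vin: List[str], xs: List[Solution]) -> int:
    # count the solutions in X that bind every variable in Vin
    return sum(1 for mu in xs if all(v in mu for v in vin))

def agg_sum(vin: List[str], xs: List[Solution]) -> Any:
    # sum the single input variable over the solutions in X that bind it
    (v,) = vin
    return sum(mu[v] for mu in xs if v in mu)

AGGREGATES: Dict[str, AggFn] = {"COUNT": agg_count, "SUM": agg_sum}

def aggregation(varlist: List[str], func_and_vars: Tuple[str, List[str], str],
                omega: List[Solution]) -> List[Solution]:
    f, vin, vout = func_and_vars
    # the aside's restriction: Vout must not clash with a key variable
    if vout in varlist:
        raise ValueError(f"output variable {vout!r} clashes with the grouping key")
    agg = AGGREGATES[f]
    result = []
    for k, xs in partition(varlist, omega).items():
        # merge(k, (Vout, agg(Vin, X))): the key's bindings plus one new binding
        row = dict(k)
        row[vout] = agg(vin, xs)
        result.append(row)
    return result

omega = [
    {"g": "a", "v": 1},
    {"g": "a", "v": 2},
    {"g": "b", "v": 5},
]
print(aggregation(["g"], ("SUM", ["v"], "total"), omega))
# [{'g': 'a', 'total': 3}, {'g': 'b', 'total': 5}]
```

Because Vout is kept out of the key, the merged mappings are always compatible, so merge never has to reconcile two bindings of the same variable.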


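Aside: we may need to restrict Vout so that it does not clash with the key, i.e. Vout must not be one of the variables in VarList; otherwise the merge could be asked to combine two different bindings of the same variable.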

One last point: do we want GROUP BY expressions, not just variables? I think we do, though it does create some issues about the projected-down variables that are arguments to the partitioning expression. Some SQL systems return an arbitrary value for such variables (AKA first found?), but I would prefer not to include the variables of an expression at all, so as to get the same results each time.
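To illustrate the preference for not projecting an expression's variables, here is a sketch that groups by an arbitrary expression and binds only the expression's value and the aggregate. The expression (integer division of a hypothetical ?price by 10), the variable names, and the hard-wired SUM are assumptions made for the example; it also assumes the expression's input variables are bound in every solution.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Solution = Dict[str, Any]

def group_by_expr(expr: Callable[[Solution], Any], keyvar: str, vin: str,
                  vout: str, omega: List[Solution]) -> List[Solution]:
    # partition on the value of the expression, not on a list of variables
    groups: Dict[Any, List[Solution]] = defaultdict(list)
    for mu in omega:
        groups[expr(mu)].append(mu)
    result = []
    for k, xs in groups.items():
        # bind only the expression value and the aggregate (a SUM here);
        # the expression's argument variables are dropped, so no arbitrary
        # "first found" value can leak into the output
        result.append({keyvar: k, vout: sum(mu[vin] for mu in xs)})
    return result

omega = [
    {"price": 12, "qty": 1},
    {"price": 17, "qty": 2},
    {"price": 31, "qty": 4},
]
# stands in for a GROUP BY on floor(?price / 10), bound to ?band
print(group_by_expr(lambda mu: mu["price"] // 10, "band", "qty", "total", omega))
# [{'band': 1, 'total': 3}, {'band': 3, 'total': 4}]
```

The price values 12 and 17 fall into the same group, and neither appears in the output, so the result does not depend on which solution happened to be seen first.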

 Andy

Received on Monday, 10 August 2009 13:10:37 UTC