
RE: Initial draft of Design:Aggregate

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 25 Aug 2009 11:27:08 +0000
To: Ivan Herman <ivan@w3.org>, Chimezie Ogbuji <ogbujic@ccf.org>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <B6CF1054FDC8B845BF93A6645D19BEA3693CC7027C@GVW1118EXC.americas.hpqcorp.net>
Do we want to allow custom aggregates?  They'd look (syntactically) exactly like function calls, but the thing the URI names is an aggregator, not a function.  I'd like to include custom aggregates.


The issue about needing two PrimaryExpressions, one which allows aggregates and one which does not, is messy because the entire tree of expression rules needs changing if the PrimaryExpression can be different.  I took a shortcut in ARQ and have a single, general set of expression rules but with an internal flag to say whether aggregates are allowed.  Pro: the error message is clearer; con: it's now context-sensitive parsing and is not LL(1) or LALR(1).  It would be easy enough to macro-generate the grammar with the two kinds of expression but the result will be verbose.
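
A minimal sketch of the single-rule-set-plus-flag idea, written in Python rather than taken from ARQ (whose parser code is not shown in this thread).  The toy grammar, the AGGREGATES set and every name below are assumptions made for illustration only:

```python
import re

# Toy tokenizer: variables, integers, names and a few punctuation characters.
TOKEN = re.compile(r"\s*(\?\w+|\d+|\w+|[()*<>=])")
AGGREGATES = {"count", "sum", "min", "max", "avg"}  # illustrative set only


class ParseError(Exception):
    pass


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ParseError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text, allow_aggregates):
        self.tokens = tokenize(text)
        self.i = 0
        # The flag is parser state, not part of the grammar rules.
        self.allow_aggregates = allow_aggregates

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.i += 1
        return tok

    def expect(self, wanted):
        tok = self.next()
        if tok != wanted:
            raise ParseError(f"expected {wanted!r}, got {tok!r}")

    def expression(self):
        left = self.primary()
        if self.peek() in ("<", ">", "="):
            op = self.next()
            return (op, left, self.primary())
        return left

    def primary(self):
        # One PrimaryExpression rule serves both contexts.
        tok = self.next()
        if tok is None:
            raise ParseError("unexpected end of input")
        if tok.startswith("?") or tok.isdigit():
            return tok
        if tok == "(":
            inner = self.expression()
            self.expect(")")
            return inner
        if tok.lower() in AGGREGATES:
            if not self.allow_aggregates:
                # The context check gives a specific error message.
                raise ParseError(f"aggregate '{tok}' is not allowed here")
            self.expect("(")
            arg = self.next()
            self.expect(")")
            return ("agg", tok.lower(), arg)
        raise ParseError(f"unexpected token {tok!r}")


def parse(text, allow_aggregates=False):
    parser = Parser(text, allow_aggregates)
    tree = parser.expression()
    if parser.peek() is not None:
        raise ParseError(f"trailing input at token {parser.peek()!r}")
    return tree
```

Calling parse("count(*) > 0", allow_aggregates=True) is how a HAVING context would use this sketch, while a FILTER-like context would leave the flag off and get the "not allowed here" error for the same text.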

> restricted to variables that appear in the  AS clause? Ie, in the
> example you have, one could then say
> 
> HAVING (?totalPrice > 10)
> 
> I am just looking at some simpler scheme that would cover our main use
> cases, rather than completeness... 

Simpler would be good but I'm not sure restricting to SELECT-named variables is workable.  

It seems reasonable to have SELECT ?y .... GROUP BY ?y HAVING (count(*)>0)
That is a case where the aggregate does not naturally occur in the SELECT.
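
To make that case concrete, here is a rough Python sketch of grouping followed by a HAVING test whose aggregate never appears in the output.  It loosely follows the key/Partition definitions quoted further down this message; the dict-based solution mappings and the function names are assumptions for illustration, not anything defined by the working group:

```python
from collections import defaultdict


def key(varlist, mu):
    # Project a solution mapping (a dict) down to the grouping variables.
    # A frozenset of pairs is hashable, so it can identify a partition.
    return frozenset((v, x) for v, x in mu.items() if v in varlist)


def partition(varlist, omega):
    # Group the multiset of solutions (a list here) by their key.
    groups = defaultdict(list)
    for mu in omega:
        groups[key(varlist, mu)].append(mu)
    return groups


def group_having(varlist, omega, having):
    # Keep only the groups that pass the HAVING test and return their keys.
    # The aggregate is evaluated inside `having` and is never bound in the
    # result, which mirrors SELECT ?y ... GROUP BY ?y HAVING (count(*) > 0).
    return [dict(k) for k, group in partition(varlist, omega).items()
            if having(group)]


omega = [
    {"y": "a", "p": 1},
    {"y": "a", "p": 2},
    {"y": "b", "p": 3},
]
all_groups = group_having(["y"], omega, lambda g: len(g) > 0)
big_groups = group_having(["y"], omega, lambda g: len(g) > 1)
```

Here len(g) plays the role of count(*); restricting HAVING to SELECT-named variables would force such a count to be projected just so it could be tested.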

ARQ generates a variable (the name is illegal SPARQL so no clash is possible) if the HAVING mentions an aggregate, so this isn't too bad.  Generated variables also occur because ARQ doesn't require a name for an expression in SELECT - the AS is optional and it will generate a name from the point of view of result sets.  The syntax is

    (expression AS ?var)
    (expression)

that is, the ")" is after the ?var.  With expressions, the nuisance case is ?x-?y which without brackets or commas can be ?x-?y (one expression) or ?x  -?y  (two expressions).
 

	Andy

> -----Original Message-----
> From: Ivan Herman [mailto:ivan@w3.org]
> Sent: 25 August 2009 11:35
> To: Chimezie Ogbuji
> Cc: Seaborne, Andy; SPARQL Working Group
> Subject: Re: Initial draft of Design:Aggregate
> 
> I wonder... I realize that this 'HAVING' is powerful, but may lead to
> additional complications and may not be very easy for end users. Isn't
> it enough to define HAVING to have a similar syntax to FILTER, but
> restricted to variables that appear in the  AS clause? Ie, in the
> example you have, one could then say
> 
> HAVING (?totalPrice > 10)
> 
> I am just looking at some simpler scheme that would cover our main use
> cases, rather than completeness...
> 
> Ivan
> 
> Chimezie Ogbuji wrote:
> > Thanks Andy.
> >
> > I've extended the syntax to include a 'HAVING' expression.  I still need to
> > reconcile the functions in the draft formalization with the algebraic
> > operators in the previous section to determine the specific changes needed
> > to support the semantics of HAVING.
> >
> > The intuition, however, is for the Aggregation operator to take an
> > additional argument that is the boolean expression to use as a selection
> > criterion while performing the aggregation.  The expression will include
> > references to aggregate functions that will be evaluated in the same way as
> > the function(s) given in FuncAndVars, except their values aren't returned as
> > bindings but rather are used to evaluate the expressions to determine which
> > solution mappings are discarded.
> >
> > One outstanding issue with the grammar changes is that the PrimaryExpression
> > used within HavingExpression would need to be different from the
> > PrimaryExpression used elsewhere insofar as references to aggregate
> > functions will need to be added to that part of the grammar.
> >
> > -- Chimezie
> >
> >
> > On 8/21/09 5:11 AM, "Seaborne, Andy" <andy.seaborne@hp.com> wrote:
> >
> >> I've added the definitions text to the wiki page.
> >>
> >> Andy
> >>
> >>> -----Original Message-----
> >>> From: Chimezie Ogbuji [mailto:ogbujic@ccf.org]
> >>> Sent: 11 August 2009 14:41
> >>> To: Seaborne, Andy; SPARQL Working Group
> >>> Subject: Re: Initial draft of Design:Aggregate
> >>>
> >>> Hey Andy
> >>>
> >>>
> >>> On 8/10/09 9:10 AM, "Seaborne, Andy" <andy.seaborne@hp.com> wrote:
> >>>>> http://www.w3.org/2009/sparql/wiki/Design:Aggregate
> >>>>
> >>>> Hi Chime,
> >>>> Yes - including LaTeX is hard. I think you can cut-and-paste the symbols
> >>>> into the wiki text because it's all UTF-8 HTML but that's a bit fragile
> >>>> and only marginally better.
> >>> Okay, thanks.  I'll give that a try
> >>>
> >>>> I tried to write out some formal definitions to check my understanding:
> >>>> is this the sort of thing you had in mind?
> >>>>
> >>>>
> >>>> Single valued function: returns a solution projected down to named
> >>>> variables only:
> >>>>   key(varlist, mu) = { (v,x) | (v,x) in mu, v in varlist }
> >>>>
> >>>> and the set of all keys:
> >>>>   key(varlist, Omega) = { k | mu in Omega, k=key(varlist,mu) }
> >>>>
> >>>> The partition of the multiset Omega is:
> >>>>   Partition(varlist, Omega) = { (k,mu) | mu in Omega, k=key(varlist, mu) }
> >>> Yes, this needs to return a set from the multiset and varlist in order to
> >>> ensure uniqueness of the partitions.
> >>>
> >>>> Let agg(VarList,SubOmega) be the aggregation function run on a multiset of
> >>>> solutions taking variables VarList
> >>>>
> >>>> Aggregation(VarList, FuncAndVars, Omega, Mu) =
> >>>>    { merge(k, (Vout, agg(Vin,X))) | (k, X) in Partition(VarList, Omega),
> >>>>      FuncAndVars=(f, Vin, Vout), agg in set f }
> >>> I believe we are on the same page.
> >>>
> >>>> Aside: we may need to restrict variable Vout so it does not clash with the
> >>>> key.
> >>> Yes.
> >>>
> >>>> One last point: do we want GROUP BY expressions, not just variables?  I
> >>>> think we do; it does create some issues about the projected-down variables
> >>>> that are arguments to the partitioning expression.
> >>>> Some SQL systems put in a random selection (AKA first found?) but I prefer
> >>>> not to include the variables of an expression at all, so as to get the same
> >>>> results each time.
> >>>>
> >>>> Andy
> >>> Good question, I don't know offhand (will need to chew on that).
> >>>
> >>> -- Chimezie
> >>>
> >>>
> >>> ===================================
> >>>
> >>> Please consider the environment before printing this e-mail
> >>>
> >>> Cleveland Clinic is ranked one of the top hospitals
> >>> in America by U.S. News & World Report (2008).
> >>> Visit us online at http://www.clevelandclinic.org for
> >>> a complete listing of our services, staff and
> >>> locations.
> >>>
> >>>
> >>> Confidentiality Note:  This message is intended for use
> >>> only by the individual or entity to which it is addressed
> >>> and may contain information that is privileged,
> >>> confidential, and exempt from disclosure under applicable
> >>> law.  If the reader of this message is not the intended
> >>> recipient or the employee or agent responsible for
> >>> delivering the message to the intended recipient, you are
> >>> hereby notified that any dissemination, distribution or
> >>> copying of this communication is strictly prohibited.  If
> >>> you have received this communication in error,  please
> >>> contact the sender immediately and destroy the material in
> >>> its entirety, whether electronic or hard copy.  Thank you.
> >
> >
> 
> --
> 
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Tuesday, 25 August 2009 11:28:18 GMT
