Re: [PRD/FLD] aggregates from Michael Kifer on 2008-10-14 (public-rif-wg@w3.org from October 2008)

From: Michael Kifer <kifer@cs.sunysb.edu>
Date: Tue, 14 Oct 2008 14:27:33 -0400
To: "Changhai Ke" <cke@ilog.fr>
Cc: "Gary Hallmark" <gary.hallmark@oracle.com>, <public-rif-wg@w3.org>
Message-ID: <20081014142733.73d2571a@kiferdesk>
Right now this is a coordination effort. Aggregates are going to be added to
FLD, so I want to make sure that this is compatible with PRD's future addition
of these things.

michael

On Tue, 14 Oct 2008 11:40:56 +0200
"Changhai Ke" <cke@ilog.fr> wrote:

> Hello,
> 
> I don't know exactly the context under which the discussion about
> aggregates happens. It seems to me that it's too early to introduce
> aggregates to PRD, we still have simple things to define before we add
> aggregates. In the future, PRD should also add the aggregates, so they
> should be reusable. What do you think?
> 
> In general, "groupBy" is an almost required feature for aggregates. It's
> a kind of must-have for the Event Stream Processing (ESP, domain of
> CEP). Maybe it's also interesting to add constructs like "top N" (order
> the collection by a criterion and take the first N elements).
> 
> What do you think about the monoids
> (http://en.wikipedia.org/wiki/Monoid) to support the calculus of
> aggregates?
> 
> Changhai
> 
> -----Original Message-----
> From: public-rif-wg-request@w3.org [mailto:public-rif-wg-request@w3.org]
> On Behalf Of Michael Kifer
> Sent: lundi 13 octobre 2008 23:34
> To: Gary Hallmark
> Cc: public-rif-wg@w3.org
> Subject: Re: [PRD/FLD] aggregates
> 
> 
> 
> 
> On Mon, 13 Oct 2008 12:03:34 -0700
> Gary Hallmark <gary.hallmark@oracle.com> wrote:
> 
> > Michael Kifer wrote:
> > > On Sun, 12 Oct 2008 17:25:50 -0700
> > > Gary Hallmark <gary.hallmark@oracle.com> wrote:
> > >
> > >   
> > >> Hi Michael,
> > >>
> > >> Thanks for getting the ball rolling.  I think this should work for
> PRD, 
> > >> but let me check that I understand your proposal.  An example rule
> for 
> > >> computing the average salary of employees grouped by department
> would be 
> > >> something like:
> > >>
> > >> Forall ?deptno ?sal ?empId (
> > >>   AvgDeptSal(?deptno avg(?sal [ ?deptno ] | Emp(?empId ?deptno
> ?sal)))
> > >> )
> > >>     
> > >
> > > I am not sure how aggregates are supposed to be used in PRD, but:
> > >
> > >   - in a logical rule-based language they would be in the body of a
> rule.
> > >   
> > ditto for PRD.  Aggregation is always in the condition/body/premise.  
> > BTW, what do you propose for the model theory? 
> 
> Good.
> 
> For the model theory, it can't be exactly the same for PRD and FLD
> because in
> PRD the aggregates are evaluated on a partial model. If there is
> recursion
> though aggregation then there would be a significant difference. But it
> is
> possible, I think, to make the definitions look similar.
> 
> I need to think about some of the details, but the main issue is that
> aggregates require a bag semantics, while everything else is defined
> using
> sets. It is not exactly obvious how to define bag-comprehension (as
> opposed to
> the usual set-comprehension). If we want to define the bag of all X s.t.
> {X| phi(X)} then it is known how to do this if phi is an existentially
> quantified conjunction/disjunction. But it is less clear in general what
> to do
> if phi has foralls. (This is a problem for FLD, not PRD, as I
> understand.)
> So, some restrictions on phi will probably be required.
> 
> > I think we want the same model theory for PRD and FLD conditions (but 
> > not rulesets -- PRD has no model theory for rulesets) where they agree
> 
> > on syntax.  I suspect this should work fine for aggregation and naf.
> > >   - the comprehension variable is not quantified
> > >     (the aggregate works as a kind of quantifier for it)
> >
> > that sounds right for PRD as well.
> > > So, in such a language I would write something like
> > >
> > > Forall ?depno ?Avgsal (
> > >   Query(?depno ?Avgsal) :-
> > > 	?Avgsal = avg(?sal [?deptno] | Exists ?empId (Emp(?empId ?deptno
> ?sal)))
> > > )
> > >
> > > I suppose this can also be written as a fact like yours:
> > >
> > > Forall ?deptno (
> > >    AvgDeptSal(?deptno avg(?sal [?deptno] | Exists ?empId Emp(?empId
> ?deptno ?sal)))
> > > )
> > >
> > > but I haven't thought about it.
> > >   
> > I first wrote it like yours, then I saw it seemed I could "substitute"
> 
> > and eliminate the ?Avgsal, so I just wanted to check if that was
> legal. 
> > (It would probably be a bit easier for a PRD translator if it was not 
> > legal, but its not a big deal either way)
> 
> I think it might be easier for PRD to just keep aggregates in the body.
> FLD is more general so I need to think about it.
>   
> > >> And if PRD doesn't support group by (I don't know of any PR engines
> that 
> > >> do), we can simulate using
> > >>
> > >> Deptno(?deptno) :- Emp(?empId ?deptno ?sal)
> > >> AvgDeptSal(?deptno ?avgSal) :- And( Deptno(?deptno) ?avgSal =
> avg(?sal | 
> > >> Emp(?empId ?deptno ?sal)))
> > >>     
> > >
> > > Something like that. But you do not need to simulate anything. You
> just do not
> > > include the groupby variables in PRD. The syntax that I proposed is
> for FLD and
> > > the dialects that will extend BLD in the future. This is not even in
> BLD (or
> > > core).
> >
> > I meant to say that I like the notion of groupby and I regret that my
> PR 
> > engine doesn't support it directly because it is used quite often (via
> 
> > the "simulation").
> 
> 
> I see. This is not an issue for RIF then. Why don't you propose that IBM
> adds this to its production rule engine? (Which soon might be JRules :-)
> 
> 
> michael
> 
> 
>
Received on Tuesday, 14 October 2008 18:28:54 UTC