RE: [PRD/FLD] aggregates from Changhai Ke on 2008-10-14 (public-rif-wg@w3.org from October 2008)

From: Changhai Ke <cke@ilog.fr>
Date: Tue, 14 Oct 2008 11:40:56 +0200
To: <kifer@cs.sunysb.edu>, "Gary Hallmark" <gary.hallmark@oracle.com>
Cc: <public-rif-wg@w3.org>
Message-ID: <3E5E1A634BBD5C4A94C4D4A6DE0852E701A5D762@parmbx02.ilog.biz>
Hello,

I don't know the exact context in which the discussion about aggregates
is taking place. It seems to me that it is too early to introduce
aggregates into PRD: we still have simpler things to define before we add
them. PRD should add aggregates in the future, though, so they should be
reusable. What do you think?

In general, "groupBy" is an almost mandatory feature for aggregates. It
is a must-have for Event Stream Processing (ESP, in the CEP domain). It
might also be interesting to add constructs like "top N" (order the
collection by a criterion and take the first N elements).
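
To make the "top N" idea concrete, here is a minimal Python sketch. It is
only an illustration: the helper names (top_n, top_n_per_group) and the
Emp tuples are assumptions made up for this example and are not taken
from any RIF draft or rule engine.

```python
import heapq
from collections import defaultdict

def top_n(rows, n, key):
    """Order rows by `key` (descending) and keep the first n."""
    return heapq.nlargest(n, rows, key=key)

def top_n_per_group(rows, n, group_key, order_key):
    """Apply top-N separately inside each groupBy partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row)].append(row)
    return {g: heapq.nlargest(n, members, key=order_key)
            for g, members in groups.items()}

# Hypothetical Emp(empId, deptno, sal) facts.
emps = [("e1", 10, 100.0), ("e2", 10, 300.0),
        ("e3", 20, 50.0), ("e4", 10, 200.0)]

top2 = top_n(emps, 2, key=lambda r: r[2])
top1_by_dept = top_n_per_group(emps, 1,
                               group_key=lambda r: r[1],
                               order_key=lambda r: r[2])
```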

What do you think about using monoids
(http://en.wikipedia.org/wiki/Monoid) to support the computation of
aggregates?
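
As a rough sketch of what a monoid-based treatment could look like (an
illustration under my own assumptions, not a proposal for RIF syntax): an
aggregate is described by an identity element and an associative combine
operation, plus a "lift" applied to each input and a "finish" applied to
the result. The names Monoid, aggregate and group_by are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Monoid:
    identity: Any
    combine: Callable[[Any, Any], Any]            # must be associative
    lift: Callable[[Any], Any] = lambda x: x      # input value -> monoid element
    finish: Callable[[Any], Any] = lambda a: a    # monoid element -> result

SUM = Monoid(0, lambda a, b: a + b)
COUNT = Monoid(0, lambda a, b: a + b, lift=lambda _: 1)
# avg is not a monoid on numbers, but (sum, count) pairs form one.
AVG = Monoid((0, 0),
             lambda a, b: (a[0] + b[0], a[1] + b[1]),
             lift=lambda x: (x, 1),
             finish=lambda a: a[0] / a[1] if a[1] else None)

def aggregate(m, values):
    """Fold a collection of values with the monoid m."""
    acc = m.identity
    for v in values:
        acc = m.combine(acc, m.lift(v))
    return m.finish(acc)

def group_by(m, rows, key, value):
    """Keep one accumulator per group; each starts at the identity."""
    accs = {}
    for row in rows:
        k = key(row)
        accs[k] = m.combine(accs.get(k, m.identity), m.lift(value(row)))
    return {k: m.finish(a) for k, a in accs.items()}

# Hypothetical Emp(empId, deptno, sal) facts.
emps = [("e1", 10, 100.0), ("e2", 10, 300.0), ("e3", 20, 50.0)]
avg_by_dept = group_by(AVG, emps, key=lambda r: r[1], value=lambda r: r[2])
```

Because combine is associative, partial accumulators can be merged in any
grouping, which is what makes this style attractive for incremental or
streaming evaluation.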

Changhai

-----Original Message-----
From: public-rif-wg-request@w3.org [mailto:public-rif-wg-request@w3.org]
On Behalf Of Michael Kifer
Sent: lundi 13 octobre 2008 23:34
To: Gary Hallmark
Cc: public-rif-wg@w3.org
Subject: Re: [PRD/FLD] aggregates




On Mon, 13 Oct 2008 12:03:34 -0700
Gary Hallmark <gary.hallmark@oracle.com> wrote:

> Michael Kifer wrote:
> > On Sun, 12 Oct 2008 17:25:50 -0700
> > Gary Hallmark <gary.hallmark@oracle.com> wrote:
> >
> >   
> >> Hi Michael,
> >>
> >> Thanks for getting the ball rolling.  I think this should work for PRD,
> >> but let me check that I understand your proposal.  An example rule for
> >> computing the average salary of employees grouped by department would be
> >> something like:
> >>
> >> Forall ?deptno ?sal ?empId (
> >>   AvgDeptSal(?deptno avg(?sal [ ?deptno ] | Emp(?empId ?deptno ?sal)))
> >> )
> >>     
> >
> > I am not sure how aggregates are supposed to be used in PRD, but:
> >
> >   - in a logical rule-based language they would be in the body of a rule.
> >   
> ditto for PRD.  Aggregation is always in the condition/body/premise.  
> BTW, what do you propose for the model theory? 

Good.

For the model theory, it can't be exactly the same for PRD and FLD because
in PRD the aggregates are evaluated on a partial model. If there is
recursion through aggregation then there would be a significant
difference. But it is possible, I think, to make the definitions look
similar.

I need to think about some of the details, but the main issue is that
aggregates require a bag semantics, while everything else is defined using
sets. It is not exactly obvious how to define bag-comprehension (as opposed
to the usual set-comprehension). If we want to define the bag of all X s.t.
{X| phi(X)} then it is known how to do this if phi is an existentially
quantified conjunction/disjunction. But it is less clear in general what to
do if phi has foralls. (This is a problem for FLD, not PRD, as I
understand.) So, some restrictions on phi will probably be required.
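
The set-versus-bag distinction can be seen in a small Python sketch. This
only illustrates the easy case where phi is existentially quantified
(each witness for ?empId contributes one occurrence to the bag); it says
nothing about the harder case with foralls. The data and helper names are
assumptions made up for the example.

```python
from collections import Counter

# Hypothetical Emp(empId, deptno, sal) facts; e1 and e2 earn the same salary.
emps = {("e1", 10, 100.0), ("e2", 10, 100.0), ("e3", 10, 400.0)}

# Set comprehension {sal | Exists empId Emp(empId, 10, sal)}:
# the two equal salaries collapse into a single element.
sal_set = {sal for (_emp_id, dept, sal) in emps if dept == 10}

# Bag comprehension: one occurrence of sal per witness empId.
sal_bag = Counter(sal for (_emp_id, dept, sal) in emps if dept == 10)

def avg_set(values):
    return sum(values) / len(values)

def avg_bag(bag):
    total = sum(value * count for value, count in bag.items())
    return total / sum(bag.values())

set_avg = avg_set(sal_set)   # averages over the distinct salaries {100, 400}
bag_avg = avg_bag(sal_bag)   # averages over the occurrences 100, 100, 400
```

The two averages differ (250.0 under set semantics, 200.0 under bag
semantics), and only the bag result matches the intended average salary.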

> I think we want the same model theory for PRD and FLD conditions (but
> not rulesets -- PRD has no model theory for rulesets) where they agree
> on syntax.  I suspect this should work fine for aggregation and naf.
> >   - the comprehension variable is not quantified
> >     (the aggregate works as a kind of quantifier for it)
>
> that sounds right for PRD as well.
> > So, in such a language I would write something like
> >
> > Forall ?deptno ?Avgsal (
> >   Query(?deptno ?Avgsal) :-
> >     ?Avgsal = avg(?sal [?deptno] | Exists ?empId (Emp(?empId ?deptno ?sal)))
> > )
> >
> > I suppose this can also be written as a fact like yours:
> >
> > Forall ?deptno (
> >    AvgDeptSal(?deptno avg(?sal [?deptno] | Exists ?empId Emp(?empId ?deptno ?sal)))
> > )
> >
> > but I haven't thought about it.
> >   
> I first wrote it like yours, then I saw it seemed I could "substitute"
> and eliminate the ?Avgsal, so I just wanted to check if that was legal.
> (It would probably be a bit easier for a PRD translator if it was not
> legal, but it's not a big deal either way)

I think it might be easier for PRD to just keep aggregates in the body.
FLD is more general so I need to think about it.
  
> >> And if PRD doesn't support group by (I don't know of any PR engines that
> >> do), we can simulate using
> >>
> >> Deptno(?deptno) :- Emp(?empId ?deptno ?sal)
> >> AvgDeptSal(?deptno ?avgSal) :- And( Deptno(?deptno) ?avgSal = avg(?sal |
> >> Emp(?empId ?deptno ?sal)))
> >>     
> >
> > Something like that. But you do not need to simulate anything. You just do not
> > include the groupby variables in PRD. The syntax that I proposed is for FLD and
> > the dialects that will extend BLD in the future. This is not even in BLD (or
> > core).
>
> I meant to say that I like the notion of groupby and I regret that my PR
> engine doesn't support it directly because it is used quite often (via
> the "simulation").
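
For readers following the thread, the two-rule "simulation" quoted above
can be mirrored in a few lines of Python. This is only a sketch of the
idea (first derive the distinct department numbers, then aggregate with
?deptno already bound); the data and names are assumptions for
illustration and do not describe how any production rule engine works.

```python
# Hypothetical Emp(empId, deptno, sal) facts.
emps = [("e1", 10, 100.0), ("e2", 10, 300.0), ("e3", 20, 50.0)]

# Rule 1: Deptno(?deptno) :- Emp(?empId ?deptno ?sal)
deptnos = {deptno for (_emp_id, deptno, _sal) in emps}

def avg(values):
    values = list(values)
    return sum(values) / len(values)

# Rule 2: with ?deptno bound by Deptno(?deptno), the aggregate needs no
# groupby variables: ?avgSal = avg(?sal | Emp(?empId ?deptno ?sal))
avg_dept_sal = {
    d: avg(sal for (_emp_id, dept, sal) in emps if dept == d)
    for d in deptnos
}
```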


I see. This is not an issue for RIF then. Why don't you propose that IBM
adds this to its production rule engine? (Which soon might be JRules :-)


michael
Received on Tuesday, 14 October 2008 09:42:15 UTC