Re: [PRD/FLD] aggregates

On Mon, 13 Oct 2008 12:03:34 -0700
Gary Hallmark <gary.hallmark@oracle.com> wrote:

> Michael Kifer wrote:
> > On Sun, 12 Oct 2008 17:25:50 -0700
> > Gary Hallmark <gary.hallmark@oracle.com> wrote:
> >
> >   
> >> Hi Michael,
> >>
> >> Thanks for getting the ball rolling.  I think this should work for PRD, 
> >> but let me check that I understand your proposal.  An example rule for 
> >> computing the average salary of employees grouped by department would be 
> >> something like:
> >>
> >> Forall ?deptno ?sal ?empId (
> >>   AvgDeptSal(?deptno avg(?sal [ ?deptno ] | Emp(?empId ?deptno ?sal)))
> >> )
> >>     
> >
> > I am not sure how aggregates are supposed to be used in PRD, but:
> >
> >   - in a logical rule-based language they would be in the body of a rule.
> >   
> ditto for PRD.  Aggregation is always in the condition/body/premise.  
> BTW, what do you propose for the model theory? 

Good.

For the model theory, it can't be exactly the same for PRD and FLD because in
PRD the aggregates are evaluated on a partial model. If there is recursion
though aggregation then there would be a significant difference. But it is
possible, I think, to make the definitions look similar.

I need to think about some of the details, but the main issue is that
aggregates require a bag semantics, while everything else is defined using
sets. It is not exactly obvious how to define bag-comprehension (as opposed to
the usual set-comprehension). If we want to define the bag of all X s.t.
{X| phi(X)} then it is known how to do this if phi is an existentially
quantified conjunction/disjunction. But it is less clear in general what to do
if phi has foralls. (This is a problem for FLD, not PRD, as I understand.)
So, some restrictions on phi will probably be required.

> I think we want the same model theory for PRD and FLD conditions (but 
> not rulesets -- PRD has no model theory for rulesets) where they agree 
> on syntax.  I suspect this should work fine for aggregation and naf.
> >   - the comprehension variable is not quantified
> >     (the aggregate works as a kind of quantifier for it)
>
> that sounds right for PRD as well.
> > So, in such a language I would write something like
> >
> > Forall ?depno ?Avgsal (
> >   Query(?depno ?Avgsal) :-
> > 	?Avgsal = avg(?sal [?deptno] | Exists ?empId (Emp(?empId ?deptno ?sal)))
> > )
> >
> > I suppose this can also be written as a fact like yours:
> >
> > Forall ?deptno (
> >    AvgDeptSal(?deptno avg(?sal [?deptno] | Exists ?empId Emp(?empId ?deptno ?sal)))
> > )
> >
> > but I haven't thought about it.
> >   
> I first wrote it like yours, then I saw it seemed I could "substitute" 
> and eliminate the ?Avgsal, so I just wanted to check if that was legal. 
> (It would probably be a bit easier for a PRD translator if it was not 
> legal, but its not a big deal either way)

I think it might be easier for PRD to just keep aggregates in the body.
FLD is more general so I need to think about it.
  
> >> And if PRD doesn't support group by (I don't know of any PR engines that 
> >> do), we can simulate using
> >>
> >> Deptno(?deptno) :- Emp(?empId ?deptno ?sal)
> >> AvgDeptSal(?deptno ?avgSal) :- And( Deptno(?deptno) ?avgSal = avg(?sal | 
> >> Emp(?empId ?deptno ?sal)))
> >>     
> >
> > Something like that. But you do not need to simulate anything. You just do not
> > include the groupby variables in PRD. The syntax that I proposed is for FLD and
> > the dialects that will extend BLD in the future. This is not even in BLD (or
> > core).
>
> I meant to say that I like the notion of groupby and I regret that my PR 
> engine doesn't support it directly because it is used quite often (via 
> the "simulation").


I see. This is not an issue for RIF then. Why don't you propose that IBM
adds this to its production rule engine? (Which soon might be JRules :-)


michael

Received on Monday, 13 October 2008 21:42:28 UTC