- From: Changhai Ke <cke@ilog.fr>
- Date: Tue, 14 Oct 2008 11:40:56 +0200
- To: <kifer@cs.sunysb.edu>, "Gary Hallmark" <gary.hallmark@oracle.com>
- Cc: <public-rif-wg@w3.org>
Hello,

I don't know the exact context of the discussion about aggregates.
It seems to me that it's too early to introduce aggregates to PRD; we still have simple things to define before we add aggregates.
In the future, PRD should also add the aggregates, so they should be reusable. What do you think?

In general, "groupBy" is an almost required feature for aggregates. It is a must-have for Event Stream Processing (ESP, a domain of CEP). Maybe it's also interesting to add constructs like "top N" (order the collection by a criterion and take the first N elements).

What do you think about using monoids (http://en.wikipedia.org/wiki/Monoid) to support the calculus of aggregates?

Changhai

-----Original Message-----
From: public-rif-wg-request@w3.org [mailto:public-rif-wg-request@w3.org] On Behalf Of Michael Kifer
Sent: Monday, 13 October 2008 23:34
To: Gary Hallmark
Cc: public-rif-wg@w3.org
Subject: Re: [PRD/FLD] aggregates

On Mon, 13 Oct 2008 12:03:34 -0700
Gary Hallmark <gary.hallmark@oracle.com> wrote:

> Michael Kifer wrote:
> > On Sun, 12 Oct 2008 17:25:50 -0700
> > Gary Hallmark <gary.hallmark@oracle.com> wrote:
> >
> >> Hi Michael,
> >>
> >> Thanks for getting the ball rolling. I think this should work for PRD,
> >> but let me check that I understand your proposal. An example rule for
> >> computing the average salary of employees grouped by department would be
> >> something like:
> >>
> >> Forall ?deptno ?sal ?empId (
> >>     AvgDeptSal(?deptno avg(?sal [ ?deptno ] | Emp(?empId ?deptno ?sal)))
> >> )
> >>
> >
> > I am not sure how aggregates are supposed to be used in PRD, but:
> >
> > - in a logical rule-based language they would be in the body of a rule.
> >
> ditto for PRD. Aggregation is always in the condition/body/premise.
> BTW, what do you propose for the model theory?

Good.
For the model theory, it can't be exactly the same for PRD and FLD because in PRD the aggregates are evaluated on a partial model.
If there is recursion through aggregation, then there would be a significant difference.
But it is possible, I think, to make the definitions look similar.

I need to think about some of the details, but the main issue is that aggregates require a bag semantics, while everything else is defined using sets. It is not exactly obvious how to define bag-comprehension (as opposed to the usual set-comprehension).
If we want to define the bag of all X s.t. {X | phi(X)} then it is known how to do this if phi is an existentially quantified conjunction/disjunction. But it is less clear in general what to do if phi has foralls. (This is a problem for FLD, not PRD, as I understand.)
So, some restrictions on phi will probably be required.

> I think we want the same model theory for PRD and FLD conditions (but
> not rulesets -- PRD has no model theory for rulesets) where they agree
> on syntax. I suspect this should work fine for aggregation and naf.
> > - the comprehension variable is not quantified
> >   (the aggregate works as a kind of quantifier for it)
> >
> that sounds right for PRD as well.
> > So, in such a language I would write something like
> >
> > Forall ?deptno ?Avgsal (
> >     Query(?deptno ?Avgsal) :-
> >         ?Avgsal = avg(?sal [?deptno] | Exists ?empId (Emp(?empId ?deptno ?sal)))
> > )
> >
> > I suppose this can also be written as a fact like yours:
> >
> > Forall ?deptno (
> >     AvgDeptSal(?deptno avg(?sal [?deptno] | Exists ?empId Emp(?empId ?deptno ?sal)))
> > )
> >
> > but I haven't thought about it.
> >
> I first wrote it like yours, then I saw it seemed I could "substitute"
> and eliminate the ?Avgsal, so I just wanted to check if that was legal.
> (It would probably be a bit easier for a PRD translator if it was not
> legal, but it's not a big deal either way)

I think it might be easier for PRD to just keep aggregates in the body.
FLD is more general so I need to think about it.
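
[Editorial illustration, not part of the original message.] The bag-versus-set point above can be sketched in a few lines of Python. The sketch assumes one reading of bag-comprehension for an existentially quantified condition: the bag holds one occurrence of ?sal per satisfying binding of ?empId. The thread does not fix this definition, and the Emp facts and function names below are invented for the example.

```python
from collections import Counter

# Hypothetical Emp(empId, deptno, sal) facts, invented for illustration.
EMP = [
    (1, 10, 1000),
    (2, 10, 1000),  # same salary as employee 1
    (3, 10, 4000),
]


def salaries_as_set(facts, deptno):
    """Set-comprehension {sal | Exists empId Emp(empId, deptno, sal)}.

    Equal salaries collapse into a single element.
    """
    return {sal for (_emp_id, d, sal) in facts if d == deptno}


def salaries_as_bag(facts, deptno):
    """Bag-comprehension under the assumed reading:
    one occurrence of sal for each witness empId.
    """
    return Counter(sal for (_emp_id, d, sal) in facts if d == deptno)


def avg_of_bag(bag):
    """Average over a bag, weighting each value by its multiplicity."""
    total = sum(value * count for value, count in bag.items())
    size = sum(bag.values())
    return total / size


set_view = salaries_as_set(EMP, 10)
bag_view = salaries_as_bag(EMP, 10)

set_avg = sum(set_view) / len(set_view)  # averages the distinct salaries only
bag_avg = avg_of_bag(bag_view)           # averages one salary per employee
```

With these facts the two averages differ (2500.0 over the set, 2000.0 over the bag), which is why avg needs a bag rather than a set of salaries.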
> >> And if PRD doesn't support group by (I don't know of any PR engines that
> >> do), we can simulate using
> >>
> >> Deptno(?deptno) :- Emp(?empId ?deptno ?sal)
> >> AvgDeptSal(?deptno ?avgSal) :- And( Deptno(?deptno) ?avgSal = avg(?sal | Emp(?empId ?deptno ?sal)))
> >>
> >
> > Something like that. But you do not need to simulate anything. You just do not
> > include the groupby variables in PRD. The syntax that I proposed is for FLD and
> > the dialects that will extend BLD in the future. This is not even in BLD (or
> > core).
>
> I meant to say that I like the notion of groupby and I regret that my PR
> engine doesn't support it directly because it is used quite often (via
> the "simulation").

I see. This is not an issue for RIF then.
Why don't you propose that IBM adds this to its production rule engine?
(Which soon might be JRules :-)

michael
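
[Editorial illustration, not part of the original message.] The group-by "simulation" quoted above, together with the monoid and "top N" suggestions at the start of this message, might look roughly like the following Python sketch. Everything here is an assumption made for illustration: the Emp facts, the AvgState class, and the function names are hypothetical and do not come from RIF, PRD, or any rule engine.

```python
from dataclasses import dataclass
from functools import reduce
import heapq

# Hypothetical Emp(empId, deptno, sal) facts, invented for illustration.
EMP = [(1, 10, 1000), (2, 10, 3000), (3, 20, 5000)]


@dataclass(frozen=True)
class AvgState:
    """Monoid element for avg: a (total, count) pair.

    combine is associative and AvgState() is its identity, so partial
    results can be merged in any grouping.
    """
    total: float = 0.0
    count: int = 0

    def combine(self, other):
        return AvgState(self.total + other.total, self.count + other.count)

    def result(self):
        return self.total / self.count


IDENTITY = AvgState()


def avg_by_dept(facts):
    """Simulate group-by with two steps, mirroring the two quoted rules."""
    # Step 1 (the Deptno rule): collect the distinct department numbers.
    deptnos = {d for (_emp_id, d, _sal) in facts}
    # Step 2 (the AvgDeptSal rule): one ungrouped aggregate per department.
    averages = {}
    for deptno in deptnos:
        states = (AvgState(sal, 1) for (_e, d, sal) in facts if d == deptno)
        averages[deptno] = reduce(AvgState.combine, states, IDENTITY).result()
    return averages


def top_n(facts, n):
    """'Top N': order facts by salary (descending) and keep the first n."""
    return heapq.nlargest(n, facts, key=lambda fact: fact[2])
```

Carrying the (total, count) pair rather than the average itself is what makes the operation a monoid: averages cannot be merged directly, but the pairs can, which matters if an engine maintains aggregates incrementally.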
Received on Tuesday, 14 October 2008 09:42:15 UTC