- From: Daniel Garijo <dgarijo@delicias.dia.fi.upm.es>
- Date: Fri, 10 Aug 2012 20:58:50 +0200
- To: Curt Tilmes <Curt.Tilmes@nasa.gov>
- Cc: public-prov-wg@w3.org
- Message-ID: <CAExK0Dde82b9r1P1gs-LMyxuD4zgGF=i6bK9Err1JBcQnHu+rw@mail.gmail.com>
It looks good to me, +1

Daniel

2012/8/10 Curt Tilmes <Curt.Tilmes@nasa.gov>

> I support the proposed design.
>
> Curt
>
> On 8/10/12 9:14 AM, Luc Moreau wrote:
>
>> Jun, Stephen, Daniel, Curt, Sam, all,
>>
>> Given you voted against keeping the original design, or were supportive
>> of another design, can you please confirm that, like Simon, you are fine
>> with the proposed design.
>>
>> Thanks
>> Luc
>>
>> On 10/08/12 13:48, Miles, Simon wrote:
>>
>>> Sure, fine with me. I just wanted to make sure I could explain the
>>> meaning behind the proposals, as the constraints themselves are just
>>> syntactic rules.
>>>
>>> Thanks,
>>> Simon
>>>
>>> Dr Simon Miles
>>> Senior Lecturer, Department of Informatics
>>> Kings College London, WC2R 2LS, UK
>>> +44 (0)20 7848 1166
>>>
>>> Evolutionary Testing of Autonomous Software Agents:
>>> http://eprints.dcs.kcl.ac.uk/1370/
>>> ________________________________________
>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>> Sent: 10 August 2012 12:43
>>> To: Miles, Simon; Luc Moreau
>>> Cc: public-prov-wg@w3.org
>>> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique generation
>>> events and activities [prov-dm-constraints]
>>>
>>> On Aug 10, 2012, at 11:45 AM, Luc Moreau wrote:
>>>
>>>> Hi Simon,
>>>>
>>>> On 10/08/12 11:18, Miles, Simon wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Sounds like a good solution to me, thanks.
>>>>
>>>> Great.
>>>>
>>>>> I interpret this as: a generation event identifier is unique to an
>>>>> event-activity pair, but an entity can have multiple generation event
>>>>> identifiers as long as these only refer to different descriptions of the
>>>>> same event (same instant at which the same entity came into being).
>>>>
>>>> Your interpretation is slightly different from the technical definition.
>>>> Identifiers identify events and not descriptions.
>>>> So, an entity can have multiple generation events.
>>>> We require them to occur at the same instant.
>>>>
>>>> As a reminder, nowhere in prov-dm do we define a notion of event.
>>>> Events are only defined in prov-constraints.
>>>
>>> To refine this, I'd slightly revise Simon's restatement as follows.
>>>
>>> "... a generation event identifier is unique to an event-activity pair,
>>> but an entity can have multiple generation event identifiers as long as
>>> these only refer to **simultaneous generation events** (all taking place at
>>> the instant at which the entity came into being)."
>>>
>>> Simon, if that is also OK with you I will make the change and close.
>>> Perhaps we should add a remark to this effect somewhere too, to ensure
>>> that the intention is clear (replacing any text that suggests that the
>>> generation event is unique to the entity).
>>>
>>> --James
>>>
>>>> Luc
>>>>
>>>>> Thanks,
>>>>> Simon
>>>>>
>>>>> ________________________________________
>>>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>>>> Sent: 10 August 2012 10:25
>>>>> To: Miles, Simon
>>>>> Cc: Provenance Working Group
>>>>> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique generation
>>>>> events and activities [prov-dm-constraints]
>>>>>
>>>>> Hi again,
>>>>>
>>>>> Luc, Tom and I had a brief phone call this morning to discuss this and
>>>>> other issues before Luc goes on vacation.
>>>>>
>>>>> We came up with the following suggestion to address this issue:
>>>>>
>>>>> 0. Keep the key constraint on generation as-is.
>>>>>
>>>>> 1. Weaken generation-uniqueness as follows:
>>>>>
>>>>> IF wasGeneratedBy(gen1; e,a,_t1,_attrs1) and
>>>>> wasGeneratedBy(gen2; e,a,_t2,_attrs2),
>>>>> THEN gen1 = gen2.
>>>>>
>>>>> saying that each pair of entity and activity has at most one
>>>>> generation event.
>>>>>
>>>>> 2. Add constraint generation-generation-ordering:
>>>>>
>>>>> IF wasGeneratedBy(gen1;e,a1,t1,attrs1) and
>>>>> wasGeneratedBy(gen2;e,a2,t2,attrs2) THEN gen1 precedes gen2
>>>>>
>>>>> Then it would be possible to write
>>>>>
>>>>> wasGeneratedBy(ex:chart1, ex:compile, 2012-03-02T10:30:00)
>>>>> wasGeneratedBy(ex:chart1, ex:illustrate, 2012-03-02T10:30:00)
>>>>>
>>>>> This would expand to
>>>>>
>>>>> wasGeneratedBy(gen1; ex:chart1, ex:compile, 2012-03-02T10:30:00,[])
>>>>> wasGeneratedBy(gen2; ex:chart1, ex:illustrate, 2012-03-02T10:30:00,[])
>>>>>
>>>>> With the weaker form of generation-uniqueness, the two events are not
>>>>> required to be equal / mergeable. However, they are required (by
>>>>> generation-generation-ordering) to be simultaneous:
>>>>>
>>>>> gen1 precedes gen2 precedes gen1 (which is allowed since precedes is
>>>>> reflexive).
>>>>>
>>>>> If this is acceptable, we propose to resolve the issue by adopting the
>>>>> above changes. For symmetry, we will adopt similar changes to
>>>>> invalidation-uniqueness (allowing multiple activities to simultaneously
>>>>> invalidate) and add an invalidation-invalidation-ordering constraint.
>>>>>
>>>>> --James
>>>>>
>>>>> On Aug 9, 2012, at 10:20 PM, James Cheney wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> OK, the contributes attribute approach was a strawman in any case.
>>>>>>
>>>>>> In a sense, the obstacle here is the focus in prov-n on relations that
>>>>>> we think of as binary but in reality have an identity and additional
>>>>>> parameters/attributes. In PROV-O, we would not have this problem; we
>>>>>> could just link multiple activities to the event id.
>>>>>>
>>>>>> Your alternative essentially would (in database constraint terms)
>>>>>> weaken the key constraint to a functional dependency that says that
>>>>>> the event id determines only the time.
>>>>>> This is fine and would be straightforward, if we were working with
>>>>>> normal flat relations, BUT because of attribute lists (and the
>>>>>> alignment we have in mind with rdf) it is not quite so easy.
>>>>>>
>>>>>> So concretely suppose we say:
>>>>>>
>>>>>> wgb(id;e,a1,t,[k1=v1])
>>>>>> wgb(id;e,a2,t,[k2=v2])
>>>>>>
>>>>>> This is currently invalid. If we adopt your suggestion below, then
>>>>>> it would be valid. But it's not clear to me what its normal form
>>>>>> should be. Should the attributes of the first statement be merged
>>>>>> into those of the second and vice versa?
>>>>>>
>>>>>> In rdf terms, we have been mapping the attribute value pairs to
>>>>>> properties hanging off the id. So it seems to me that if we have
>>>>>> attributes hanging off the same id in different places, they should be
>>>>>> merged. In other words, it seems wrong to me to use the same id to
>>>>>> describe two interactions, one between e and a1 and one between e and
>>>>>> a2.
>>>>>>
>>>>>> So I think the right thing to do is somehow accommodate the fact that
>>>>>> a generation event could involve multiple equal participants. If we
>>>>>> had some lightweight way of collecting activities, so that we could in
>>>>>> effect write
>>>>>>
>>>>>> wgb(id; e,[a1,a2], t, attrs)
>>>>>>
>>>>>> would that work?
>>>>>>
>>>>>> --James
>>>>>>
>>>>>> On Thu, 9 Aug 2012 19:45:36 +0100
>>>>>> "Miles, Simon" <simon.miles@kcl.ac.uk> wrote:
>>>>>>
>>>>>>> Hello James,
>>>>>>>
>>>>>>> Agreed that simply removing the key constraint may allow too much. We
>>>>>>> want to keep the defining aspects of an event fixed across all
>>>>>>> descriptions to be valid. For an event in general, I think that just
>>>>>>> means time of occurrence, correct? If so, can't we express this as
>>>>>>> constraints, i.e. relation R with identifier i1 and time t1 cannot be
>>>>>>> merged with R with identifier i2 and time t2 if R describes an event
>>>>>>> (e.g.
>>>>>>> wasGeneratedBy) and i1=i2 but t1/=t2? For a generation event,
>>>>>>> perhaps it also means the entity?
>>>>>>>
>>>>>>> For your use cases, I agree there is probably little difference in
>>>>>>> practice. It's case 2 I was thinking about. I'm not sure if the
>>>>>>> textual definitions in the DM preclude case 1, but it's interesting,
>>>>>>> e.g. music is generated at the instant that each of the individual
>>>>>>> instruments in a band is being played.
>>>>>>>
>>>>>>> I'd find "primary" activities and "contributed" properties hard to
>>>>>>> explain and justify. I can't see why an activity at one level of
>>>>>>> abstraction should be any more primary than one at another. I'm
>>>>>>> unclear how to define contribution so that it works in both
>>>>>>> directions (sub- to super-activity and vice-versa). Also, shouldn't
>>>>>>> we be allowing for the merging of statements from multiple sources
>>>>>>> when this produces a valid instance? If so, then we should allow for
>>>>>>> two parties to declare a different generating event as primary, and
>>>>>>> I struggle to see why this means their statements should be
>>>>>>> unmergeable. In conclusion, I'm not yet convinced by the idea.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Simon
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>>>>>> Sent: 09 August 2012 18:22
>>>>>>> To: Miles, Simon
>>>>>>> Cc: Provenance Working Group
>>>>>>> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique generation
>>>>>>> events and activities [prov-dm-constraints]
>>>>>>>
>>>>>>> OK. The problem with removing the key constraint is that it takes
>>>>>>> away a lot more than we probably want, e.g.
>>>>>>> now we can say:
>>>>>>>
>>>>>>> wasGeneratedBy(evt; widget, worker1, Monday)
>>>>>>> wasGeneratedBy(evt; widget, worker2, Tuesday)
>>>>>>> wasGeneratedBy(evt; widget, factory, Friday)
>>>>>>>
>>>>>>> because (only) the key constraint says that all of the other fields
>>>>>>> have to match (except attributes, which can be merged).
>>>>>>>
>>>>>>> That seems strange to me - the whole point of event identifiers (I
>>>>>>> thought) is to identify the events. Most of what we have done
>>>>>>> assumes events that take place between exactly two things (or at most
>>>>>>> a small number), rather than arbitrarily many. So I would say that
>>>>>>> at least the times should match, otherwise the thing gets generated
>>>>>>> at two different times.
>>>>>>>
>>>>>>> It seems that there are two main use cases:
>>>>>>>
>>>>>>> 1. separate activities participating simultaneously in generating
>>>>>>> the same entity:
>>>>>>>
>>>>>>> wasGeneratedBy(evt1;widget,worker1,t1)
>>>>>>> wasGeneratedBy(evt2;widget,worker2,t1)
>>>>>>>
>>>>>>> 2. super- and sub-activities generating the same entity via events
>>>>>>> describing different abstraction levels:
>>>>>>>
>>>>>>> wasGeneratedBy(evt1;widget, factory,t1)
>>>>>>> wasGeneratedBy(evt2;widget, worker,t1)
>>>>>>> (some non-PROV statement that worker is part of factory)
>>>>>>>
>>>>>>> From the point of view of PROV, there is no real difference, since
>>>>>>> we don't have a way of saying an activity is a sub-activity of
>>>>>>> another... Does this sound right?
>>>>>>>
>>>>>>> As a strawman, why wouldn't it work to require a specific "primary"
>>>>>>> activity (which could be a new activity invented solely for this
>>>>>>> event), and have an attribute such as prov:contributedTo that
>>>>>>> names other activities that contributed to a generation event
>>>>>>> (perhaps indirectly, such as a super-activity)?
>>>>>>>
>>>>>>> Hence:
>>>>>>>
>>>>>>> wasGeneratedBy(evt1;e,workers12,t1,[prov:contributed = worker1,
>>>>>>> prov:contributed = worker2])
>>>>>>>
>>>>>>> wasGeneratedBy(evt1;e,worker,t1,[prov:contributed = factory])
>>>>>>>
>>>>>>> --James
>>>>>>>
>>>>>>> On Aug 9, 2012, at 5:53 PM, Miles, Simon wrote:
>>>>>>>
>>>>>>>> Hello James,
>>>>>>>>
>>>>>>>> I'm not clear what the invalidity point would actually look like or
>>>>>>>> entail, so would prefer to reserve comment.
>>>>>>>>
>>>>>>>> Yes, happy to provide suggestions, examples, arguments etc. if you
>>>>>>>> say what you need. I didn't have a particular solution in mind in
>>>>>>>> the issue raised below, but agree with your suggestion in the
>>>>>>>> telecon that it implies the removal of the key constraint on
>>>>>>>> wasGeneratedBy.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Simon
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>>>>>>> Sent: 09 August 2012 17:23
>>>>>>>> To: Provenance Working Group
>>>>>>>> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique
>>>>>>>> generation events and activities [prov-dm-constraints]
>>>>>>>>
>>>>>>>> The consensus was that this needs work, either by dropping some
>>>>>>>> inferences (provided we understand the implications) or finding a
>>>>>>>> way to accommodate multiple levels of abstraction.
>>>>>>>>
>>>>>>>> If we can find a way to allow the inference to be used to determine
>>>>>>>> *invalidity* if implementations agree with it, while not requiring
>>>>>>>> everyone use it, will that be OK?
>>>>>>>>
>>>>>>>> I will be pestering Simon, Daniel and Stian to offer suggestions
>>>>>>>> and/or examples.
>>>>>>>>
>>>>>>>> --James
>>>>>>>>
>>>>>>>> On Aug 9, 2012, at 3:35 PM, Provenance Working Group Issue Tracker
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> PROV-ISSUE-473 (generating-activity): Unique generation events and
>>>>>>>>> activities [prov-dm-constraints]
>>>>>>>>>
>>>>>>>>> http://www.w3.org/2011/prov/track/issues/473
>>>>>>>>>
>>>>>>>>> Raised by: Simon Miles
>>>>>>>>> On product: prov-dm-constraints
>>>>>>>>>
>>>>>>>>> As requested, I'm submitting an issue where I feel a
>>>>>>>>> PROV-Constraints review comment of mine is not completely answered.
>>>>>>>>>
>>>>>>>>> My original comment:
>>>>>>>>>
>>>>>>>>>> Unique generations
>>>>>>>>>> -----------
>>>>>>>>>> C. Immediately following Inference 12, the text says "the entity
>>>>>>>>>> denoted by e2 is generated by at most one activity (see Constraint
>>>>>>>>>> 27)". The Remark below repeats this, "at most one activity could
>>>>>>>>>> generate the entity e2."
>>>>>>>>>>
>>>>>>>>>> This seems wrong. Constraint 27 says that e2 is generated by only
>>>>>>>>>> one generation event, not by only one activity. The distinction
>>>>>>>>>> between these is important. In the primer's example, there is an
>>>>>>>>>> activity ex:compile which is decomposed into steps ex:compose and
>>>>>>>>>> ex:illustrate. While there is only one (implicit) generation
>>>>>>>>>> event for entity ex:chart1, both ex:compile and ex:illustrate can
>>>>>>>>>> be asserted to have generated the entity.
>>>>>>>>>
>>>>>>>>> Response from editors:
>>>>>>>>>
>>>>>>>>>> Constraint 27 indeed says that there is a single generation event
>>>>>>>>>> and constraint 26 says that the id is a key for a wasGeneratedBy
>>>>>>>>>> which implies that there is a single activity.
>>>>>>>>>>
>>>>>>>>>> In the primer, you assert:
>>>>>>>>>> wasGeneratedBy(ex:chart1, ex:compile, 2012-03-02T10:30:00)
>>>>>>>>>> wasGeneratedBy(ex:chart1, ex:illustrate, 2012-03-02T10:30:00)
>>>>>>>>>>
>>>>>>>>>> This is invalid.
>>>>>>>>>>
>>>>>>>>>> One way to address this is to maintain two levels of abstraction
>>>>>>>>>> for both activities and entities.
>>>>>>>>>>
>>>>>>>>>> wasGeneratedBy(ex:chart1_abstract, ex:illustrate,
>>>>>>>>>> 2012-03-02T10:30:00)
>>>>>>>>>> specializationOf(ex:chart1,ex:chart1_abstract) // or similar.
>>>>>>>>>
>>>>>>>>> This response explains why the current constraints do not allow
>>>>>>>>> what I described, but not why they are meaningful. The questions
>>>>>>>>> below hopefully articulate my concerns.
>>>>>>>>>
>>>>>>>>> 1. The response suggests that the invalidity of the primer example
>>>>>>>>> is due to it describing multiple levels of abstraction for a
>>>>>>>>> single entity. Why should this be invalid? Why has validity got
>>>>>>>>> anything to do with levels of abstraction? As far as I can see,
>>>>>>>>> this is not stated or explained in PROV-Constraints.
>>>>>>>>>
>>>>>>>>> 2. As ex:chart1_abstract and ex:chart1 are exactly the same entity
>>>>>>>>> with exactly the same attributes and generated at the same
>>>>>>>>> instant, then why would we want statements implying one was more
>>>>>>>>> abstract than the other? Isn't this at least misleading?
>>>>>>>>>
>>>>>>>>> I also have one related follow-on question:
>>>>>>>>>
>>>>>>>>> 3. Even if we do use the specialization approach to get around the
>>>>>>>>> constraints as suggested, there can only be one entity per
>>>>>>>>> generation event. If something is described at multiple levels of
>>>>>>>>> abstraction, then does that necessitate a unique generation event
>>>>>>>>> for each level (each entity)? If so (as appears), why? When I
>>>>>>>>> create the first version of a document, in the same instant I
>>>>>>>>> create both "doc" and "docV1". How do I describe that the event
>>>>>>>>> creating one is the "same" event that created the other?
>>>>>>>>> It is surely the "same" event in some strong, objective sense,
>>>>>>>>> even if we prefer to describe it using a different identifier for
>>>>>>>>> each entity.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Simon
>>>>>>>>
>>>>>>>> --
>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>> Scotland, with registration number SC005336.
>>>>
>>>> --
>>>> Professor Luc Moreau
>>>> Electronics and Computer Science   tel: +44 23 8059 4487
>>>> University of Southampton          fax: +44 23 8059 2865
>>>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>>>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>
> --
> Curt Tilmes, Ph.D.
> U.S. Global Change Research Program
> 1717 Pennsylvania Avenue NW, Suite 250
> Washington, D.C. 20006, USA
>
> +1 202-419-3479 (office)
> +1 443-987-6228 (cell)
> globalchange.gov
Received on Friday, 10 August 2012 18:59:20 UTC
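
Editorial sketch (not part of the original message): the design agreed in this thread can be illustrated with a small checker. This is only a rough approximation under stated assumptions. The `Generation` class and `check_generations` function are hypothetical names invented for illustration; identifiers are treated as fixed constants, so a required equality such as gen1 = gen2 is reported as a violation rather than resolved by merging; and "gen1 precedes gen2 precedes gen1" is approximated by comparing timestamps for equality, although the thread defines ordering over events and not over timestamps.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Generation:
    """One expanded wasGeneratedBy(gen_id; entity, activity, time) statement."""
    gen_id: str
    entity: str
    activity: str
    time: str  # assumption: timestamp kept as an opaque string


def check_generations(gens):
    """Return violation messages; an empty list means none were found."""
    problems = []
    for g1, g2 in combinations(gens, 2):
        # Point 0, key constraint: one identifier, one generation statement.
        if g1.gen_id == g2.gen_id and g1 != g2:
            problems.append(f"key: {g1.gen_id} names two different generations")
        if g1.entity != g2.entity:
            continue
        # Point 1, weakened generation-uniqueness: the same entity and
        # activity must share a single generation event identifier.
        if g1.activity == g2.activity and g1.gen_id != g2.gen_id:
            problems.append(
                f"uniqueness: {g1.gen_id} and {g2.gen_id} both describe "
                f"{g1.entity} generated by {g1.activity}"
            )
        # Point 2, generation-generation-ordering applied in both
        # directions: generations of one entity must be simultaneous.
        if g1.time != g2.time:
            problems.append(
                f"ordering: {g1.gen_id} and {g2.gen_id} are not simultaneous"
            )
    return problems


if __name__ == "__main__":
    chart = [
        Generation("gen1", "ex:chart1", "ex:compile", "2012-03-02T10:30:00"),
        Generation("gen2", "ex:chart1", "ex:illustrate", "2012-03-02T10:30:00"),
    ]
    print(check_generations(chart))
```

Under these assumptions the primer example (ex:compile and ex:illustrate generating ex:chart1 at the same instant) passes, while the widget example that reuses one identifier across different activities and days does not.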