Re: PROV-ISSUE-473 (generating-activity): Unique generation events and activities [prov-dm-constraints]

It looks good to me, +1
Daniel

2012/8/10 Curt Tilmes <Curt.Tilmes@nasa.gov>

> I support the proposed design.
>
> Curt
>
>
> On 8/10/12 9:14 AM, Luc Moreau wrote:
>
>>
>> Jun, Stephen, Daniel, Curt, Sam, all,
>>
>> Given you voted against keeping the original design, or were supportive
>> of another design,
>> can you please confirm that, like Simon, you are fine with the proposed
>> design.
>>
>> Thanks
>> Luc
>>
>>
>> On 10/08/12 13:48, Miles, Simon wrote:
>>
>>> Sure, fine with me. I just wanted to make sure I could explain the
>>> meaning behind the proposals, as the constraints themselves are just
>>> syntactic rules.
>>>
>>> Thanks,
>>> Simon
>>>
>>> Dr Simon Miles
>>> Senior Lecturer, Department of Informatics
>>> Kings College London, WC2R 2LS, UK
>>> +44 (0)20 7848 1166
>>>
>>> Evolutionary Testing of Autonomous Software Agents:
>>> http://eprints.dcs.kcl.ac.uk/1370/
>>> ________________________________________
>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>> Sent: 10 August 2012 12:43
>>> To: Miles, Simon; Luc Moreau
>>> Cc: public-prov-wg@w3.org
>>> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique generation
>>> events and activities [prov-dm-constraints]
>>>
>>> On Aug 10, 2012, at 11:45 AM, Luc Moreau wrote:
>>>
>>>> Hi Simon,
>>>>
>>>>
>>>> On 10/08/12 11:18, Miles, Simon wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Sounds like a good solution to me, thanks.
>>>>>
>>>> Great.
>>>>
>>>>> I interpret this as: a generation event identifier is unique to an
>>>>> event-activity pair, but an entity can have multiple generation event
>>>>> identifiers as long as these only refer to different descriptions of the
>>>>> same event (same instant at which the same entity came into being).
>>>>>
>>>> Your interpretation is slightly different from the technical definition.
>>>> Identifiers identify events and not descriptions.
>>>> So, an entity can have multiple generation events.
>>>> We require them to occur at the same instant.
>>>>
>>>> As a reminder, nowhere in prov-dm do we define a notion of event.
>>>> Events are only defined in prov-constraints.
>>>>
>>> To refine this, I'd slightly revise Simon's restatement as follows.
>>>
>>> "... a generation event identifier is unique to an event-activity pair,
>>> but an entity can have multiple generation event identifiers as long as
>>> these only refer to **simultaneous generation events** (all taking place at
>>> the instant at which the entity came into being)."
>>>
>>> Simon, if that is also OK with you, I will make the change and close.
>>>  Perhaps we should add a remark to this effect somewhere too, to ensure
>>> that the intention is clear (replacing any text that suggests that the
>>> generation event is unique to the entity.)
>>>
>>> --James
>>>
>>>> Luc
>>>>
>>>>
>>>>
>>>>> Thanks,
>>>>> Simon
>>>>>
>>>>> ________________________________________
>>>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>>>> Sent: 10 August 2012 10:25
>>>>> To: Miles, Simon
>>>>> Cc: Provenance Working Group
>>>>> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique generation
>>>>> events and activities [prov-dm-constraints]
>>>>>
>>>>> Hi again,
>>>>>
>>>>> Luc, Tom and I had a brief phone call this morning to discuss this and
>>>>> other issues before Luc goes on vacation.
>>>>>
>>>>> We came up with the following suggestion to address this issue:
>>>>>
>>>>> 0.  Keep the key constraint on generation as-is.
>>>>>
>>>>> 1.  Weaken generation-uniqueness as follows:
>>>>>
>>>>> IF wasGeneratedBy(gen1; e,a,_t1,_attrs1) and wasGeneratedBy(gen2;
>>>>> e,a,_t2,_attrs2),
>>>>> THEN gen1 = gen2.
>>>>>
>>>>> saying that each pair of entity and activity has at most one
>>>>> generation event.
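A minimal Python sketch of the weakened generation-uniqueness check, assuming each wasGeneratedBy statement is reduced to a hypothetical `Gen` tuple of concrete identifiers (attributes omitted, and no existential identifiers to merge). The function `uniqueness_violations` is illustrative only, not part of any PROV toolkit; it reports every entity-activity pair that carries more than one generation id.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Gen(NamedTuple):
    """Hypothetical flat encoding of wasGeneratedBy(id; entity, activity, time)."""
    id: str
    entity: str
    activity: str
    time: Optional[str] = None


def uniqueness_violations(stmts: List[Gen]) -> Dict[Tuple[str, str], List[str]]:
    """Weakened generation-uniqueness: at most one generation id per (entity, activity)."""
    ids = defaultdict(set)
    for s in stmts:
        ids[(s.entity, s.activity)].add(s.id)
    # Distinct concrete ids for the same pair cannot be equated, so each such pair violates it.
    return {pair: sorted(g) for pair, g in ids.items() if len(g) > 1}


chart = [
    Gen("gen1", "ex:chart1", "ex:compile", "2012-03-02T10:30:00"),
    Gen("gen2", "ex:chart1", "ex:illustrate", "2012-03-02T10:30:00"),
]
print(uniqueness_violations(chart))  # {} : different activities, so two ids are allowed
print(uniqueness_violations(chart + [Gen("gen3", "ex:chart1", "ex:compile")]))
```

The second call reports `("ex:chart1", "ex:compile")` with ids `gen1` and `gen3`, since the same entity-activity pair now has two generation events.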
>>>>>
>>>>> 2.  Add constraint generation-generation-ordering:
>>>>>
>>>>> IF wasGeneratedBy(gen1; e,a1,t1,attrs1) and
>>>>> wasGeneratedBy(gen2; e,a2,t2,attrs2) THEN gen1 precedes gen2.
>>>>>
>>>>> Then it would be possible to write
>>>>>
>>>>> wasGeneratedBy(ex:chart1, ex:compile,  2012-03-02T10:30:00)
>>>>> wasGeneratedBy(ex:chart1, ex:illustrate,  2012-03-02T10:30:00)
>>>>>
>>>>> This would expand to
>>>>>
>>>>> wasGeneratedBy(gen1; ex:chart1, ex:compile,  2012-03-02T10:30:00,[])
>>>>> wasGeneratedBy(gen2; ex:chart1, ex:illustrate,  2012-03-02T10:30:00,[])
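The expansion step can be sketched in Python, assuming short-form statements are (entity, activity, time) triples. PROV-Constraints introduces existential identifiers for an omitted id; as a simplification, the hypothetical `expand` helper below just mints fresh names gen1, gen2, ... and adds an empty attribute list.

```python
from itertools import count
from typing import List, Tuple


def expand(short: List[Tuple[str, str, str]], prefix: str = "gen"):
    """Turn each wasGeneratedBy(e, a, t) into wasGeneratedBy(genN; e, a, t, [])."""
    fresh = count(1)  # counter used to mint a new identifier per statement
    return [(f"{prefix}{next(fresh)}", e, a, t, []) for (e, a, t) in short]


for stmt in expand([
    ("ex:chart1", "ex:compile", "2012-03-02T10:30:00"),
    ("ex:chart1", "ex:illustrate", "2012-03-02T10:30:00"),
]):
    print(stmt)
```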
>>>>>
>>>>> With the weaker form of generation-uniqueness, the two events are not
>>>>> required to be equal / mergeable.  However, they are required (by
>>>>> generation-generation-ordering) to be simultaneous:
>>>>>
>>>>> gen1 precedes gen2 precedes gen1 (which is allowed since precedes is
>>>>> reflexive).
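A Python sketch of what generation-generation-ordering demands, under one stated assumption: `precedes` is read here as a non-strict `<=` over recorded timestamps, whereas PROV-Constraints treats precedence as an ordering on events rather than on time values. Because the rule fires for both orderings of any two generations of the same entity, the check passes only when their times coincide. The tuple layout and function names are hypothetical.

```python
from datetime import datetime
from itertools import permutations
from typing import List, Tuple

Gen = Tuple[str, str, str, str]  # (id, entity, activity, ISO time), hypothetical layout


def ordering_pairs(gens: List[Gen]) -> List[Tuple[str, str]]:
    """generation-generation-ordering: gen1 precedes gen2 for every ordered pair on one entity."""
    return [(g1[0], g2[0]) for g1, g2 in permutations(gens, 2) if g1[1] == g2[1]]


def consistent_with_times(gens: List[Gen]) -> bool:
    """Check every derived 'x precedes y' against the timestamps, reading precedes as <=."""
    time = {g[0]: datetime.fromisoformat(g[3]) for g in gens}
    return all(time[x] <= time[y] for x, y in ordering_pairs(gens))


same = [("gen1", "ex:chart1", "ex:compile", "2012-03-02T10:30:00"),
        ("gen2", "ex:chart1", "ex:illustrate", "2012-03-02T10:30:00")]
later = [same[0], ("gen2", "ex:chart1", "ex:illustrate", "2012-03-02T10:31:00")]
print(ordering_pairs(same))          # [('gen1', 'gen2'), ('gen2', 'gen1')]
print(consistent_with_times(same))   # True
print(consistent_with_times(later))  # False
```

With `later`, the derived pair (gen2, gen1) would need 10:31 to precede 10:30, so the two generations cannot both hold unless they are simultaneous.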
>>>>>
>>>>> If this is acceptable, we propose to resolve the issue by adopting the
>>>>> above changes.  For symmetry, we will adopt similar changes to
>>>>> invalidation-uniqueness (allowing multiple activities to simultaneously
>>>>> invalidate) and add an invalidation-invalidation-ordering
>>>>> constraint.
>>>>>
>>>>> --James
>>>>>
>>>>> On Aug 9, 2012, at 10:20 PM, James Cheney wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> OK, the contributes attribute approach was a strawman in any case.
>>>>>>
>>>>>> In a sense, the obstacle here is the focus in PROV-N on relations that
>>>>>> we think of as binary but that in reality have an identity and
>>>>>> additional parameters/attributes.  In PROV-O, we would not have this
>>>>>> problem: we could just link multiple activities to the event id.
>>>>>>
>>>>>> Your alternative essentially would (in database constraint terms)
>>>>>> weaken the key constraint to a functional dependency that says that
>>>>>> the event id determines only the time. This is fine and would be
>>>>>> straightforward if we were working with normal flat relations, BUT
>>>>>> because of attribute lists (and the alignment we have in mind with
>>>>>> RDF) it is not quite so easy.
>>>>>>
>>>>>> So concretely suppose we say:
>>>>>>
>>>>>> wgb(id;e,a1,t,[k1=v1])
>>>>>> wgb(id;e,a2,t,[k2=v2])
>>>>>>
>>>>>> This is currently invalid.  If we adopt your suggestion below, then it
>>>>>> would be valid.  But it's not clear to me what its normal form should
>>>>>> be.  Should the attributes of the first statement be merged into those
>>>>>> of the second and vice versa?
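To make the difference concrete, here is a Python sketch that checks the two statements above against the current key constraint and against the weaker functional dependency (id determines time). The `Wgb` record and both checker functions are hypothetical; attributes are carried along but ignored by both checks, since they would be merged rather than compared.

```python
from typing import Iterable, NamedTuple, Tuple


class Wgb(NamedTuple):
    """Hypothetical flat encoding of wgb(id; e, a, t, attrs)."""
    id: str
    entity: str
    activity: str
    time: str
    attrs: Tuple[Tuple[str, str], ...] = ()


def violates_key(stmts: Iterable[Wgb]) -> bool:
    """Key constraint: statements sharing an id must agree on entity, activity and time."""
    seen = {}
    for s in stmts:
        fields = (s.entity, s.activity, s.time)
        if seen.setdefault(s.id, fields) != fields:
            return True
    return False


def violates_time_fd(stmts: Iterable[Wgb]) -> bool:
    """Functional dependency id -> time: statements sharing an id need only agree on time."""
    seen = {}
    for s in stmts:
        if seen.setdefault(s.id, s.time) != s.time:
            return True
    return False


stmts = [Wgb("id", "e", "a1", "t", (("k1", "v1"),)),
         Wgb("id", "e", "a2", "t", (("k2", "v2"),))]
print(violates_key(stmts))      # True: a1 and a2 differ
print(violates_time_fd(stmts))  # False: the times agree
```

The functional dependency admits the pair, but it says nothing about how the two attribute lists should combine, which is exactly the open question.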
>>>>>>
>>>>>> In RDF terms, we have been mapping the attribute value pairs to
>>>>>> properties hanging off the id.  So it seems to me that if we have
>>>>>> attributes hanging off the same id in different places, they should be
>>>>>> merged.  In other words, it seems wrong to me to use the same id to
>>>>>> describe two interactions, one between e and a1 and one between e and
>>>>>> a2.
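The RDF-style reading, where attribute-value pairs become properties of the identifier, can be sketched in Python as a simple union per id. `merge_attributes` is a hypothetical helper; what it illustrates is that after merging, nothing records whether k1=v1 belonged to the interaction with a1 or with a2.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Attr = Tuple[str, str]


def merge_attributes(stmts: Iterable[Tuple[str, str, Set[Attr]]]) -> Dict[str, Set[Attr]]:
    """Union the attributes of all statements with the same id, as RDF properties on one subject."""
    merged: Dict[str, Set[Attr]] = defaultdict(set)
    for stmt_id, _activity, attrs in stmts:
        merged[stmt_id] |= attrs  # the activity is dropped: properties hang off the id alone
    return dict(merged)


print(merge_attributes([("id", "a1", {("k1", "v1")}),
                        ("id", "a2", {("k2", "v2")})]))
```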
>>>>>>
>>>>>> So I think the right thing to do is somehow accommodate the fact that
>>>>>> a generation event could involve multiple equal participants.  If we
>>>>>> had some lightweight way of collecting activities, so that we could in
>>>>>> effect write
>>>>>>
>>>>>> wgb(id; e,[a1,a2], t, attrs)
>>>>>>
>>>>>> would that work?
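One possible reading of wgb(id; e,[a1,a2], t, attrs), sketched in Python: the event record carries a set of participating activities, and two descriptions of the same event id merge only if they agree on entity and time. This is a hypothetical extension under discussion, not existing PROV-N syntax; `MultiGen` and `merge` are illustrative names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class MultiGen:
    """Hypothetical n-ary generation event: wgb(id; e, [a1, a2, ...], t, attrs)."""
    id: str
    entity: str
    activities: FrozenSet[str]
    time: str
    attrs: FrozenSet[Tuple[str, str]] = frozenset()


def merge(g1: MultiGen, g2: MultiGen) -> MultiGen:
    """Merge two descriptions of one event: id, entity and time must agree; the rest is unioned."""
    if (g1.id, g1.entity, g1.time) != (g2.id, g2.entity, g2.time):
        raise ValueError(f"cannot merge descriptions of {g1.id}: entity or time differ")
    return MultiGen(g1.id, g1.entity, g1.activities | g2.activities,
                    g1.time, g1.attrs | g2.attrs)


e1 = MultiGen("id", "e", frozenset({"a1"}), "t", frozenset({("k1", "v1")}))
e2 = MultiGen("id", "e", frozenset({"a2"}), "t", frozenset({("k2", "v2")}))
print(sorted(merge(e1, e2).activities))  # ['a1', 'a2']
```

Under this reading the attributes describe the whole event, so merging them is unproblematic; the cost is a relation that is no longer binary.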
>>>>>>
>>>>>>
>>>>>> --James
>>>>>>
>>>>>> On Thu, 9 Aug 2012 19:45:36 +0100
>>>>>> "Miles, Simon" <simon.miles@kcl.ac.uk> wrote:
>>>>>>
>>>>>>> Hello James,
>>>>>>>
>>>>>>> Agreed that simply removing the key constraint may allow too much. For
>>>>>>> a set of descriptions to be valid, we want the defining aspects of an
>>>>>>> event to be the same across all of them. For an event in general, I
>>>>>>> think that just means time of occurrence, correct? If so, can't we
>>>>>>> express this as
>>>>>>> constraints, i.e. relation R with identifier i1 and time t1 cannot be
>>>>>>> merged with R with identifier i2 and time t2 if R describes an event
>>>>>>> (e.g. wasGeneratedBy) and i1=i2 but t1/=t2? For a generation event,
>>>>>>> perhaps it also means the entity?
>>>>>>>
>>>>>>> For your use cases, I agree there is probably little difference in
>>>>>>> practice. It's case 2 I was thinking about. I'm not sure if the
>>>>>>> textual definitions in the DM preclude case 1, but it's interesting,
>>>>>>> e.g. music is generated at the instant that each of the individual
>>>>>>> instruments in a band is being played.
>>>>>>>
>>>>>>> I'd find "primary" activities and "contributed" properties hard to
>>>>>>> explain and justify. I can't see why an activity at one level of
>>>>>>> abstraction should be any more primary than one at another. I'm
>>>>>>> unclear how to define contribution so that it works in both
>>>>>>> directions (sub- to super-activity and vice-versa). Also, shouldn't
>>>>>>> we be allowing for the merging of statements from multiple sources
>>>>>>> when this produces a valid instance? If so, then we should allow for
>>>>>>> two parties to declare a different generating event as primary, and I
>>>>>>> struggle to see why this means their statements should be
>>>>>>> unmergeable. In conclusion, I'm not yet convinced by the idea.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Simon
>>>>>>>
>>>>>>> ________________________________________
>>>>>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>>>>>> Sent: 09 August 2012 18:22
>>>>>>> To: Miles, Simon
>>>>>>> Cc: Provenance Working Group
>>>>>>> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique generation
>>>>>>> events and activities [prov-dm-constraints]
>>>>>>>
>>>>>>> OK.  The problem with removing the key constraint is that it takes
>>>>>>> away a lot more than we probably want, e.g. now we can say:
>>>>>>>
>>>>>>> wasGeneratedBy(evt; widget, worker1, Monday)
>>>>>>> wasGeneratedBy(evt; widget, worker2, Tuesday)
>>>>>>> wasGeneratedBy(evt; widget, factory, Friday)
>>>>>>>
>>>>>>> because (only) the key constraint says that all of the other fields
>>>>>>> have to match (except attributes, which can be merged).
>>>>>>>
>>>>>>> That seems strange to me - the whole point of event identifiers (I
>>>>>>> thought) is to identify the events.  Most of what we have done
>>>>>>> assumes events that take place between exactly two things (or at most
>>>>>>> a small number), rather than arbitrarily many.  So I would say that
>>>>>>> at least the times should match, otherwise the thing gets generated
>>>>>>> at two different times.
>>>>>>>
>>>>>>> It seems that there are two main use cases:
>>>>>>>
>>>>>>> 1.  separate activities participating simultaneously in generating
>>>>>>> the same entity:
>>>>>>>
>>>>>>> wasGeneratedBy(evt1; widget, worker1, t1)
>>>>>>> wasGeneratedBy(evt2; widget, worker2, t1)
>>>>>>>
>>>>>>> 2.  super- and sub-activities generating the same entity via events
>>>>>>> describing different abstraction levels.
>>>>>>>
>>>>>>> wasGeneratedBy(evt1; widget, factory, t1)
>>>>>>> wasGeneratedBy(evt2; widget, worker, t1)
>>>>>>> (some non-PROV statement that worker is part of factory)
>>>>>>>
>>>>>>> From the point of view of PROV, there is no real difference, since
>>>>>>> we don't have a way of saying an activity is a sub-activity of
>>>>>>> another... Does this sound right?
>>>>>>>>
>>>>>>> As a strawman, why wouldn't it work to require a specific "primary"
>>>>>>> activity (which could be a new activity invented solely for this
>>>>>>> event), and have an attribute such as prov:contributed that
>>>>>>> names other activities that contributed to a generation event
>>>>>>> (perhaps indirectly, such as a super-activity)?
>>>>>>>
>>>>>>> Hence:
>>>>>>>
>>>>>>> wasGeneratedBy(evt1; e, workers12, t1, [prov:contributed = worker1,
>>>>>>> prov:contributed = worker2])
>>>>>>>
>>>>>>> wasGeneratedBy(evt1; e, worker, t1, [prov:contributed = factory])
>>>>>>>
>>>>>>> --James
>>>>>>>
>>>>>>> On Aug 9, 2012, at 5:53 PM, Miles, Simon wrote:
>>>>>>>
>>>>>>>> Hello James,
>>>>>>>>
>>>>>>>> I'm not clear what the invalidity point would actually look like or
>>>>>>>> entail, so would prefer to reserve comment.
>>>>>>>>
>>>>>>>> Yes, happy to provide suggestions, examples, arguments etc. if you
>>>>>>>> say what you need. I didn't have a particular solution in mind in
>>>>>>>> the issue raised below, but agree with your suggestion in the
>>>>>>>> telecon that it implies the removal of the key constraint on
>>>>>>>> wasGeneratedBy.
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>> Simon
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>>>>>>> Sent: 09 August 2012 17:23
>>>>>>>> To: Provenance Working Group
>>>>>>>> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique
>>>>>>>> generation events and activities [prov-dm-constraints]
>>>>>>>>
>>>>>>>> The consensus was that this needs work, either by dropping some
>>>>>>>> inferences (provided we understand the implications) or finding a
>>>>>>>> way to accommodate multiple levels of abstraction.
>>>>>>>>
>>>>>>>> If we can find a way to allow the inference to be used to determine
>>>>>>>> *invalidity* if implementations agree with it, while not requiring
>>>>>>>> everyone use it, will that be OK?
>>>>>>>>
>>>>>>>> I will be pestering Simon, Daniel and Stian to offer suggestions
>>>>>>>> and/or examples.
>>>>>>>>
>>>>>>>> --James
>>>>>>>>
>>>>>>>>
>>>>>>>> On Aug 9, 2012, at 3:35 PM, Provenance Working Group Issue Tracker
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> PROV-ISSUE-473 (generating-activity): Unique generation events and
>>>>>>>>> activities [prov-dm-constraints]
>>>>>>>>>
>>>>>>>>> http://www.w3.org/2011/prov/track/issues/473
>>>>>>>>>
>>>>>>>>> Raised by: Simon Miles
>>>>>>>>> On product: prov-dm-constraints
>>>>>>>>>
>>>>>>>>> As requested, I'm submitting an issue where I feel a
>>>>>>>>> PROV-Constraints review comment of mine is not completely answered.
>>>>>>>>>
>>>>>>>>> My original comment:
>>>>>>>>>
>>>>>>>>>> Unique generations
>>>>>>>>>> -----------
>>>>>>>>>> C. Immediately following Inference 12, the text says "the entity
>>>>>>>>>> denoted by e2 is generated by at most one activity (see Constraint
>>>>>>>>>> 27)". The Remark below repeats this, "at most one activity could
>>>>>>>>>> generate the entity e2."
>>>>>>>>>>
>>>>>>>>>> This seems wrong. Constraint 27 says that e2 is generated by only
>>>>>>>>>> one generation event, not by only one activity. The distinction
>>>>>>>>>> between these is important. In the primer's example, there is an
>>>>>>>>>> activity ex:compile which is decomposed into steps ex:compose and
>>>>>>>>>> ex:illustrate. While there is only one (implicit) generation
>>>>>>>>>> event for entity ex:chart1, both ex:compile and ex:illustrate can
>>>>>>>>>> be asserted to have generated the entity.
>>>>>>>>>>
>>>>>>>>> Response from editors:
>>>>>>>>>
>>>>>>>>>> Constraint 27 indeed says that there is a single generation event
>>>>>>>>>> and constraint 26 says that the id is a key for a wasGeneratedBy
>>>>>>>>>> which implies that there is a single activity.
>>>>>>>>>>
>>>>>>>>>> In the primer, you assert:
>>>>>>>>>> wasGeneratedBy(ex:chart1, ex:compile,  2012-03-02T10:30:00)
>>>>>>>>>> wasGeneratedBy(ex:chart1, ex:illustrate,  2012-03-02T10:30:00)
>>>>>>>>>>
>>>>>>>>>> This is invalid.
>>>>>>>>>>
>>>>>>>>>> One way to address this is to maintain two levels of abstraction
>>>>>>>>>> for both activities and entities.
>>>>>>>>>>
>>>>>>>>>> wasGeneratedBy(ex:chart1_abstract, ex:illustrate,
>>>>>>>>>> 2012-03-02T10:30:00)
>>>>>>>>>> specializationOf(ex:chart1,ex:chart1_abstract)  // or similar.
>>>>>>>>>>
>>>>>>>>> This response explains why the current constraints do not allow
>>>>>>>>> what I described, but not why they are meaningful. The questions
>>>>>>>>> below hopefully articulate my concerns.
>>>>>>>>>
>>>>>>>>> 1. The response suggests that the invalidity of the primer example
>>>>>>>>> is due to it describing multiple levels of abstraction for a
>>>>>>>>> single entity. Why should this be invalid? Why has validity got
>>>>>>>>> anything to do with levels of abstraction? As far as I can see,
>>>>>>>>> this is not stated or explained in PROV-Constraints.
>>>>>>>>>
>>>>>>>>> 2. As ex:chart1_abstract and ex:chart1 are exactly the same entity
>>>>>>>>> with exactly the same attributes and generated at the same
>>>>>>>>> instant, then why would we want statements implying one was more
>>>>>>>>> abstract than the other? Isn't this at least misleading?
>>>>>>>>>
>>>>>>>>> I also have one related follow-on question:
>>>>>>>>>
>>>>>>>>> 3. Even if we do use the specialization approach to get around the
>>>>>>>>> constraints as suggested, there can only be one entity per
>>>>>>>>> generation event. If something is described at multiple levels of
>>>>>>>>> abstraction, then does that necessitate a unique generation event
>>>>>>>>> for each level (each entity)? If so (as appears), why? When I
>>>>>>>>> create the first version of a document, in the same instant I
>>>>>>>>> create both "doc" and "docV1". How do I describe that the event
>>>>>>>>> creating one is the "same" event that created the other? It is
>>>>>>>>> surely the "same" event in some strong, objective sense, even if
>>>>>>>>> we prefer to describe it using a different identifier for each
>>>>>>>>> entity.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Simon
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>>> Scotland, with registration number SC005336.
>>>>>>>>
>>>>>>>>  --
>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>> Scotland, with registration number SC005336.
>>>>>>>
>>>>>>>  --
>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>> Scotland, with registration number SC005336.
>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>>>>
>>>> --
>>>> Professor Luc Moreau
>>>> Electronics and Computer Science   tel:   +44 23 8059 4487
>>>> University of Southampton          fax:   +44 23 8059 2865
>>>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>>>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
> --
> Curt Tilmes, Ph.D.
> U.S. Global Change Research Program
> 1717 Pennsylvania Avenue NW, Suite 250
> Washington, D.C. 20006, USA
>
> +1 202-419-3479 (office)
> +1 443-987-6228 (cell)
> globalchange.gov
>
>

Received on Friday, 10 August 2012 18:59:20 UTC