Re: PROV-ISSUE-473 (generating-activity): Unique generation events and activities [prov-dm-constraints]

Jun, Stephen, Daniel, Curt, Sam, all,

Given that you voted against keeping the original design, or were
supportive of another design, can you please confirm that, like Simon,
you are fine with the proposed design?

Thanks
Luc


On 10/08/12 13:48, Miles, Simon wrote:
> Sure, fine with me. I just wanted to make sure I could explain the meaning behind the proposals, as the constraints themselves are just syntactic rules.
>
> Thanks,
> Simon
>
> Dr Simon Miles
> Senior Lecturer, Department of Informatics
> Kings College London, WC2R 2LS, UK
> +44 (0)20 7848 1166
>
> Evolutionary Testing of Autonomous Software Agents:
> http://eprints.dcs.kcl.ac.uk/1370/
> ________________________________________
> From: James Cheney [jcheney@inf.ed.ac.uk]
> Sent: 10 August 2012 12:43
> To: Miles, Simon; Luc Moreau
> Cc: public-prov-wg@w3.org
> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique generation events and activities [prov-dm-constraints]
>
> On Aug 10, 2012, at 11:45 AM, Luc Moreau wrote:
>
>> Hi Simon,
>>
>>
>> On 10/08/12 11:18, Miles, Simon wrote:
>>> Hello,
>>>
>>> Sounds like a good solution to me, thanks.
>> Great.
>>
>>> I interpret this as: a generation event identifier is unique to an event-activity pair, but an entity can have multiple generation event identifiers as long as these only refer to different descriptions of the same event (same instant at which the same entity came into being).
>> Your interpretation is slightly different from the technical definition.
>> Identifiers identify events and not descriptions.
>> So, an entity can have multiple generation events.
>> We require them to occur at the same instant.
>>
>> As a reminder, nowhere in prov-dm do we define a notion of event.
>> Events are only defined in prov-constraints.
>>
> To refine this, I'd slightly revise Simon's restatement as follows.
>
> "... a generation event identifier is unique to an event-activity pair, but an entity can have multiple generation event identifiers as long as these only refer to **simultaneous generation events** (all taking place at the instant at which the entity came into being)."
>
> Simon, if that is also OK with you I will make the change and close.  Perhaps we should add a remark to this effect somewhere too, to ensure that the intention is clear (replacing any text that suggests that the generation event is unique to the entity.)
>
> --James
>
>> Luc
>>
>>
>>
>>> Thanks,
>>> Simon
>>>
>>> ________________________________________
>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>> Sent: 10 August 2012 10:25
>>> To: Miles, Simon
>>> Cc: Provenance Working Group
>>> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique generation events and activities [prov-dm-constraints]
>>>
>>> Hi again,
>>>
>>> Luc, Tom and I had a brief phone call this morning to discuss this and other issues before Luc goes on vacation.
>>>
>>> We came up with the following suggestion to address this issue:
>>>
>>> 0.  Keep the key constraint on generation as-is.
>>>
>>> 1.  Weaken generation-uniqueness as follows:
>>>
>>> IF wasGeneratedBy(gen1; e,a,_t1,_attrs1) and wasGeneratedBy(gen2; e,a,_t2,_attrs2),
>>> THEN gen1 = gen2.
>>>
>>> saying that each pair of entity and activity has at most one generation event.
>>>
>>> 2.  Add constraint generation-generation-ordering:
>>>
>>> IF wasGeneratedBy(gen1;e,a1,t1,attrs1) and wasGeneratedBy(gen2;e,a2,t2,attrs2) THEN gen1 precedes gen2
>>>
>>> Then it would be possible to write
>>>
>>> wasGeneratedBy(ex:chart1, ex:compile,  2012-03-02T10:30:00)
>>> wasGeneratedBy(ex:chart1, ex:illustrate,  2012-03-02T10:30:00)
>>>
>>> This would expand to
>>>
>>> wasGeneratedBy(gen1; ex:chart1, ex:compile,  2012-03-02T10:30:00,[])
>>> wasGeneratedBy(gen2; ex:chart1, ex:illustrate,  2012-03-02T10:30:00,[])
>>>
>>> With the weaker form of generation-uniqueness, the two events are not required to be equal / mergeable.  However, they are required (by generation-generation-ordering) to be simultaneous:
>>>
>>> gen1 precedes gen2 precedes gen1 (which is allowed since precedes is reflexive).
>>>
>>> If this is acceptable, we propose to resolve the issue by adopting the above changes.  For symmetry, we will adopt similar changes to invalidation-uniqueness (allowing multiple activities to simultaneously invalidate) and add an invalidation-invalidation-ordering constraint.
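
A minimal sketch of how the two proposed rules could be checked over a flat list of generation statements. The Generation tuple, the function name check_generations, and the decision to treat the recorded time as the event's position in the ordering are all illustrative assumptions, not anything defined in prov-constraints.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical flat encoding of wasGeneratedBy(gen; e, a, t, attrs);
# attributes are left out because neither rule looks at them.
Generation = namedtuple("Generation", "gen entity activity time")

def check_generations(stmts):
    """Return a list of violation messages (empty if none are found)."""
    problems = []
    for g1, g2 in combinations(stmts, 2):
        if g1.entity != g2.entity:
            continue
        # Weakened generation-uniqueness: one event per (entity, activity) pair.
        if g1.activity == g2.activity and g1.gen != g2.gen:
            problems.append(f"uniqueness: {g1.gen} and {g2.gen} must be equal")
        # generation-generation-ordering, applied in both directions,
        # forces the two events to be simultaneous (assumption: the
        # recorded times are the only ordering information available).
        if g1.time != g2.time:
            problems.append(f"ordering: {g1.gen} and {g2.gen} are not simultaneous")
    return problems

ok = [
    Generation("gen1", "ex:chart1", "ex:compile", "2012-03-02T10:30:00"),
    Generation("gen2", "ex:chart1", "ex:illustrate", "2012-03-02T10:30:00"),
]
# ex:compose at a later time is a made-up statement used only to trigger a violation.
bad = ok + [Generation("gen3", "ex:chart1", "ex:compose", "2012-03-02T11:00:00")]
```

With these inputs, the primer example in `ok` passes, while `bad` is flagged because gen3 is not simultaneous with gen1 and gen2.
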
>>>
>>> --James
>>>
>>> On Aug 9, 2012, at 10:20 PM, James Cheney wrote:
>>>
>>>> Hi,
>>>>
>>>> OK, the contributes attribute approach was a strawman in any case.
>>>>
>>>> In a sense, the obstacle here is the focus in prov-n on relations that
>>>> we think of as binary but in reality have an identity and additional
>>>> parameters/attributes.  In PROV-O, we would not have this problem, we
>>>> could just link multiple activities to the event id.
>>>>
>>>> Your alternative essentially would (in database constraint terms) weaken
>>>> the key constraint to a functional dependency that says that the event
>>>> id determines only the time. This is fine and would be straightforward,
>>>> if we were working with normal flat relations, BUT because of attribute
>>>> lists (and the alignment we have in mind with rdf) it is not quite so
>>>> easy.
>>>>
>>>> So concretely suppose we say:
>>>>
>>>> wgb(id;e,a1,t,[k1=v1])
>>>> wgb(id;e,a2,t,[k2=v2])
>>>>
>>>> This is currently invalid.  If we adopt your suggestion below, then it would be
>>>> valid.  But it's not clear to me what its normal form should be.
>>>> Should the attributes of the first statement be merged into those of
>>>> the second and vice versa?
>>>>
>>>> In rdf terms, we have been mapping the attribute value pairs to
>>>> properties hanging off the id.  So it seems to me that if we have
>>>> attributes hanging off the same id in different places, they should be
>>>> merged.  In other words, it seems wrong to me to use the same id to
>>>> describe two interactions, one between e and a1 and one between e and
>>>> a2.
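
A rough sketch of the merging behaviour under the existing key constraint, assuming statements are encoded as plain tuples and attribute lists as (key, value) pairs. The function name merge_by_id and the tuple layout are hypothetical and chosen only for illustration.

```python
def merge_by_id(stmts):
    """Merge wgb statements that share an id.

    stmts: iterable of (id, entity, activity, time, attrs) tuples, where
    attrs is a list of (key, value) pairs. Statements with the same id
    must agree on entity, activity and time; their attributes are unioned.
    """
    merged = {}
    for gen, e, a, t, attrs in stmts:
        if gen not in merged:
            merged[gen] = (e, a, t, set(attrs))
            continue
        e0, a0, t0, attrs0 = merged[gen]
        if (e0, a0, t0) != (e, a, t):
            # Same id used for two different interactions: rejected here,
            # mirroring the "currently invalid" case in the message.
            raise ValueError(f"key constraint violated for {gen}")
        attrs0 |= set(attrs)  # in-place union of the attribute sets
    return merged
```

Under this sketch, two statements with the same id and the same activity end up with both attributes attached to the id, whereas the a1/a2 pair above raises an error rather than being merged.
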
>>>>
>>>> So I think the right thing to do is somehow accommodate the fact that
>>>> a generation event could involve multiple equal participants.  If we had
>>>> some lightweight way of collecting activities, so that we could in
>>>> effect write
>>>>
>>>> wgb(id; e,[a1,a2], t, attrs)
>>>>
>>>> would that work?
>>>>
>>>>
>>>> --James
>>>>
>>>> On Thu, 9 Aug 2012 19:45:36 +0100
>>>> "Miles, Simon" <simon.miles@kcl.ac.uk> wrote:
>>>>
>>>>> Hello James,
>>>>>
>>>>> Agreed that simply removing the key constraint may allow too much. We
>>>>> want to keep the defining aspects of an event fixed across all
>>>>> descriptions to be valid. For an event in general, I think that just
>>>>> means time of occurrence, correct? If so, can't we express this as
>>>>> constraints, i.e. relation R with identifier i1 and time t1 cannot be
>>>>> merged with R with identifier i2 and time t2 if R describes an event
>>>>> (e.g. wasGeneratedBy) and i1=i2 but t1/=t2? For a generation event,
>>>>> perhaps it also means the entity?
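
One way to read the constraint suggested above is as a functional dependency from event identifier to time. The sketch below checks only that dependency; the function name id_determines_time and the tuple encoding are assumptions made for illustration, and the widget data reuses the example quoted further down the thread.

```python
def id_determines_time(stmts):
    """True if no event identifier appears with two different times.

    stmts: iterable of (id, entity, activity, time) tuples.
    """
    seen = {}
    for gen, _entity, _activity, time in stmts:
        # setdefault records the first time seen for this id and returns it
        # on later occurrences, so any mismatch is detected here.
        if seen.setdefault(gen, time) != time:
            return False
    return True

widget = [
    ("evt", "widget", "worker1", "Monday"),
    ("evt", "widget", "worker2", "Tuesday"),
    ("evt", "widget", "factory", "Friday"),
]
same_time = [
    ("evt", "widget", "worker1", "Monday"),
    ("evt", "widget", "worker2", "Monday"),
]
```

Here `widget` is rejected because one identifier carries three different times, while `same_time` is accepted even though the activities differ, which is exactly the extra freedom the weaker constraint would grant.
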
>>>>>
>>>>> For your use cases, I agree there is probably little difference in
>>>>> practice. It's case 2 I was thinking about. I'm not sure if the
>>>>> textual definitions in the DM preclude case 1, but it's interesting,
>>>>> e.g. music is generated at the instant that each of the individual
>>>>> instruments in a band is being played.
>>>>>
>>>>> I'd find "primary" activities and "contributed" properties hard to
>>>>> explain and justify. I can't see why an activity at one level of
>>>>> abstraction should be any more primary than one at another. I'm
>>>>> unclear how to define contribution so that it works in both
>>>>> directions (sub- to super-activity and vice-versa). Also, shouldn't
>>>>> we be allowing for the merging of statements from multiple sources
>>>>> when this produces a valid instance? If so, then we should allow for
>>>>> two parties to declare a different generating event as primary, and I
>>>>> struggle to see why this means their statements should be
>>>>> unmergeable. In conclusion, I'm not yet convinced by the idea.
>>>>>
>>>>> Thanks,
>>>>> Simon
>>>>>
>>>>> ________________________________________
>>>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>>>> Sent: 09 August 2012 18:22
>>>>> To: Miles, Simon
>>>>> Cc: Provenance Working Group
>>>>> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique generation
>>>>> events and activities [prov-dm-constraints]
>>>>>
>>>>> OK.  The problem with removing the key constraint is that it takes
>>>>> away a lot more than we probably want, e.g. now we can say:
>>>>>
>>>>> wasGeneratedBy(evt; widget, worker1, Monday)
>>>>> wasGeneratedBy(evt; widget, worker2, Tuesday).
>>>>> wasGeneratedBy(evt; widget, factory, Friday).
>>>>>
>>>>> because (only) the key constraint says that all of the other fields
>>>>> have to match (except attributes, which can be merged).
>>>>>
>>>>> That seems strange to me - the whole point of event identifiers (I
>>>>> thought) is to identify the events.  Most of what we have done
>>>>> assumes events that take place between exactly two things (or at most
>>>>> a small number), rather than arbitrarily many.  So I would say that
>>>>> at least the times should match, otherwise the thing gets generated
>>>>> at two different times.
>>>>>
>>>>> It seems that there are two main use cases:
>>>>>
>>>>> 1.  separate activities participating simultaneously in generating
>>>>> the same entity:
>>>>>
>>>>> wasGeneratedBy(evt1;widget,worker1,t1)
>>>>> wasGeneratedBy(evt2;widget,worker2,t1)
>>>>>
>>>>> 2.  super- and sub-activities generating the same entity via events
>>>>> describing different abstraction levels.
>>>>>
>>>>> wasGeneratedBy(evt1;widget, factory,t1)
>>>>> wasGeneratedBy(evt2;widget, worker,t1)
>>>>> (some non-PROV statement that worker is part of factory)
>>>>>
>>>>> From the point of view of PROV, there is no real difference, since
>>>>> we don't have a way of saying an activity is a sub-activity of
>>>>> another... Does this sound right?
>>>>>
>>>>> As a strawman, why wouldn't it work to require a specific "primary"
>>>>> activity (which could be a new activity invented solely for this
>>>>> event), and have an attribute such as prov:contributedTo that
>>>>> names other activities that contributed to a generation event
>>>>> (perhaps indirectly, such as a super-activity)?
>>>>>
>>>>> Hence:
>>>>>
>>>>> wasGeneratedBy(evt1;e,workers12,t1,[prov:contributed = worker1,
>>>>> prov:contributed = worker2])
>>>>>
>>>>> wasGeneratedBy(evt1;e,worker,t1,[prov:contributed = factory])
>>>>>
>>>>> --James
>>>>>
>>>>> On Aug 9, 2012, at 5:53 PM, Miles, Simon wrote:
>>>>>
>>>>>> Hello James,
>>>>>>
>>>>>> I'm not clear what the invalidity point would actually look like or
>>>>>> entail, so would prefer to reserve comment.
>>>>>>
>>>>>> Yes, happy to provide suggestions, examples, arguments etc. if you
>>>>>> say what you need. I didn't have a particular solution in mind in
>>>>>> the issue raised below, but agree with your suggestion in the
>>>>>> telecon that it implies the removal of the key constraint on
>>>>>> wasGeneratedBy.
>>>>>>
>>>>>> thanks,
>>>>>> Simon
>>>>>>
>>>>>> ________________________________________
>>>>>> From: James Cheney [jcheney@inf.ed.ac.uk]
>>>>>> Sent: 09 August 2012 17:23
>>>>>> To: Provenance Working Group
>>>>>> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique
>>>>>> generation events and activities [prov-dm-constraints]
>>>>>>
>>>>>> The consensus was that this needs work, either by dropping some
>>>>>> inferences (provided we understand the implications) or finding a
>>>>>> way to accommodate multiple levels of abstraction.
>>>>>>
>>>>>> If we can find a way to allow the inference to be used to determine
>>>>>> *invalidity* if implementations agree with it, while not requiring
>>>>>> everyone use it, will that be OK?
>>>>>>
>>>>>> I will be pestering Simon, Daniel and Stian to offer suggestions
>>>>>> and/or examples.
>>>>>>
>>>>>> --James
>>>>>>
>>>>>>
>>>>>> On Aug 9, 2012, at 3:35 PM, Provenance Working Group Issue Tracker
>>>>>> wrote:
>>>>>>
>>>>>>> PROV-ISSUE-473 (generating-activity): Unique generation events and
>>>>>>> activities [prov-dm-constraints]
>>>>>>>
>>>>>>> http://www.w3.org/2011/prov/track/issues/473
>>>>>>>
>>>>>>> Raised by: Simon Miles
>>>>>>> On product: prov-dm-constraints
>>>>>>>
>>>>>>> As requested, I'm submitting an issue where I feel a
>>>>>>> PROV-Constraints review comment of mine is not completely answered.
>>>>>>>
>>>>>>> My original comment:
>>>>>>>> Unique generations
>>>>>>>> -----------
>>>>>>>> C. Immediately following Inference 12, the text says "the entity
>>>>>>>> denoted by e2 is generated by at most one activity (see Constraint
>>>>>>>> 27)". The Remark below repeats this, "at most one activity could
>>>>>>>> generate the entity e2."
>>>>>>>>
>>>>>>>> This seems wrong. Constraint 27 says that e2 is generated by only
>>>>>>>> one generation event, not by only one activity. The distinction
>>>>>>>> between these is important. In the primer's example, there is an
>>>>>>>> activity ex:compile which is decomposed into steps ex:compose and
>>>>>>>> ex:illustrate. While there is only one (implicit) generation
>>>>>>>> event for entity ex:chart1, both ex:compile and ex:illustrate can
>>>>>>>> be asserted to have generated the entity.
>>>>>>> Response from editors:
>>>>>>>> Constraint 27 indeed says that there is a single generation event
>>>>>>>> and constraint 26 says that the id is a key for a wasGeneratedBy
>>>>>>>> which implies that there is a single activity.
>>>>>>>>
>>>>>>>> In the primer, you assert:
>>>>>>>> wasGeneratedBy(ex:chart1, ex:compile,  2012-03-02T10:30:00)
>>>>>>>> wasGeneratedBy(ex:chart1, ex:illustrate,  2012-03-02T10:30:00)
>>>>>>>>
>>>>>>>> This is invalid.
>>>>>>>>
>>>>>>>> One way to address this is to maintain two levels of abstraction
>>>>>>>> for both activities and entities.
>>>>>>>>
>>>>>>>> wasGeneratedBy(ex:chart1_abstract, ex:illustrate,
>>>>>>>> 2012-03-02T10:30:00)
>>>>>>>> specializationOf(ex:chart1,ex:chart1_abstract)  // or similar.
>>>>>>> This response explains why the current constraints do not allow
>>>>>>> what I described, but not why they are meaningful. The questions
>>>>>>> below hopefully articulate my concerns.
>>>>>>>
>>>>>>> 1. The response suggests that the invalidity of the primer example
>>>>>>> is due to it describing multiple levels of abstraction for a
>>>>>>> single entity. Why should this be invalid? Why has validity got
>>>>>>> anything to do with levels of abstraction? As far as I can see,
>>>>>>> this is not stated or explained in PROV-Constraints.
>>>>>>>
>>>>>>> 2. As ex:chart1_abstract and ex:chart1 are exactly the same entity
>>>>>>> with exactly the same attributes and generated at the same
>>>>>>> instant, then why would we want statements implying one was more
>>>>>>> abstract than the other? Isn't this at least misleading?
>>>>>>>
>>>>>>> I also have one related follow-on question:
>>>>>>>
>>>>>>> 3. Even if we do use the specialization approach to get around the
>>>>>>> constraints as suggested, there can only be one entity per
>>>>>>> generation event. If something is described at multiple levels of
>>>>>>> abstraction, then does that necessitate a unique generation event
>>>>>>> for each level (each entity)? If so (as appears), why? When I
>>>>>>> create the first version of a document, in the same instant I
>>>>>>> create both "doc" and "docV1". How do I describe that the event
>>>>>>> creating one is the "same" event that created the other? It is
>>>>>>> surely the "same" event in some strong, objective sense, even if
>>>>>>> we prefer to describe it using a different identifier for each
>>>>>>> entity.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Simon
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>> Scotland, with registration number SC005336.
>>>>>>
>> --
>> Professor Luc Moreau
>> Electronics and Computer Science   tel:   +44 23 8059 4487
>> University of Southampton          fax:   +44 23 8059 2865
>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>
>>
>>

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm

Received on Friday, 10 August 2012 13:15:28 UTC