RE: PROV-ISSUE-473 (generating-activity): Unique generation events and activities [prov-dm-constraints]

Hello James,

Agreed that simply removing the key constraint may allow too much. For a set of descriptions to be valid, we want the defining aspects of an event to stay fixed across all of them. For an event in general, I think that just means the time of occurrence, correct? If so, can't we express this as a constraint, i.e. relation R with identifier i1 and time t1 cannot be merged with R with identifier i2 and time t2 if R describes an event (e.g. wasGeneratedBy) and i1 = i2 but t1 /= t2? For a generation event, perhaps it also means the entity?
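
To make the suggestion concrete, here is a minimal sketch of the relaxed rule in Python. It is only an illustration: the Generation tuple, the compatible function and the fix_entity flag are hypothetical names, not anything defined in PROV-DM or PROV-Constraints, and times are compared as plain strings.

```python
from typing import NamedTuple


class Generation(NamedTuple):
    """Hypothetical stand-in for wasGeneratedBy(id; entity, activity, time)."""
    id: str
    entity: str
    activity: str
    time: str


def compatible(a: Generation, b: Generation, fix_entity: bool = False) -> bool:
    """Check two statements against the proposed rule only.

    Statements that share an identifier must agree on time (and on the
    entity too when fix_entity is set); the activity is left free.
    """
    if a.id != b.id:
        return True  # the rule only relates statements with the same identifier
    if a.time != b.time:
        return False  # same event described with two different times
    if fix_entity and a.entity != b.entity:
        return False  # same generation event, two different entities
    return True


# Same event, different activities, same time: allowed by this rule.
print(compatible(Generation("evt", "widget", "factory", "t1"),
                 Generation("evt", "widget", "worker", "t1")))
# Same event, different times: rejected.
print(compatible(Generation("evt", "widget", "worker1", "Monday"),
                 Generation("evt", "widget", "worker2", "Tuesday")))
```

The sketch says nothing about the other constraints; it only shows that the activity field is no longer forced to match.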

For your use cases, I agree there is probably little difference in practice. It's case 2 I was thinking about. I'm not sure whether the textual definitions in the DM preclude case 1, but it's interesting, e.g. music is generated at the instant that each of the individual instruments in a band is being played.

I'd find "primary" activities and "contributed" properties hard to explain and justify. I can't see why an activity at one level of abstraction should be any more primary than one at another. I'm unclear how to define contribution so that it works in both directions (sub- to super-activity and vice versa). Also, shouldn't we be allowing for the merging of statements from multiple sources when this produces a valid instance? If so, then we should allow for two parties to declare a different generating activity as primary, and I struggle to see why this should make their statements unmergeable. In conclusion, I'm not yet convinced by the idea.

Thanks,
Simon

Dr Simon Miles
Senior Lecturer, Department of Informatics
King's College London, WC2R 2LS, UK
+44 (0)20 7848 1166

Evolutionary Testing of Autonomous Software Agents:
http://eprints.dcs.kcl.ac.uk/1370/
________________________________________
From: James Cheney [jcheney@inf.ed.ac.uk]
Sent: 09 August 2012 18:22
To: Miles, Simon
Cc: Provenance Working Group
Subject: Re: PROV-ISSUE-473 (generating-activity): Unique generation events and activities [prov-dm-constraints]

OK.  The problem with removing the key constraint is that it takes away a lot more than we probably want, e.g. now we can say:

wasGeneratedBy(evt; widget, worker1, Monday)
wasGeneratedBy(evt; widget, worker2, Tuesday)
wasGeneratedBy(evt; widget, factory, Friday)

because (only) the key constraint says that all of the other fields have to match (except attributes, which can be merged).
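
As a rough illustration of what the key constraint does here, the following Python sketch merges two statements that share an identifier. It is a simplification under stated assumptions: Generation, merge_strict and KeyConstraintViolation are hypothetical names, attributes are modelled as a set of (name, value) pairs, and unspecified optional fields are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generation:
    """Hypothetical stand-in for wasGeneratedBy(id; entity, activity, time, attrs)."""
    id: str
    entity: str
    activity: str
    time: str
    attrs: frozenset = frozenset()  # set of (name, value) pairs


class KeyConstraintViolation(Exception):
    """Raised when statements with one identifier disagree on a field."""


def merge_strict(a: Generation, b: Generation) -> Generation:
    """Merge two statements with the same identifier under the key constraint.

    Every field other than the attributes has to match; attributes are unioned.
    """
    if a.id != b.id:
        raise ValueError("only statements with the same identifier are merged")
    for name in ("entity", "activity", "time"):
        if getattr(a, name) != getattr(b, name):
            raise KeyConstraintViolation(
                f"{a.id}: {name} differs ({getattr(a, name)!r} vs {getattr(b, name)!r})"
            )
    return Generation(a.id, a.entity, a.activity, a.time, a.attrs | b.attrs)
```

With this check in place, the Monday and Tuesday statements above fail to merge; without it, nothing ties the fields of the three statements together.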

That seems strange to me - the whole point of event identifiers (I thought) is to identify the events.  Most of what we have done assumes events that take place between exactly two things (or at most a small number), rather than arbitrarily many.  So I would say that at least the times should match, otherwise the thing gets generated at two different times.

It seems that there are two main use cases:

1.  separate activities participating simultaneously in generating the same entity:

wasGeneratedBy(evt1;widget,worker1,t1)
wasGeneratedBy(evt2;widget,worker2,t1)

2.  super- and sub-activities generating the same entity via events describing different abstraction levels.

wasGeneratedBy(evt1;widget, factory,t1)
wasGeneratedBy(evt2;widget, worker,t1)
(some non-PROV statement that worker is part of factory)

From the point of view of PROV, there is no real difference, since we don't have a way of saying an activity is a sub-activity of another... Does this sound right?

As a strawman, why wouldn't it work to require a specific "primary" activity (which could be a new activity invented solely for this event), and have an attribute such as prov:contributedTo that names other activities that contributed to a generation event (perhaps indirectly, such as a super-activity)?

Hence:

wasGeneratedBy(evt1;e,workers12,t1,[prov:contributed = worker1, prov:contributed = worker2])

wasGeneratedBy(evt1;e,worker,t1,[prov:contributed = factory])
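
For illustration only, a small Python helper that writes out a statement in this strawman form. The function name to_primary is hypothetical, the attribute name prov:contributed is taken from the examples above and is not part of PROV, and the output mirrors the informal notation used in this thread rather than strict PROV-N.

```python
from typing import Iterable


def to_primary(event_id: str, entity: str, time: str,
               primary: str, contributors: Iterable[str]) -> str:
    """Render one generation statement with a single primary activity.

    Other activities are listed as prov:contributed attributes (a strawman
    attribute, not defined by PROV).
    """
    attrs = ", ".join(f"prov:contributed = {c}" for c in contributors)
    return f"wasGeneratedBy({event_id};{entity},{primary},{time},[{attrs}])"


# Case 1: an invented activity "workers12" stands in for both workers.
print(to_primary("evt1", "e", "t1", "workers12", ["worker1", "worker2"]))
# Case 2: the sub-activity is primary and the super-activity contributed.
print(to_primary("evt1", "e", "t1", "worker", ["factory"]))
```

The choice of which activity is passed as primary is left entirely to the caller, which is exactly the open design question in the strawman.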

--James

On Aug 9, 2012, at 5:53 PM, Miles, Simon wrote:

> Hello James,
>
> I'm not clear what the invalidity point would actually look like or entail, so would prefer to reserve comment.
>
> Yes, happy to provide suggestions, examples, arguments etc. if you say what you need. I didn't have a particular solution in mind in the issue raised below, but agree with your suggestion in the telecon that it implies the removal of the key constraint on wasGeneratedBy.
>
> thanks,
> Simon
>
> Dr Simon Miles
> Senior Lecturer, Department of Informatics
> King's College London, WC2R 2LS, UK
> +44 (0)20 7848 1166
>
> Evolutionary Testing of Autonomous Software Agents:
> http://eprints.dcs.kcl.ac.uk/1370/
> ________________________________________
> From: James Cheney [jcheney@inf.ed.ac.uk]
> Sent: 09 August 2012 17:23
> To: Provenance Working Group
> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique generation events and activities [prov-dm-constraints]
>
> The consensus was that this needs work, either by dropping some inferences (provided we understand the implications) or finding a way to accommodate multiple levels of abstraction.
>
> If we can find a way to allow the inference to be used to determine *invalidity* if implementations agree with it, while not requiring everyone use it, will that be OK?
>
> I will be pestering Simon, Daniel and Stian to offer suggestions and/or examples.
>
> --James
>
>
> On Aug 9, 2012, at 3:35 PM, Provenance Working Group Issue Tracker wrote:
>
>> PROV-ISSUE-473 (generating-activity): Unique generation events and activities [prov-dm-constraints]
>>
>> http://www.w3.org/2011/prov/track/issues/473
>>
>> Raised by: Simon Miles
>> On product: prov-dm-constraints
>>
>> As requested, I'm submitting an issue where I feel a PROV-Constraints review comment of mine is not completely answered.
>>
>> My original comment:
>>> Unique generations
>>> -----------
>>> C. Immediately following Inference 12, the text says "the entity
>>> denoted by e2 is generated by at most one activity (see Constraint
>>> 27)". The Remark below repeats this, "at most one activity could
>>> generate the entity e2."
>>>
>>> This seems wrong. Constraint 27 says that e2 is generated by only one
>>> generation event, not by only one activity. The distinction between
>>> these is important. In the primer's example, there is an activity
>>> ex:compile which is decomposed into steps ex:compose and
>>> ex:illustrate. While there is only one (implicit) generation event for
>>> entity ex:chart1, both ex:compile and ex:illustrate can be asserted to
>>> have generated the entity.
>>
>> Response from editors:
>>> Constraint 27 indeed says that there is a single generation event
>>> and constraint 26 says that the id is a key for a wasGeneratedBy
>>> which implies that there is a single activity.
>>>
>>> In the primer, you assert:
>>> wasGeneratedBy(ex:chart1, ex:compile,  2012-03-02T10:30:00)
>>> wasGeneratedBy(ex:chart1, ex:illustrate,  2012-03-02T10:30:00)
>>>
>>> This is invalid.
>>>
>>> One way to address this is to maintain two levels of abstraction for
>>> both activities and entities.
>>>
>>> wasGeneratedBy(ex:chart1_abstract, ex:illustrate,  2012-03-02T10:30:00)
>>> specializationOf(ex:chart1,ex:chart1_abstract)  // or similar.
>>
>> This response explains why the current constraints do not allow what I described, but not why they are meaningful. The questions below hopefully articulate my concerns.
>>
>> 1. The response suggests that the invalidity of the primer example is due to it describing multiple levels of abstraction for a single entity. Why should this be invalid? Why has validity got anything to do with levels of abstraction? As far as I can see, this is not stated or explained in PROV-Constraints.
>>
>> 2. As ex:chart1_abstract and ex:chart1 are exactly the same entity with exactly the same attributes and generated at the same instant, then why would we want statements implying one was more abstract than the other? Isn't this at least misleading?
>>
>> I also have one related follow-on question:
>>
>> 3. Even if we do use the specialization approach to get around the constraints as suggested, there can only be one entity per generation event. If something is described at multiple levels of abstraction, then does that necessitate a unique generation event for each level (each entity)? If so (as appears), why? When I create the first version of a document, in the same instant I create both "doc" and "docV1". How do I describe that the event creating one is the "same" event that created the other? It is surely the "same" event in some strong, objective sense, even if we prefer to describe it using a different identifier for each entity.
>>
>> Thanks,
>> Simon
>>
>>
>>
>>
>>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>


Received on Thursday, 9 August 2012 18:46:09 UTC