
Re: PROV-ISSUE-473 (generating-activity): Unique generation events and activities [prov-dm-constraints]

From: James Cheney <jcheney@inf.ed.ac.uk>
Date: Thu, 9 Aug 2012 18:22:54 +0100
Cc: Provenance Working Group <public-prov-wg@w3.org>
Message-Id: <0F177A5B-C5FA-43E2-B9D8-C23058656EE8@inf.ed.ac.uk>
To: "Miles, Simon" <simon.miles@kcl.ac.uk>
OK.  The problem with removing the key constraint is that it takes away a lot more than we probably want, e.g. now we can say:

wasGeneratedBy(evt; widget, worker1, Monday)
wasGeneratedBy(evt; widget, worker2, Tuesday)
wasGeneratedBy(evt; widget, factory, Friday)

because (only) the key constraint says that all of the other fields have to match (except attributes, which can be merged).
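
A minimal sketch of what the key constraint rules out, assuming each wasGeneratedBy statement is represented as a plain (id, entity, activity, time) tuple with attributes left out. The function name key_violations and the tuple layout are hypothetical, chosen only for illustration; they are not taken from PROV-Constraints.

```python
from collections import defaultdict


def key_violations(generations):
    """Return the ids whose statements disagree on entity, activity or time.

    `generations` is an iterable of (id, entity, activity, time) tuples.
    This tuple layout is an assumption made for this sketch.
    """
    seen = defaultdict(set)
    for gen_id, entity, activity, time in generations:
        seen[gen_id].add((entity, activity, time))
    # An id is a key only if it maps to exactly one combination of fields.
    return sorted(gen_id for gen_id, fields in seen.items() if len(fields) > 1)


statements = [
    ("evt", "widget", "worker1", "Monday"),
    ("evt", "widget", "worker2", "Tuesday"),
    ("evt", "widget", "factory", "Friday"),
]
print(key_violations(statements))
```

With the key constraint in force, the three statements above are flagged because they share the identifier evt but differ in activity and time. Without it, nothing else rejects them.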

That seems strange to me - the whole point of event identifiers (I thought) is to identify the events.  Most of what we have done assumes events that take place between exactly two things (or at most a small number), rather than arbitrarily many.  So I would say that at least the times should match, otherwise the thing gets generated at two different times.

It seems that there are two main use cases:

1.  separate activities participating simultaneously in generating the same entity:


2.  super- and sub-activities generating the same entity via events describing different abstraction levels.

wasGeneratedBy(evt1;widget, factory,t1)
wasGeneratedBy(evt2;widget, worker,t1)
(some non-PROV statement that worker is part of factory)

From the point of view of PROV, there is no real difference, since we don't have a way of saying an activity is a sub-activity of another... Does this sound right?

As a strawman, why wouldn't it work to require a specific "primary" activity (which could be a new activity invented solely for this event), and have an attribute such as prov:contributedTo that names other activities that contributed to a generation event (perhaps indirectly, such as a super-activity)?


wasGeneratedBy(evt1;e,workers12,t1,[prov:contributed = worker1, prov:contributed = worker2])

wasGeneratedBy(evt1;e,worker,t1,[prov:contributed = factory])
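
A rough sketch of how the strawman could keep the key constraint while still recording several contributing activities. It assumes statements are (id, entity, activity, time, attributes) tuples, that attributes are (name, value) pairs, and that prov:contributed may be repeated. The function merge_generations and this representation are hypothetical, not part of any PROV document.

```python
def merge_generations(generations):
    """Merge statements that share an identifier.

    Entity, activity and time must agree for a given id (the key
    constraint); attribute pairs are unioned. The tuple layout is an
    assumption made for this sketch.
    """
    merged = {}
    for gen_id, entity, activity, time, attrs in generations:
        fields = (entity, activity, time)
        if gen_id not in merged:
            merged[gen_id] = (fields, set(attrs))
            continue
        known_fields, known_attrs = merged[gen_id]
        if known_fields != fields:
            raise ValueError(f"key constraint violated for {gen_id!r}")
        known_attrs.update(attrs)
    return merged


statements = [
    ("evt1", "e", "workers12", "t1", [("prov:contributed", "worker1")]),
    ("evt1", "e", "workers12", "t1", [("prov:contributed", "worker2")]),
]
result = merge_generations(statements)
print(sorted(result["evt1"][1]))
```

Here the invented primary activity workers12 is the only value in the activity position, so the identifier still determines entity, activity and time, and the individual workers appear only as merged attribute values.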


On Aug 9, 2012, at 5:53 PM, Miles, Simon wrote:

> Hello James,
> I'm not clear what the invalidity point would actually look like or entail, so would prefer to reserve comment.
> Yes, happy to provide suggestions, examples, arguments etc. if you say what you need. I didn't have a particular solution in mind in the issue raised below, but agree with your suggestion in the telecon that it implies the removal of the key constraint on wasGeneratedBy.
> thanks,
> Simon
> Dr Simon Miles
> Senior Lecturer, Department of Informatics
> Kings College London, WC2R 2LS, UK
> +44 (0)20 7848 1166
> Evolutionary Testing of Autonomous Software Agents:
> http://eprints.dcs.kcl.ac.uk/1370/
> ________________________________________
> From: James Cheney [jcheney@inf.ed.ac.uk]
> Sent: 09 August 2012 17:23
> To: Provenance Working Group
> Subject: Re: PROV-ISSUE-473 (generating-activity): Unique generation events and activities [prov-dm-constraints]
> The consensus was that this needs work, either by dropping some inferences (provided we understand the implications) or finding a way to accommodate multiple levels of abstraction.
> If we can find a way to allow the inference to be used to determine *invalidity* if implementations agree with it, while not requiring everyone use it, will that be OK?
> I will be pestering Simon, Daniel and Stian to offer suggestions and/or examples.
> --James
> On Aug 9, 2012, at 3:35 PM, Provenance Working Group Issue Tracker wrote:
>> PROV-ISSUE-473 (generating-activity): Unique generation events and activities [prov-dm-constraints]
>> http://www.w3.org/2011/prov/track/issues/473
>> Raised by: Simon Miles
>> On product: prov-dm-constraints
>> As requested, I'm submitting an issue where I feel a PROV-Constraints review comment of mine is not completely answered.
>> My original comment:
>>> Unique generations
>>> -----------
>>> C. Immediately following Inference 12, the text says "the entity
>>> denoted by e2 is generated by at most one activity (see Constraint
>>> 27)". The Remark below repeats this, "at most one activity could
>>> generate the entity e2."
>>> This seems wrong. Constraint 27 says that e2 is generated by only one
>>> generation event, not by only one activity. The distinction between
>>> these is important. In the primer's example, there is an activity
>>> ex:compile which is decomposed into steps ex:compose and
>>> ex:illustrate. While there is only one (implicit) generation event for
>>> entity ex:chart1, both ex:compile and ex:illustrate can be asserted to
>>> have generated the entity.
>> Response from editors:
>>> Constraint 27 indeed says that there is a single generation event
>>> and constraint 26 says that the id is a key for a wasGeneratedBy
>>> which implies that there is a single activity.
>>> In the primer, you assert:
>>> wasGeneratedBy(ex:chart1, ex:compile,  2012-03-02T10:30:00)
>>> wasGeneratedBy(ex:chart1, ex:illustrate,  2012-03-02T10:30:00)
>>> This is invalid.
>>> One way to address this is to maintain two levels of abstraction for
>>> both activities and entities.
>>> wasGeneratedBy(ex:chart1_abstract, ex:illustrate,  2012-03-02T10:30:00)
>>> specializationOf(ex:chart1,ex:chart1_abstract)  // or similar.
>> This response explains why the current constraints do not allow what I described, but not why they are meaningful. The questions below hopefully articulate my concerns.
>> 1. The response suggests that the invalidity of the primer example is due to it describing multiple levels of abstraction for a single entity. Why should this be invalid? Why has validity got anything to do with levels of abstraction? As far as I can see, this is not stated or explained in PROV-Constraints.
>> 2. As ex:chart1_abstract and ex:chart1 are exactly the same entity with exactly the same attributes and generated at the same instant, then why would we want statements implying one was more abstract than the other? Isn't this at least misleading?
>> I also have one related follow-on question:
>> 3. Even if we do use the specialization approach to get around the constraints as suggested, there can only be one entity per generation event. If something is described at multiple levels of abstraction, then does that necessitate a unique generation event for each level (each entity)? If so (as appears), why? When I create the first version of a document, in the same instant I create both "doc" and "docV1". How do I describe that the event creating one is the "same" event that created the other? It is surely the "same" event in some strong, objective sense, even if we prefer to describe it using a different identifier for each entity.
>> Thanks,
>> Simon
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.

Received on Thursday, 9 August 2012 17:23:17 UTC
