Re: PROV-ISSUE-26 (uses and generates questions): How can one figure out the provenance of a given entity? from Graham Klyne on 2011-08-05 (public-prov-wg@w3.org from August 2011)

From: Graham Klyne <GK@ninebynine.org>
Date: Fri, 05 Aug 2011 09:58:59 +0100
To: Reza B'far <reza.bfar@oracle.com>
CC: Paulo Pinheiro da Silva <paulo@utep.edu>, "public-prov-wg@w3.org" <public-prov-wg@w3.org>
Message-ID: <4E3BB0D3.7020808@ninebynine.org>
Reza B'far wrote:
> Ok.  So, back to the original question.  Are there scenarios under which
> "replay" is not possible after some time has passed?  Can someone more formally
> define replay please?

Reza,

A good question, and one that is close to my involvement with workflow 
preservation.  But I'm also wondering if it's in scope for this provenance WG.

While it's clear (to me) that provenance has a key role to play in supporting 
replayability (or determining if a process execution is replayable), it seems to 
me that the mechanics (and hence possibility, or indeed formal definition) of 
replay are actually not part of provenance.

Addressing your question in that light, I think that the provenance model should 
not assume that a process execution is or is not replayable, but should provide 
useful supporting information in scenarios that are replayable (for some value 
of "replayable").  But, and maybe this is the point of your question, if a 
process execution *is* replayable then any such replay would, IMO, be a 
*separate* process execution, even if it uses the same inputs and generates the 
same outputs.  I think this is inherent in the notion that a process execution 
is something that *has been observed*.
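
A minimal Python sketch of the distinction drawn above, under stated assumptions: a replay reuses the recorded inputs but is recorded as a new process execution with its own identifier, and, following Paulo's suggestion further down the thread, determinism is attached to the recipe rather than to any single execution. The names Recipe, ProcessExecution, execute, replay and replay_matches are hypothetical and are not part of the provenance model.

```python
from dataclasses import dataclass
import itertools

# Hypothetical counter used only to mint fresh execution identifiers.
_pe_ids = itertools.count(1)


@dataclass(frozen=True)
class Recipe:
    """A process definition; determinism is recorded here, not on executions."""
    name: str
    deterministic: bool


@dataclass(frozen=True)
class ProcessExecution:
    """One observed run of a recipe, with its own identity, inputs and outputs."""
    pe_id: str
    recipe: Recipe
    inputs: tuple
    outputs: tuple


def execute(recipe, func, inputs):
    """Run func on inputs and record the run as a new, distinct execution."""
    outputs = tuple(func(*inputs))
    return ProcessExecution(f"pe{next(_pe_ids)}", recipe, tuple(inputs), outputs)


def replay(original, func):
    """A replay is a separate execution that reuses the original inputs.

    func is passed in because the recorded recipe is only a description,
    not something executable.
    """
    return execute(original.recipe, func, original.inputs)


def replay_matches(original, rerun):
    """Exact comparison of outputs only makes sense for a deterministic recipe."""
    if not original.recipe.deterministic:
        raise ValueError("non-deterministic recipe: compare distributions, not values")
    return original.outputs == rerun.outputs


double = Recipe("double", deterministic=True)
pe1 = execute(double, lambda x: [2 * x], [21])
pe2 = replay(pe1, lambda x: [2 * x])
print(pe1.pe_id, pe2.pe_id, replay_matches(pe1, pe2))  # pe1 pe2 True
```

The stochastic case Jim raises is where replay_matches refuses an exact comparison; what statistical relationship several replays should have is left open here, as he suggests it is probably out of scope.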

#g
--

Reza B'far wrote:
> Ok.  So, back to the original question.  Are there scenarios under which "replay" is not possible after some time has passed?  Can someone more formally define replay please?
> 
> Thanks
> 
> On Aug 5, 2011, at 12:12 AM, Graham Klyne <GK@ninebynine.org> wrote:
> 
>> Good point here, I think.  Given that a process execution *has happened* or *has been observed* in some context...
>>
>> #g
>> --
>>
>> Paulo Pinheiro da Silva wrote:
>>> Hi Luc,
>>> I would say that the thing that is deterministic or not is the recipe of the process, not the process execution or the process itself. For example, a recipe can be deductive or inductive.
>>> It is a dangerous proposition to allow process executions to be labeled as deterministic or non-deterministic. For instance, let's say that a process is defined by a deductive recipe. This means that every execution of this process needs to be deterministic. However, if we allow the representation to accommodate such inconsistencies, we cannot prevent one execution of a process A from being deterministic and another execution of A from being non-deterministic.
>>> Many thanks,
>>> Paulo.
>>> On 8/5/2011 12:42 AM, Luc Moreau wrote:
>>>> Hi Jim and Reza,
>>>>
>>>> Jim's assumption is right.
>>>> I am happy to mention (non)-determinism for PEs.
>>>>
>>>> Regards,
>>>> Luc
>>>>
>>>> On 05/08/11 01:51, Reza B'Far wrote:
>>>>> Makes sense.
>>>>>
>>>>> So, I suggest that we at least document that a PE can be either
>>>>> deterministic or non-deterministic, so that it's not assumed to be
>>>>> deterministic... unless the majority here think this is obviated.
>>>>>
>>>>> On 8/4/11 5:42 PM, Myers, Jim wrote:
>>>>>> I assume (always a bad idea :-)) that Luc means replay as in starting
>>>>>> from the same input and running the same PE and checking to see if
>>>>>> you get the same output. A lossy process would not be a problem since
>>>>>> you have the original input, assuming you still have access. If the
>>>>>> PE changes the image by rewriting the file, you’d at least have Bobs
>>>>>> representing the file before and after and would know that you need
>>>>>> access to the before-content to do replay. (Whether you have that
>>>>>> version/back-up copy is out of scope).
>>>>>>
>>>>>> Another interesting replay question is if the PE is random/stochastic
>>>>>> - a replay would not give the same result, but many replays would
>>>>>> have some statistical relationship to each other. In either case, I
>>>>>> think the provenance role is just to point to the Bobs and the PE so
>>>>>> if you have access to the Bobs and understand what the PE is doing,
>>>>>> you could try to replay. Going beyond that is probably out of scope...
>>>>>>
>>>>>>   Jim
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: public-prov-wg-request@w3.org [mailto:public-prov-wg-request@w3.org] On Behalf Of Reza B'Far
>>>>>>> Sent: Thursday, August 04, 2011 7:40 PM
>>>>>>> To: public-prov-wg@w3.org
>>>>>>> Subject: Re: PROV-ISSUE-26 (uses and generates questions): How can one
>>>>>>> figure out the provenance of a given entity?
>>>>>>>
>>>>>>> Luc -
>>>>>>>
>>>>>>> You mention "you may want to replay the execution...".  Question (and I hope
>>>>>>> I'm not missing this conversation on a different thread):
>>>>>>>
>>>>>>> Is Process Execution always lossless and linear in time? In other words, is
>>>>>>> replay always possible? (For example, can image compression be a process
>>>>>>> execution, since the compression may be lossy?)  Either way, I think this is
>>>>>>> important to articulate, since it will have ramifications on how inference
>>>>>>> engines decide whether it's possible to "replay" and whether the "replay" is
>>>>>>> exact or approximate.
>>>>>>>
>>>>>>> Hope the question is not nonsensical.
>>>>>>>
>>>>>>> On 8/4/11 4:16 PM, Luc Moreau wrote:
>>>>>>>> Hi Paulo,
>>>>>>>>
>>>>>>>> Using the notation we have introduced in the provenance model, this is written:
>>>>>>>>
>>>>>>>>
>>>>>>>> uses(pe, a, r_a)
>>>>>>>> uses(pe, b, r_b)
>>>>>>>> isGeneratedBy(c, pe, r_c)
>>>>>>>> isDerivedFrom(c, a)
>>>>>>>>
>>>>>>>> where a, b, c are entities, pe is a process execution, and r_a, r_b, r_c
>>>>>>>> are roles.
>>>>>>>>
>>>>>>>> To try and answer your questions:
>>>>>>>> - if something is wrong with c, you may want to inspect pe, and hopefully
>>>>>>>>   there are assertions about pe (not in this excerpt) which may be useful.
>>>>>>>>
>>>>>>>> - you may want to replay the execution, and so having a and b, and knowing
>>>>>>>>   which process definition underpins pe, may help you verify the result.
>>>>>>>>
>>>>>>>> - I assume you mean: can we infer that c was derived by the process
>>>>>>>>   execution?
>>>>>>>>
>>>>>>>>   Yes, this is explained in the document, and further refined in the
>>>>>>>>   soon-to-be-released new version.
>>>>>>>>   Only one pe can generate c (in one account).
>>>>>>>>   And from the derivation isDerivedFrom(c, a), one can infer the existence
>>>>>>>>   of a pe which generated c and used a.
>>>>>>>>
>>>>>>>> I hope it helps,
>>>>>>>> Cheers,
>>>>>>>> Luc
>>>>>>>>
>>>>>>>> On 07/07/11 15:50, Provenance Working Group Issue Tracker wrote:
>>>>>>>>> PROV-ISSUE-26 (uses and generates questions): How can one figure out
>>>>>>>>> the provenance of a given entity?
>>>>>>>>>
>>>>>>>>> http://www.w3.org/2011/prov/track/issues/26
>>>>>>>>>
>>>>>>>>> Raised by: Paulo Pinheiro da Silva
>>>>>>>>> On product:
>>>>>>>>>
>>>>>>>>> Context:
>>>>>>>>> 1. P uses A
>>>>>>>>> 2. P uses B
>>>>>>>>> 3. P generates C
>>>>>>>>> 4. C derived from A
>>>>>>>>>
>>>>>>>>> If the provenance of C is the concern of a user of C (as opposed to
>>>>>>>>> the provenance of a process that generates C), one may have the
>>>>>>>>> following questions:
>>>>>>>>> 1) What do the “uses” and “generates” relationships add to one’s
>>>>>>>>> understanding of C if something is wrong with C?
>>>>>>>>> 2) Can we infer that A was derived by the execution of process P?
>>>>>>>>> How?
>>
> 
>
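
Luc's four assertions, and the two inferences he mentions (only one pe can generate an entity within one account; a derivation implies a pe that generated c and used a), can be sketched in Python. This is illustrative only: the tuple encoding, treating an account as a plain list of assertions, and the function names generator_of, inputs_of and explain_derivations are all assumptions, not part of the provenance model.

```python
# Luc's example, one assertion per tuple: (relation, arg1, arg2, ...).
# The encoding is hypothetical; an "account" is just a list of assertions here.
ACCOUNT = [
    ("uses", "pe", "a", "r_a"),
    ("uses", "pe", "b", "r_b"),
    ("isGeneratedBy", "c", "pe", "r_c"),
    ("isDerivedFrom", "c", "a"),
]


def generator_of(account, entity):
    """Return the one pe that generated entity, or None if none is asserted."""
    pes = {args[1] for rel, *args in account
           if rel == "isGeneratedBy" and args[0] == entity}
    if len(pes) > 1:
        # Only one pe can generate an entity within one account.
        raise ValueError(f"{entity} has several generators: {sorted(pes)}")
    return next(iter(pes), None)


def inputs_of(account, pe):
    """The (entity, role) pairs that pe used: what to inspect if c looks wrong."""
    return {(args[1], args[2]) for rel, *args in account
            if rel == "uses" and args[0] == pe}


def explain_derivations(account):
    """For each isDerivedFrom(c, a), name a pe that generated c and used a.

    When no generator of c is asserted, return a placeholder standing for
    the pe whose existence is inferred from the derivation.
    """
    results = []
    for rel, *args in account:
        if rel == "isDerivedFrom":
            c, a = args
            # "_:pe_for_..." is a hypothetical blank-node-style name.
            pe = generator_of(account, c) or f"_:pe_for_{c}"
            results.append((c, a, pe))
    return results


print(explain_derivations(ACCOUNT))      # [('c', 'a', 'pe')]
print(sorted(inputs_of(ACCOUNT, "pe")))  # [('a', 'r_a'), ('b', 'r_b')]
```

Combining the two rules as stated, the pe found this way would also be taken to have used a even if uses(pe, a) were not asserted; the sketch reports the pe but does not add that inferred assertion.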
Received on Friday, 5 August 2011 09:22:32 UTC