RE: PROV-ISSUE-26 (uses and generates questions): How can one figure out the provenance of a given entity? from Myers, Jim on 2011-08-05 (public-prov-wg@w3.org from August 2011)

From: Myers, Jim <MYERSJ4@rpi.edu>
Date: Fri, 5 Aug 2011 08:52:49 -0400
To: Graham Klyne <GK@ninebynine.org>, "Reza B'far" <reza.bfar@oracle.com>
CC: Paulo Pinheiro da Silva <paulo@utep.edu>, <public-prov-wg@w3.org>
Message-ID: <B7376F3FB29F7E42A510EB5026D99EF20552A820@troy-be-ex2.win.rpi.edu>
Definitely separate, and in that sense nothing can be replayed (you can't go back in time; going back to the moon isn't the same as going the first time). Beyond that, there are so many possible definitions of replay that a language/model that does not define in detail what Bobs and PEs are can't really make any guarantees. Implementing replay basically means defining equivalency of Bobs and PEs, which is domain- and purpose-specific. For a Bob, do we need the same molecules on the disk, just the same bits for the file, or just a file with the same characters and perhaps a different encoding? For a PE, do we just need to use the same recipe? Do we need all the original inputs or just things of the right type, the same seeds for random numbers, or any program that claims to run the same algorithm? Whether the output matches the original is problematic in the same way.

I think the best we can do is to make the links to domain info - let Bobs point to their types, link PEs with recipes, make sure used/generated document the roles each play, etc. If we do that, and allow asserters to describe things at whatever level of detail they think they need, we've done our part.
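
As a rough illustration of why Bob equivalence depends on purpose, here is a minimal Python sketch. The function names (same_bits, same_characters) are hypothetical and not part of the provenance model; they only show that two copies of a file can be "the same" at the character level while differing at the bit level.

```python
import hashlib


def same_bits(a: bytes, b: bytes) -> bool:
    """Strict equivalence: the two byte sequences hash to the same digest."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


def same_characters(a: bytes, enc_a: str, b: bytes, enc_b: str) -> bool:
    """Looser equivalence: same text once each copy is decoded."""
    return a.decode(enc_a) == b.decode(enc_b)


# Two hypothetical Bobs holding the same text in different encodings.
original = "provenance".encode("utf-8")
copy = "provenance".encode("utf-16")

bits_match = same_bits(original, copy)
chars_match = same_characters(original, "utf-8", copy, "utf-16")
```

Which of the two checks counts as "the same Bob" for replay is exactly the domain decision an asserter would have to record. The model itself cannot pick one.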

 Jim

> -----Original Message-----
> From: public-prov-wg-request@w3.org [mailto:public-prov-wg-
> request@w3.org] On Behalf Of Graham Klyne
> Sent: Friday, August 05, 2011 4:59 AM
> To: Reza B'far
> Cc: Paulo Pinheiro da Silva; public-prov-wg@w3.org
> Subject: Re: PROV-ISSUE-26 (uses and generates questions): How can one figure
> out the provenance of a given entity?
> 
> Reza B'far wrote:
>  > Ok.  So, back to the original question.  Are there scenarios under which
> "replay" is not possible after some time has passed?  Can someone more
> formally define replay please?
> 
> Reza,
> 
> A good question, and one that is close to my involvement with workflow
> preservation.  But I'm also wondering if it's in scope for this provenance WG.
> 
> While it's clear (to me) that provenance has a key role to play in supporting
> replayability (or determining if a process execution is replayable), it seems to
> me that the mechanics (and hence possibility, or indeed formal definition) of
> replay are actually not part of provenance.
> 
> Addressing your question in that light, I think that the provenance model should
> not assume that a process execution is or is not replayable, but should provide
> useful supporting information in scenarios that are replayable (for some value
> of "replayable").  But, and maybe this is the point of your question, if a process
> execution *is* replayable then any such replay would, IMO, be a
> *separate* process execution, even if it uses the same inputs and generates the
> same outputs.  I think this is inherent in the notion that a process execution is
> something that *has been observed*.
> 
> #g
> --
> 
> Reza B'far wrote:
> > Ok.  So, back to the original question.  Are there scenarios under which
> "replay" is not possible after some time has passed?  Can someone more
> formally define replay please?
> >
> > Thanks
> >
> > On Aug 5, 2011, at 12:12 AM, Graham Klyne <GK@ninebynine.org> wrote:
> >
> >> Good point here, I think.  Given that a process execution *has happened* or
> *has been observed* in some context...
> >>
> >> #g
> >> --
> >>
> >> Paulo Pinheiro da Silva wrote:
> >>> Hi Luc,
> >>> I would say that the thing that is deterministic or not is the recipe of the
> process, and neither the process execution nor the process itself. For example, a
> recipe can be deductive or inductive.
> >>> It is a dangerous proposition to allow process executions to be labeled
> as deterministic or non-deterministic. For instance, let's say that one process is
> defined by a deductive recipe. This means that every execution of this process
> needs to be deterministic. However, we cannot prevent one execution of a
> process A from being deterministic and another execution of A from being
> non-deterministic if we allow the representation to accommodate such
> inconsistencies.
> >>> Many thanks,
> >>> Paulo.
> >>> On 8/5/2011 12:42 AM, Luc Moreau wrote:
> >>>> Hi Jim and Reza,
> >>>>
> >>>> Jim's assumption is right.
> >>>> I am happy to mention (non)-determinism for PEs.
> >>>>
> >>>> Regards,
> >>>> Luc
> >>>>
> >>>> On 05/08/11 01:51, Reza B'Far wrote:
> >>>>> Makes sense.
> >>>>>
> >>>>> So, I suggest that we at least document that PE can be
> >>>>> deterministic or non-deterministic (both) so that it's not assumed
> >>>>> that it is deterministic... unless the majority here think this is obviated.
> >>>>>
> >>>>> On 8/4/11 5:42 PM, Myers, Jim wrote:
> >>>>>> I assume (always a bad idea :-)) that Luc means replay as in
> >>>>>> starting from the same input and running the same PE and checking
> >>>>>> to see if you get the same output. A lossy process would not be a
> >>>>>> problem since you have the original input, assuming you still
> >>>>>> have access. If the PE changes the image by rewriting the file,
> >>>>>> you’d at least have Bobs representing the file before and after
> >>>>>> and would know that you need access to the before-content to do
> >>>>>> replay. (Whether you have that version/back-up copy is out of scope).
> >>>>>>
> >>>>>> Another interesting replay question is if the PE is
> >>>>>> random/stochastic
> >>>>>> - a replay would not give the same result, but many replays would
> >>>>>> have some statistical relationship to each other. In either case,
> >>>>>> I think the provenance role is just to point to the Bobs and the
> >>>>>> PE so if you have access to the Bobs and understand what the PE
> >>>>>> is doing, you could try to replay. Going beyond that is probably out of
> scope...
> >>>>>>
> >>>>>>   Jim
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: public-prov-wg-request@w3.org [mailto:public-prov-wg-
> >>>>>>> request@w3.org] On Behalf Of Reza B'Far
> >>>>>>> Sent: Thursday, August 04, 2011 7:40 PM
> >>>>>>> To: public-prov-wg@w3.org
> >>>>>>> Subject: Re: PROV-ISSUE-26 (uses and generates questions): How
> >>>>>>> can one figure out the provenance of a given entity?
> >>>>>>>
> >>>>>>> Luc -
> >>>>>>>
> >>>>>>> You mention "you may want to replay the execution...".  Question
> >>>>>>> (and I hope I'm not missing this conversation on a different
> >>>>>>> thread) -
> >>>>>>>
> >>>>>>> Is Process Execution always lossless and linear in time? In
> >>>>>>> other words, is replay always possible? (for example, can image
> >>>>>>> compression be a process execution since the compression may be
> >>>>>>> lossy?)  Either way, I think this is important to articulate
> >>>>>>> since it'll have ramifications on how inference engines decide
> >>>>>>> whether it's possible to "replay" and if the "replay" is exact
> >>>>>>> or approximate.
> >>>>>>>
> >>>>>>> Hope the question is not nonsensical.
> >>>>>>>
> >>>>>>> On 8/4/11 4:16 PM, Luc Moreau wrote:
> >>>>>>>> Hi Paulo,
> >>>>>>>>
> >>>>>>>> Using the notation we have introduced in the provenance model,
> >>>>>>>> this is written
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> uses(pe, a, r_a)
> >>>>>>>> uses(pe, b, r_b)
> >>>>>>>> isGeneratedBy(c,pe,r_c)
> >>>>>>>> isDerivedFrom(c,a)
> >>>>>>>>
> >>>>>>>> where a,b,c are entities, pe a process execution and r_a, r_b,
> >>>>>>>> r_c roles.
> >>>>>>>>
> >>>>>>>> To try and answer your questions:
> >>>>>>>> - if something is wrong about c, you may want to inspect pe,
> >>>>>>>> and hopefully
> >>>>>>>>     there are assertions about pe (not in this excerpt) which
> >>>>>>>> may be useful
> >>>>>>>>
> >>>>>>>> - you may want to replay the execution, and so having a and b,
> >>>>>>>> and knowing which process definition underpins pe, may help
> >>>>>>>> you verify the result.
> >>>>>>>>
> >>>>>>>> - I assume you mean can we infer that c was derived by the
> >>>>>>>> process execution
> >>>>>>>>
> >>>>>>>>     Yes, this is explained in the document, and further refined
> >>>>>>>> in the soon-to-be-released new version.
> >>>>>>>>      Only one pe can generate c (in one account).
> >>>>>>>>      And from a derivation from c to a, one can infer the
> >>>>>>>> existence of a pe which generated c and  used a.
> >>>>>>>>
> >>>>>>>> I hope it helps,
> >>>>>>>> Cheers,
> >>>>>>>> Luc
> >>>>>>>>
> >>>>>>>> On 07/07/11 15:50, Provenance Working Group Issue Tracker wrote:
> >>>>>>>>> PROV-ISSUE-26 (uses and generates questions): How can one
> >>>>>>>>> figure out the provenance of a given entity?
> >>>>>>>>>
> >>>>>>>>> http://www.w3.org/2011/prov/track/issues/26
> >>>>>>>>>
> >>>>>>>>> Raised by: Paulo Pinheiro da Silva On product:
> >>>>>>>>>
> >>>>>>>>> Context:
> >>>>>>>>> 1. P uses A
> >>>>>>>>> 2. P uses B
> >>>>>>>>> 3. P generates C
> >>>>>>>>> 4. C derived from A
> >>>>>>>>>
> >>>>>>>>> If the provenance of C is the concern of a user of C (as
> >>>>>>>>> opposed to the provenance of a process that generates C), one
> >>>>>>>>> may have the following questions:
> >>>>>>>>> 1) What are the “uses” and “generates” relationships adding to
> >>>>>>>>> one’s understanding of C if something is wrong with C?
> >>>>>>>>> 2) Can we infer that A was derived by the execution of process P?
> >>>>>>>>> How?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>
> >
> >
>
Received on Friday, 5 August 2011 12:53:27 UTC