- From: Paulo Pinheiro da Silva <paulo@utep.edu>
- Date: Thu, 4 Aug 2011 23:30:13 -0600
- To: <public-prov-wg@w3.org>
Hi Luc,

Please see my comments in-line below:

> - I assume you mean: can we infer that c was derived by the process
> execution?
>
> Yes, this is explained in the document, and further refined in the
> soon-to-be-released new version.
> Only one pe can generate c (in one account).
> And from a derivation from c to a, one can infer the existence of a
> pe which generated c and used a.

Yes, this explains a lot!

I understand that the model must be able to represent that a derivation from 'a' to 'c' occurred through a process execution, and that this process execution was indeed the one called 'pe'. The fact that the document explains the inference above appears to support the need for such a description.

From your message, I see that one cannot derive that 'pe' was the process execution that derived 'c' without the use of accounts, and I do not recall any group discussion of what an account is. This suggests that we are not following the proper concept dependencies to discuss these provenance concepts in a logical order. Can you see my point?

I further understand that the model relies not only on accounts but also on the restriction that "an entity can only be generated by one process execution" to infer, in our example, that 'pe' was the process execution that derived 'c'. I would strongly favor the adoption of constructs that can explicitly state relationships between data derivations and process executions.

Going back to the example (I numbered the statements to facilitate the conversation):

1. uses(pe, a, r_a)
2. uses(pe, b, r_b)
3. isGeneratedBy(c, pe, r_c)
4. isDerivedFrom(c, a)

I understand that most of this conversation is in support of the need to represent that 'pe' has an input parameter 'b' that is not used to derive 'c' (and I am using the closed-world assumption to infer that 'c' was not derived from 'b'; is this correct?).
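To make the inference under discussion concrete, here is a minimal Python sketch that encodes statements 1 to 4 as tuples and recovers 'pe' as the process execution behind isDerivedFrom(c, a). The tuple layouts and the helpers `generator_of` and `deriving_pe` are hypothetical illustrations, not part of any PROV draft; the sketch assumes the one-generator restriction and the closed-world reading described above.

```python
from typing import Optional, Set, Tuple

# Hypothetical encoding of the four numbered statements as Python tuples.
# The tuple layouts and helper names are illustrative, not PROV syntax.
Uses = Set[Tuple[str, str, str]]       # (process_execution, entity, role)
Generated = Set[Tuple[str, str, str]]  # (entity, process_execution, role)
Derived = Set[Tuple[str, str]]         # (derived_entity, source_entity)

uses: Uses = {("pe", "a", "r_a"), ("pe", "b", "r_b")}  # statements 1 and 2
generated: Generated = {("c", "pe", "r_c")}             # statement 3
derived: Derived = {("c", "a")}                         # statement 4


def generator_of(entity: str, generated: Generated) -> Optional[str]:
    """Return the process execution that generated `entity`, if any.

    Relies on the restriction that, within one account, an entity is
    generated by at most one process execution; raises otherwise.
    """
    pes = {pe for (e, pe, _role) in generated if e == entity}
    if len(pes) > 1:
        raise ValueError(f"{entity!r} has several generators: {sorted(pes)}")
    return next(iter(pes), None)


def deriving_pe(target: str, source: str, uses: Uses,
                generated: Generated, derived: Derived) -> Optional[str]:
    """Infer which process execution derived `target` from `source`."""
    if (target, source) not in derived:
        # Closed-world assumption: an unstated derivation does not hold.
        return None
    pe = generator_of(target, generated)
    # The inferred process execution must also have used `source`.
    if pe is not None and any(p == pe and e == source for (p, e, _r) in uses):
        return pe
    return None


print(deriving_pe("c", "a", uses, generated, derived))  # pe
print(deriving_pe("c", "b", uses, generated, derived))  # None
```

Adding a second generation such as `("c", "pe2", "r_c2")` makes `generator_of` raise, which is exactly the dependence on the one-generator restriction pointed out above.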
Do we really need all this added complexity in every single derivation encoding just to say that 'pe' has an additional parameter that does not affect the final product of the process execution? I would further claim that, in real life, most process execution inputs and outputs would not include entities that are not involved in derivations.

There are many things that we can do to simplify this model:

Option A: Formalize a 'derive' role that can be used in both 'uses' and 'isGeneratedBy', and drop (4):

  uses(pe, a, derive)
  uses(pe, b, r_b)
  isGeneratedBy(c, pe, derive)

Option B: Assume that 'uses' and 'isGeneratedBy' imply derivation, and add a new relationship to explicitly annotate processes, including the use of roles:

  uses(pe, a)
  annotates(pe, b, r_b)
  isGeneratedBy(c, pe)

In this case, we could swap the positions of 'pe' and 'b' in case 'b' was an output of 'pe'.

Both options would significantly simplify most of the diagrams we have built so far, which means less work for the specification of provenance, without losing a single bit of information. Moreover, our definitions of 'uses' and 'isGeneratedBy' would stand on their own, without the need for accounts or for restrictions such as that 'c' can only be generated by 'pe' (I also have a lot to discuss about this restriction in case we decide to keep the current approach).

I am not saying that we only have options A and B (or even that options A and B are correct). We may have other options; I am just proposing A and B to demonstrate that there are other ways of representing provenance that may be more beneficial than the current approach.

Many thanks,
Paulo.

> I hope it helps,
> Cheers,
> Luc
>
> On 07/07/11 15:50, Provenance Working Group Issue Tracker wrote:
> > PROV-ISSUE-26 (uses and generates questions): How can one figure out the provenance of a given entity?
> >
> > http://www.w3.org/2011/prov/track/issues/26
> >
> > Raised by: Paulo Pinheiro da Silva
> > On product:
> >
> > Context:
> > 1. P uses A
> > 2. P uses B
> > 3. P generates C
> > 4. C derived from A
> >
> > If the provenance of C is the concern of a user of C (as opposed to the provenance of a process that generates C), one may have the following questions:
> >
> > 1) What are the “uses” and “generates” relationships adding to one’s understanding of C if something is wrong with C?
> > 2) Can we infer that A was derived by the execution of process P? How?
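Options A and B proposed in the message can be illustrated with a minimal Python sketch in which the derivation of 'c' from 'a' is recovered from 'uses' and 'isGeneratedBy' alone, by joining on the process execution. The tuple layouts, the `DERIVE` constant, the `annotates_b` set and both helper functions are hypothetical, not taken from any PROV draft.

```python
from typing import Set, Tuple

DERIVE = "derive"  # hypothetical reserved role name for Option A

# Option A: the 'derive' role marks the inputs and outputs that take part in
# the derivation, and statement 4 (isDerivedFrom) is dropped.
uses_a = {("pe", "a", DERIVE), ("pe", "b", "r_b")}  # (pe, entity, role)
generated_a = {("c", "pe", DERIVE)}                 # (entity, pe, role)

# Option B: plain uses/isGeneratedBy imply derivation; parameters that are
# not involved in the derivation go into a separate 'annotates' relation.
uses_b = {("pe", "a")}               # (pe, entity)
generated_b = {("c", "pe")}          # (entity, pe)
annotates_b = {("pe", "b", "r_b")}   # (pe, entity, role); recorded, not derived


def derivations_option_a(uses: Set[Tuple[str, str, str]],
                         generated: Set[Tuple[str, str, str]]) -> Set[Tuple[str, str]]:
    """Pair every 'derive' output with every 'derive' input of the same pe."""
    return {
        (out, src)
        for (out, pe_out, r_out) in generated if r_out == DERIVE
        for (pe_in, src, r_in) in uses if r_in == DERIVE and pe_in == pe_out
    }


def derivations_option_b(uses: Set[Tuple[str, str]],
                         generated: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Pair every output with every input of the same pe."""
    return {
        (out, src)
        for (out, pe_out) in generated
        for (pe_in, src) in uses if pe_in == pe_out
    }


print(derivations_option_a(uses_a, generated_a))  # {('c', 'a')}
print(derivations_option_b(uses_b, generated_b))  # {('c', 'a')}
```

Neither helper consults an account or a one-generator restriction, and 'b' never appears as a source. Note that this sketch pairs every derivation output of a process execution with every derivation input; that pairing rule is an assumption made here for illustration only.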
Received on Friday, 5 August 2011 05:31:17 UTC