- From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- Date: Fri, 05 Aug 2011 07:50:03 +0100
- To: public-prov-wg@w3.org
Hi Paulo,

Many of these issues are being discussed in PROV-ISSUE-67. In particular, Simon raised the issue of accounts. You need to check the revised version of the document on Monday, which will contain a revised presentation of derivation. It is unclear to me at this stage whether the definition of derivation depends on accounts or not, but I made an explicit note about it in the draft document.

There seems to be a desire for "short-cuts" for derivation. Somebody may want to elaborate a proposal!

I can see a shortcoming in your option A: a given input may be the cause of several outputs, and several input-output pairs may correspond to different derivations. So, I am not clear how you would encode all that with roles. Annotating the PE seems more promising (option B), but we need to think about the cardinality of inputs/outputs. Does this mean that each output is derived from each input?

Best regards,
Luc

On 05/08/11 06:30, Paulo Pinheiro da Silva wrote:
> Hi Luc,
>
> Please see my comments in-line below:
>
>> - I assume you mean can we infer that c was derived by the process
>> execution
>>
>> Yes, this is explained in the document, and further refined in the
>> soon-to-be-released new version.
>> Only one pe can generate c (in one account).
>> And from a derivation from c to a, one can infer the existence of a
>> pe which generated c and used a.
>
> Yes, this explains a lot!
>
> I understand that the model must be able to represent that a
> derivation from 'a' to 'c' occurred through a process execution and
> that the process execution was indeed the one called 'pe'. The fact
> that the document explains the inference above appears to support the
> need for such a description.
>
> From your message, I see that one cannot derive that 'pe' was the
> process execution that derived 'c' without the use of accounts -- and
> I do not recall any group discussion of what an account is.
> So, this suggests that we are not following the proper concept
> dependencies to discuss these provenance concepts in a logical way.
> Can you see my point?
>
> I further understand that the model relies not only on accounts but
> also on the restriction that "an entity can only be generated by one
> process execution" to be able to infer in our example that 'pe' was
> the process execution that derived c. I would strongly favor the
> adoption of constructs that are explicitly capable of stating
> relationships between data derivations and process executions.
>
> Going back to the example (I numbered the statements to facilitate the
> conversation):
>
> 1. uses(pe, a, r_a)
> 2. uses(pe, b, r_b)
> 3. isGeneratedBy(c, pe, r_c)
> 4. isDerivedFrom(c, a)
>
> I understand that most of this conversation is in support of the need
> to represent that 'pe' has an input parameter 'b' that is not used to
> derive 'c' (and I am using the closed-world assumption to infer that
> 'c' was not derived from 'b' -- is this correct?). Do we really need to
> add all this complexity to every single derivation encoding just to say
> that 'pe' has an additional parameter that does not affect the final
> product of the process execution? I would further claim that most
> process execution inputs and outputs in real life would not include
> entities that are not involved in derivations. There are many things
> that we can do to simplify this model:
>
> Option A: Formalize a 'derive' role that can be used in both 'uses'
> and 'isGeneratedBy', and drop (4):
>
> uses(pe, a, derive)
> uses(pe, b, r_b)
> isGeneratedBy(c, pe, derive)
>
> Option B: Assume that 'uses' and 'isGeneratedBy' imply derivation, and
> add a new relationship to explicitly annotate processes, including the
> use of roles:
>
> uses(pe, a)
> annotates(pe, b, r_b)
> isGeneratedBy(c, pe)
>
> In this case, we could swap the positions of 'pe' and 'b' if 'b' was
> an output of 'pe'.
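The example statements and the two proposed encodings can be compared with a small Python sketch. Assertions are modelled as plain tuples named after the relations in this thread; the tuple layout and the helper functions are hypothetical and not part of any provenance specification. The readings of Option A (every 'derive' output is derived from every 'derive' input of the same execution) and of Option B (every generated output is derived from every used input) are assumptions, chosen to make the cardinality question concrete.

```python
from itertools import product

# Hypothetical encoding: each assertion is a tuple whose first field is the
# relation name used in the email; the remaining fields are identifiers.

current = {                                   # statements 1-4 of the example
    ("uses", "pe", "a", "r_a"),
    ("uses", "pe", "b", "r_b"),
    ("isGeneratedBy", "c", "pe", "r_c"),
    ("isDerivedFrom", "c", "a"),
}

option_a = {                                  # 'derive' role replaces statement 4
    ("uses", "pe", "a", "derive"),
    ("uses", "pe", "b", "r_b"),
    ("isGeneratedBy", "c", "pe", "derive"),
}

option_b = {                                  # uses/isGeneratedBy imply derivation
    ("uses", "pe", "a"),
    ("annotates", "pe", "b", "r_b"),
    ("isGeneratedBy", "c", "pe"),
}


def explicit_derivations(assertions):
    """Current approach: derivations are only those asserted directly."""
    return {(s[1], s[2]) for s in assertions if s[0] == "isDerivedFrom"}


def _pair_up(inputs, outputs):
    """Assumed reading: every output is derived from every input of the same pe."""
    return {
        (out, inp)
        for pe in inputs.keys() & outputs.keys()
        for out, inp in product(outputs[pe], inputs[pe])
    }


def option_a_derivations(assertions):
    """Option A: only statements carrying the 'derive' role imply derivation."""
    inputs, outputs = {}, {}
    for s in assertions:
        if s[0] == "uses" and s[3] == "derive":
            inputs.setdefault(s[1], set()).add(s[2])
        elif s[0] == "isGeneratedBy" and s[3] == "derive":
            outputs.setdefault(s[2], set()).add(s[1])
    return _pair_up(inputs, outputs)


def option_b_derivations(assertions):
    """Option B: every uses/isGeneratedBy pair implies derivation; 'annotates' does not."""
    inputs, outputs = {}, {}
    for s in assertions:
        if s[0] == "uses":
            inputs.setdefault(s[1], set()).add(s[2])
        elif s[0] == "isGeneratedBy":
            outputs.setdefault(s[2], set()).add(s[1])
    return _pair_up(inputs, outputs)


# All three encodings yield the same single derivation for the example.
assert explicit_derivations(current) == {("c", "a")}
assert option_a_derivations(option_a) == {("c", "a")}
assert option_b_derivations(option_b) == {("c", "a")}

# If pe derives c1 from a1 and c2 from a2, the role cannot record the
# pairing, so the cross product over-approximates the intended derivations.
two_pairs = {
    ("uses", "pe", "a1", "derive"), ("uses", "pe", "a2", "derive"),
    ("isGeneratedBy", "c1", "pe", "derive"), ("isGeneratedBy", "c2", "pe", "derive"),
}
assert len(option_a_derivations(two_pairs)) == 4   # intended: only 2
```

Under these assumed readings, both options reproduce isDerivedFrom(c, a) for the example. The last assertion illustrates Luc's objection to Option A: with two inputs and two outputs the role alone cannot say which output came from which input. Option B read the same way has the same cross-product behaviour, which is the cardinality question Luc raises about it.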
>
> Both options would significantly reduce the size of most of the
> diagrams we have built so far, which means less work for the
> specification of provenance, without losing a single bit of
> information. Moreover, our definitions of 'uses' and 'isGeneratedBy'
> would stand on their own, without the need for accounts or for
> enforcing restrictions such as that 'c' can only be generated by 'pe'
> (I also have lots of things to discuss regarding this restriction in
> case we decide to keep the current approach).
>
> I am not saying that we only have options A and B (or even that
> options A and B are correct). We may have other options; I am just
> proposing A and B to demonstrate that there are other ways of
> representing provenance that may be more beneficial than the current
> approach.
>
> Many thanks,
> Paulo.
>
>> I hope it helps,
>> Cheers,
>> Luc
>>
>> On 07/07/11 15:50, Provenance Working Group Issue Tracker wrote:
>> > PROV-ISSUE-26 (uses and generates questions): How can one figure out the provenance of a given entity?
>> >
>> > http://www.w3.org/2011/prov/track/issues/26
>> >
>> > Raised by: Paulo Pinheiro da Silva
>> > On product:
>> >
>> > Context:
>> > 1. P uses A
>> > 2. P uses B
>> > 3. P generates C
>> > 4. C derived from A
>> >
>> > If the provenance of C is the concern of a user of C (as opposed to the provenance of a process that generates C), one may have the following questions:
>> >
>> > 1) What are the “uses” and “generates” relationships adding to one’s understanding of C if something is wrong with C?
>> > 2) Can we infer that A was derived by the execution of process P? How?
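Luc's reply describes the inference behind the tracker's second question: within one account at most one process execution generates an entity, so an asserted derivation from c to a points to the execution that generated c (and, by that inference, used a). A minimal Python sketch, encoding the example's statements 1-4 as tuples and using hypothetical account names acc1/acc2, might look like this. The helper names are illustrative only, and the closed-world helper encodes Paulo's reading that 'b' was not used to derive 'c'; it is an assumption, not a rule of the draft model.

```python
# Hypothetical encoding of statements 1-4 of the example as tuples.
example = [
    ("uses", "pe", "a", "r_a"),
    ("uses", "pe", "b", "r_b"),
    ("isGeneratedBy", "c", "pe", "r_c"),
    ("isDerivedFrom", "c", "a"),
]


def infer_derivation_executions(accounts):
    """Map each (account, derived, source) derivation to the process execution
    that generated the derived entity in that account, or to None when the
    account asserts no generation (only an unnamed execution can be inferred)."""
    result = {}
    for account, assertions in accounts.items():
        generator = {}
        for s in assertions:
            if s[0] == "isGeneratedBy":
                entity, pe = s[1], s[2]
                # Unique generation is only required within a single account.
                if generator.setdefault(entity, pe) != pe:
                    raise ValueError(f"{entity} has two generators in {account}")
        for s in assertions:
            if s[0] == "isDerivedFrom":
                result[(account, s[1], s[2])] = generator.get(s[1])
    return result


def non_derivation_inputs(assertions):
    """Closed-world reading: (pe, entity) uses that no asserted derivation of
    an output of pe traces back to, such as 'b' in the example."""
    generator = {s[1]: s[2] for s in assertions if s[0] == "isGeneratedBy"}
    derived_via = {(generator.get(s[1]), s[2])
                   for s in assertions if s[0] == "isDerivedFrom"}
    return {(s[1], s[2]) for s in assertions
            if s[0] == "uses" and (s[1], s[2]) not in derived_via}


print(infer_derivation_executions({"acc1": example}))
# {('acc1', 'c', 'a'): 'pe'}
print(non_derivation_inputs(example))
# {('pe', 'b')}
```

Kept in separate accounts, a second set of assertions may attribute c to another execution without conflict; merged into a single account, the uniqueness check raises ValueError. This is the dependency on accounts that Paulo points out.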
Received on Friday, 5 August 2011 06:50:38 UTC