- From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- Date: Fri, 05 Aug 2011 07:50:27 +0100
- To: public-prov-wg@w3.org
s/Paolo/Paulo,

On 05/08/11 07:50, Luc Moreau wrote:
>
> Hi Paolo,
>
> Many of these issues are being discussed in PROV-ISSUE-67.
>
> In particular, Simon raised the issue of account. You need to
> check the revised version of the document on Monday, which
> will contain a revised presentation of derivation.
>
> It is unclear to me at this stage whether the definition of derivation
> is dependent on account or not, but I made an explicit note about
> it in the draft document.
>
> There seems to be a desire for "short-cuts" for derivation.
> Somebody may want to elaborate a proposal!
>
> I can see some shortcomings in your option A, since a given input may
> be the cause of several outputs, and several input-output pairs may
> correspond to different derivations. So, I am not clear how you will
> encode all that with roles.
>
> Annotating PE seems more promising (option B). But we need to think about
> cardinality of inputs/outputs. Does this mean that each output is
> derived from each input?
>
> Best regards,
> Luc
>
> On 05/08/11 06:30, Paulo Pinheiro da Silva wrote:
>> Hi Luc,
>>
>> Please see my comments in-line below:
>>> - I assume you mean can we infer that c was derived by the process
>>> execution
>>>
>>> Yes, this is explained in the document, and further refined in the
>>> soon-to-be-released new version.
>>> Only one pe can generate c (in one account).
>>> And from a derivation from c to a, one can infer the existence
>>> of a pe which generated c and used a.
>>
>> Yes, this explains a lot!
>>
>> I understand that the model must be able to represent that a
>> derivation from 'a' to 'c' occurred through a process execution and
>> that the process execution was indeed the one called 'pe'. The fact
>> that the document explains the inference above appears to support the
>> need for such description.
>>
>> From your message, I see that one cannot derive that 'pe' was the
>> process execution that derived 'c' without the use of accounts -- and
>> I do not recall any group discussion of what is an account. So, this
>> suggests that we are not following the proper concept dependencies to
>> discuss these provenance concepts in a logical way -- can you see my
>> point?
>>
>> I further understand that the model not only relies on accounts
>> but also relies on the use of this restriction that "an entity can
>> only be generated by one process execution" to be able to infer in
>> our example that 'pe' was the process execution that derived c. I
>> would strongly favor the adoption of constructs that are explicitly
>> capable of stating relationships between data derivations and process
>> executions.
>>
>> Going back to the example (I numbered the statements to facilitate
>> the conversation):
>>
>> 1. uses(pe, a, r_a)
>> 2. uses(pe, b, r_b)
>> 3. isGeneratedBy(c,pe,r_c)
>> 4. isDerivedFrom(c,a)
>>
>>
>> I understand that most of this conversation is in support of the need
>> of representing that 'pe' has an input parameter 'b' that is not used
>> to derive 'a' (and I am using the closed-world assumption to infer that
>> 'c' was not derived from 'b' -- is this correct?). Do we really need
>> to have all this added complexity for every single derivation
>> encoding to say that 'pe' has this additional parameter that does not
>> affect the final product of the process execution? I would further
>> claim that most process execution inputs and outputs in real life
>> would not include entities that are not involved in derivations.
>> There are many things that we can do to simplify this model:
>>
>> Option A: To formalize a 'derive' role that can be used both in
>> 'uses' and 'isGeneratedBy' and to drop (4)
>>
>>   uses (pe, a, derive)
>>   uses (pe, b, r_b)
>>   isGeneratedBy(c, pe, derive)
>>
>> Option B: To assume that 'uses' and 'isGeneratedBy' imply
>> derivation and to add a new relationship to explicitly annotate
>> processes including the use of roles
>>
>>   uses (pe, a)
>>   annotates (pe, b, r_b)
>>   isGeneratedBy(c, pe)
>>
>> In this case, we could swap the positions of 'pe' and b in case 'b'
>> was an output of 'pe'.
>>
>> Both options would significantly reduce most of the diagrams we have
>> built so far, which is less work for the specification of provenance,
>> without losing a single bit of information. Moreover, on top of this,
>> our definitions of 'uses' and 'isGeneratedBy' would stand on their
>> own without the need of accounts or the enforcement of restrictions
>> such as that 'c' can only be generated by 'pe' (I also have lots of
>> things to discuss in terms of this restriction in case we decide to
>> keep the current approach).
>>
>> I am not saying that we only have options A and B (or even that
>> options A and B are correct). We may have other options and I am just
>> proposing A and B to demonstrate that there are other ways of
>> representing provenance that may be more beneficial than the current
>> approach.
>>
>> Many thanks,
>> Paulo.
>>
>>> I hope it helps,
>>> Cheers,
>>> Luc
>>>
>>> On 07/07/11 15:50, Provenance Working Group Issue Tracker wrote:
>>> > PROV-ISSUE-26 (uses and generates questions): How can one figure
>>> > out the provenance of a given entity?
>>> >
>>> > http://www.w3.org/2011/prov/track/issues/26
>>> >
>>> > Raised by: Paulo Pinheiro da Silva
>>> > On product:
>>> >
>>> > Context:
>>> > 1. P uses A
>>> > 2. P uses B
>>> > 3. P generates C
>>> > 4. C derived from A
>>> >
>>> > If the provenance of C is the concern of a user of C (as opposed
>>> > to the provenance of a process that generates C), one may have the
>>> > following questions:
>>> >
>>> > 1) What are the “uses” and “generates” relationships adding to
>>> > one’s understanding of C if something is wrong with C?
>>> > 2) Can we infer that A was derived by the execution of process P?
>>> > How?
>>> >
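
The inference discussed in this thread (a derivation from c to a, plus the restriction that only one pe generates c in an account, lets one find the pe that generated c and used a) can be sketched in Python. This is only an illustration under stated assumptions, not part of the draft document: the class names mirror the statements in the example, while `generator_of`, `pe_behind` and `option_a_derivations` are hypothetical helpers, and a single account is assumed throughout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Uses:
    pe: str
    entity: str
    role: Optional[str] = None


@dataclass(frozen=True)
class IsGeneratedBy:
    entity: str
    pe: str
    role: Optional[str] = None


@dataclass(frozen=True)
class IsDerivedFrom:
    derived: str
    source: str


def generator_of(entity, generations):
    """Return the one pe that generated `entity` in this account, or None."""
    pes = {g.pe for g in generations if g.entity == entity}
    if len(pes) > 1:
        # Violates "an entity can only be generated by one process execution".
        raise ValueError(f"{entity!r} has more than one generating pe: {sorted(pes)}")
    return next(iter(pes), None)


def pe_behind(derivation, uses, generations):
    """Return the pe that generated the derived entity and used the source."""
    pe = generator_of(derivation.derived, generations)
    if pe is not None and any(
        u.pe == pe and u.entity == derivation.source for u in uses
    ):
        return pe
    return None


def option_a_derivations(uses, generations, role="derive"):
    """Option A (hypothetical reading): derive pairs from the shared role."""
    return {
        (g.entity, u.entity)
        for g in generations
        for u in uses
        if g.pe == u.pe and g.role == role and u.role == role
    }


# Statements 1-4 from the thread.
uses = [Uses("pe", "a", "r_a"), Uses("pe", "b", "r_b")]
generations = [IsGeneratedBy("c", "pe", "r_c")]
inferred_pe = pe_behind(IsDerivedFrom("c", "a"), uses, generations)

# Option A encoding of the same example, with statement 4 dropped.
uses_a = [Uses("pe", "a", "derive"), Uses("pe", "b", "r_b")]
generations_a = [IsGeneratedBy("c", "pe", "derive")]
pairs_a = option_a_derivations(uses_a, generations_a)

print(inferred_pe)  # pe
print(pairs_a)      # {('c', 'a')}
```

Under this reading of Option A, marking several inputs and several outputs of the same pe with the 'derive' role yields every input-output pair, which is the ambiguity Luc's reply points at.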
Received on Friday, 5 August 2011 06:51:01 UTC