Re: PROV-ISSUE-26 (uses and generates questions): How can one figure out the provenance of a given entity?

s/Paolo/Paulo,

On 05/08/11 07:50, Luc Moreau wrote:
>
> Hi Paolo,
>
> Many of these issues are being discussed in PROV-ISSUE-67.
>
> In particular, Simon raised the issue of account. You need to
> check the revised version of the document on Monday, which
> will contain a revised presentation of derivation.
>
> It is unclear to me at this stage, whether the definition of derivation
> is dependent on account or not, but I made an explicit note about
> it in the draft document.
>
> There seems to be a desire for "short-cuts" for derivation.
> Somebody may want to elaborate a proposal!
>
> I can see a shortcoming in your option A, since a given input may be the
> cause of several outputs, and several input-output pairs may correspond to
> different derivations. So, I am not clear how you will encode all that
> with roles.
>
> Annotating PE seems more promising (option B). But we need to think about
> the cardinality of inputs/outputs. Does this mean that each output is
> derived from each input?
>
> Best regards,
> Luc
>
> On 05/08/11 06:30, Paulo Pinheiro da Silva wrote:
>> Hi Luc,
>>
>> Please see my comments in-line below:
>>> - I assume you mean can we infer that c was derived by the process
>>> execution
>>>
>>>      Yes, this is explained in the document, and further refined in the
>>> soon-to-be-released new version.
>>>       Only one pe can generate c (in one account).
>>>       And from a derivation from c to a, one can infer the existence of a
>>> pe which generated c and used a.
>>
>> Yes, this explains a lot!
>>
>> I understand that the model must be able to represent that a 
>> derivation from 'a' to 'c' occurred through a process execution and 
>> that the process execution was indeed the one called 'pe'. The fact 
>> that the document explains the inference above appears to support the 
>> need for such a description.
>>
>> From your message, I see that one cannot derive that 'pe' was the 
>> process execution that derived 'c' without the use of accounts -- and 
>> I do not recall any group discussion of what is an account. So, this 
>> suggests that we are not following the proper concept dependencies to 
>> discuss these provenance concepts in a logical way -- can you see my 
>> point?
>>
>> I further understand that the model relies not only on accounts 
>> but also on the restriction that "an entity can 
>> only be generated by one process execution" to be able to infer in 
>> our example that 'pe' was the process execution that derived c. I 
>> would strongly favor the adoption of constructs that are explicitly 
>> capable of stating relationships between data derivations and process 
>> executions.
>>
>> Going back to the example (I numbered the statements to facilitate 
>> the conversation):
>>
>> 1. uses(pe, a, r_a)
>> 2. uses(pe, b, r_b)
>> 3. isGeneratedBy(c,pe,r_c)
>> 4. isDerivedFrom(c,a)
>>
>>
>> I understand that most of this conversation is in support of the need 
>> to represent that 'pe' has an input parameter 'b' that is not used 
>> to derive 'c' (and I am using the closed-world assumption to infer that 
>> 'c' was not derived from 'b' -- is this correct?). Do we really need 
>> to add all this complexity to every single derivation 
>> encoding to say that 'pe' has this additional parameter that does not 
>> affect the final product of the process execution? I would further 
>> claim that most process execution inputs and outputs in real life 
>> would not include entities that are not involved in derivations. 
>> There are many things that we can do to simplify this model:
>>
>> Option A: To formalize a 'derive' role that can be used both in 
>> 'uses' and 'isGeneratedBy' and to drop (4)
>>
>> uses (pe, a, derive)
>> uses (pe, b, r_b)
>> isGeneratedBy(c, pe, derive)
>>
>> Option B: To assume that 'uses' and 'isGeneratedBy' imply 
>> derivation and to add a new relationship to explicitly annotate 
>> processes, including the use of roles
>>
>> uses (pe, a)
>> annotates (pe, b, r_b)
>> isGeneratedBy(c, pe)
>>
>> In this case, we could swap the positions of 'pe' and b in case 'b' 
>> was an output of 'pe'.
>>
>> Both options would significantly reduce most of the diagrams we have 
>> built so far, which means less work for the specification of provenance, 
>> without losing a single bit of information. Moreover, on top of this, 
>> our definitions of 'uses' and 'isGeneratedBy' would stand on their 
>> own without the need of accounts or the enforcement of restrictions 
>> such as that 'c' can only be generated by 'pe' (I also have lots of 
>> things to discuss in terms of this restriction in case we decide to 
>> keep the current approach).
>>
>> I am not saying that we only have options A and B (or even that 
>> options A and B are correct). We may have other options and I am just 
>> proposing A and B to demonstrate that there are other ways of 
>> representing provenance that may be more beneficial than the current 
>> approach.
>>
>> Many thanks,
>> Paulo.
>>
>>> I hope it helps,
>>> Cheers,
>>> Luc
>>>
>>> On 07/07/11 15:50, Provenance Working Group Issue Tracker wrote:
>>> >  PROV-ISSUE-26 (uses and generates questions): How can one figure out the provenance of a given entity?
>>> >
>>> >  http://www.w3.org/2011/prov/track/issues/26
>>> >
>>> >  Raised by: Paulo Pinheiro da Silva
>>> >  On product:
>>> >
>>> >  Context:
>>> >  1. P uses A
>>> >  2. P uses B
>>> >  3. P generates C
>>> >  4. C derived from A
>>> >
>>> >  If the provenance of C is the concern of a user of C (as opposed to the provenance of a process that generates C), one may have the following questions:
>>> >
>>> >  1) What are the “uses” and “generates” relationships adding to one’s understanding of C if something is wrong with C?
>>> >  2) Can we infer that A was derived by the execution of process P? How?
>>> >
>>> >
>>> >
>>> >
>>> >
>>
>>

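The inference discussed in the quoted thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the encoding used in the draft document: the tuple types mirror statements 1-4 of the example, the function names (generator_of, pe_for_derivation) are made up for this sketch, and a single account is modelled as plain lists of statements. The sketch only looks up recorded statements; it does not assert the existence of an unrecorded pe.

```python
from typing import NamedTuple, Optional, Sequence


class Uses(NamedTuple):            # uses(pe, entity, role)
    pe: str
    entity: str
    role: str


class IsGeneratedBy(NamedTuple):   # isGeneratedBy(entity, pe, role)
    entity: str
    pe: str
    role: str


class IsDerivedFrom(NamedTuple):   # isDerivedFrom(derived, source)
    derived: str
    source: str


def generator_of(entity: str, generations: Sequence[IsGeneratedBy]) -> Optional[str]:
    """Return the single pe that generated `entity` in this account, if any.

    Enforces the restriction quoted in the thread: within one account,
    an entity can be generated by only one process execution.
    """
    pes = {g.pe for g in generations if g.entity == entity}
    if len(pes) > 1:
        raise ValueError(f"{entity!r} has more than one generating pe: {sorted(pes)}")
    return next(iter(pes), None)


def pe_for_derivation(d: IsDerivedFrom,
                      uses: Sequence[Uses],
                      generations: Sequence[IsGeneratedBy]) -> Optional[str]:
    """Return the pe that generated d.derived and is recorded as using d.source."""
    pe = generator_of(d.derived, generations)
    if pe is None:
        return None
    recorded = any(u.pe == pe and u.entity == d.source for u in uses)
    return pe if recorded else None


# Statements 1-4 from the example, treated as one account.
uses = [Uses("pe", "a", "r_a"), Uses("pe", "b", "r_b")]
generations = [IsGeneratedBy("c", "pe", "r_c")]
derivations = [IsDerivedFrom("c", "a")]

linked_pe = pe_for_derivation(derivations[0], uses, generations)

# Closed-world reading: inputs of pe with no recorded derivation to c.
unrelated_inputs = sorted(
    u.entity for u in uses
    if u.pe == "pe" and IsDerivedFrom("c", u.entity) not in derivations
)
```

With the four statements above, linked_pe names the process execution tied to the derivation of c from a, and unrelated_inputs lists the inputs (here b) that are used by pe but are not recorded as a source of c. Dropping the uniqueness check in generator_of is what would make the link between a derivation and a process execution ambiguous.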
Received on Friday, 5 August 2011 06:51:01 UTC