Re: PROV-ISSUE-64 (definition-use): definition of use [Conceptual Model] from Luc Moreau on 2011-09-01 (public-prov-wg@w3.org from September 2011)

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Thu, 01 Sep 2011 14:51:38 +0100
To: public-prov-wg@w3.org
Message-ID: <EMEW3|a46e8e19073a66790cb71509922c58e1n80Epj08L.Moreau|ecs.soton.ac.uk|4E5F8DEA>
Hi Simon,

I see provenance as a description of how information flows across systems.
Without roles, I am missing some crucial piece of knowledge to describe 
this information flow.

In programming languages, when you describe the application of a procedure
to arguments, you typically order the arguments so that you can match
them to the procedure's parameters. In some languages (e.g. Common Lisp),
optional keyword arguments require the parameter names to be listed explicitly.
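
As a rough illustration of the two matching styles, here is a minimal Python
sketch. The procedure `resize` and its parameters are hypothetical and exist
only for the example; the point is that, whether arguments are matched by
order or by name, each one ends up associated with a parameter name.

```python
import inspect


def resize(image, width, height=None):
    """Hypothetical procedure; only its parameter list matters here."""
    return (image, width, height)


signature = inspect.signature(resize)

# Positional arguments are matched to parameters by their order.
by_order = signature.bind("img.png", 640, 480)

# Keyword arguments are matched by the parameter name given at the call site.
by_name = signature.bind("img.png", height=480, width=640)

# Either way, every argument is now associated with a parameter name,
# which is the part a role plays in a use assertion.
print(dict(by_order.arguments))
print(dict(by_name.arguments))
```

Both calls yield the same argument-to-parameter mapping, which is the
information a role is meant to carry.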

Likewise, in process algebras, you tend to list the communication channels over
which send/receive operations took place.

Likewise, in some workflow languages, you identify the ports to which values
were sent.

Order, parameter names, channel names and ports correspond to roles in the
provenance model. They are an integral part of explaining a use/generation.

Of course, there may be cases where the roles cannot be asserted, and the model
allows a syntactic shortcut, which we defined in terms of so-called
unspecified roles.
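
To make the shortcut concrete, here is a minimal Python sketch, not the
model's actual definition. The names `Use`, `UseLog` and `assert_use` are
hypothetical; the `unspecified0, unspecified1, ...` naming follows the examples
quoted later in this thread, and numbering the roles with a simple counter is
an assumption made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Use:
    """uses(pe, x, r): process execution pe used entity x in role r."""
    process_execution: str
    entity: str
    role: str


class UseLog:
    """Hypothetical collection of use assertions."""

    def __init__(self):
        self.uses = []
        self._fresh = count()  # source of fresh unspecified-role numbers

    def assert_use(self, process_execution, entity, role=None):
        # Shortcut: an omitted role expands to a fresh unspecified role,
        # so every stored assertion still has the same shape.
        if role is None:
            role = f"unspecified{next(self._fresh)}"
        use = Use(process_execution, entity, role)
        self.uses.append(use)
        return use


log = UseLog()
log.assert_use("pe1", "e1", "divisor")  # role asserted explicitly
log.assert_use("pe1", "e2")             # role omitted by the asserter
log.assert_use("pe1", "e3")             # role omitted by the asserter
for use in log.uses:
    print(use)
```

The asserter never writes an unspecified role by hand, yet every assertion
keeps three components.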

Further comments interleaved.


On 08/31/2011 03:05 PM, Simon Miles wrote:
> Luc,
>
> I've reopened this issue (and closed issue 66). I don't have an answer
> to the issue, but I can  try to help by breaking down my concerns.
>
> 1. Part of the lack of clarity seems to be about the reason for
> mandating role names. Putting myself in the shoes of a general user, I
> would interpret something being declared mandatory as meaning it was
> necessary to be included otherwise something critical (e.g. reasoning
> on provenance) would not work. However, the argument below seems to be
> that PIDM expresses assertions by fixed-length tuples ['tuple' might
> not be quite the right term], so no element of that tuple can be
> excluded, and so every piece of data mentioned in every definition is
> mandatory.
>
> 2. The statement, "Roles are mandatory since they allow for uniform
> data structures.", which might clarify point 1, is itself not clear.
> "Data structures" are mentioned nowhere else in the model document, so
> it is not clear we mean the assertion tuples. And if we are to say
> that uniformity is critical, we have to say to whom and/or why.
> Uniformly including role names is not obviously important to those
> wishing to assert what has occurred or to those querying provenance,
> except in the case where they want to replay executions.
>    

Now, in the latest version of the document,
derivation relies on use/generation roles. As said above,
I think it's crucial for describing actual information flow.


> 3. There seems an underlying assumption that the conceptual model is
> not extensible. There is an infinite amount of things which could be
> included or excluded from any given assertion (i.e. are optional). For
> example, why is role type not mandatory for used links, or location or
> authorship not mandatory for entities, etc. when making them so
> would also help the data structures be more uniform? You say "If you
> have relations without roles, you will have to define their meaning",
> but you could equally say "If you have entities without location, you
> will have to define their meaning" - it doesn't have to be that there
> is an "unspecifiedLocation", just that location is one of many things
> you have not asserted.
>
>    
Where is this underlying assumption coming from? What is the evidence
that the model is not extensible? I don't see how this relates to this issue.

What do you mean by "why is role type not mandatory for used links"?
Do you really mean "role type" (as opposed to role name)?

Assuming so, then it's simple. Given a process specification (which I assume
would include role types) and a role name in a Use statement, we can
find the corresponding role type in the process specification.

So, there is a mechanism to obtain the role type from the role name and the
process specification (just as you can find a parameter type in a procedure
definition, given a parameter name).
There is no way of deriving a role name, I believe.
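
A minimal Python sketch of that lookup, under stated assumptions: the process
specification is modelled as a plain mapping from role names to role types,
and the names `DIVIDE_SPEC`, `role_type` and `role_names_with_type` are
hypothetical, not part of the model.

```python
# Hypothetical process specification: role name -> role type.
DIVIDE_SPEC = {
    "dividend": "integer",
    "divisor": "integer",
}


def role_type(process_spec, role_name):
    """Obtain the role type from the role name and the process specification."""
    return process_spec[role_name]


def role_names_with_type(process_spec, wanted_type):
    """Going the other way: every role name that has the given type."""
    return [name for name, typ in process_spec.items() if typ == wanted_type]


# A use assertion uses(pe, x, r), written here as a plain tuple.
use = ("pe1", "e2", "divisor")

print(role_type(DIVIDE_SPEC, use[2]))
print(role_names_with_type(DIVIDE_SPEC, "integer"))
```

In this example the name determines the type, but the type alone leaves two
candidate names, so the role name cannot be recovered once it is omitted.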

Cheers,
Luc




> Thanks,
> Simon
>
>    
>> My view here is that we define a *conceptual* model.
>> A given serialization should make sure that shortcuts are provided.
>> The model itself provides one, so you don't need to write unspecified0, ...
>>
>> Data structure? PIDM is a data model.
>>
>> Surely, you know the implication of optional columns in
>> relational tables, and their implications on queries.
>> It's the same in the semantics. If you have relations without
>> roles, you will have to define their meaning.
>>
>> Luc
>>
>> On 25/08/11 15:18, Simon Miles wrote:
>>      
>>> Hi Luc,
>>>
>>> It was not me that originally raised it, but I feel that the original
>>> issue (whether it will make sense to a reader why we require roles to
>>> be mandatory) is not fully resolved.
>>>
>>> I don't dispute that replay comes up as a use case occasionally, and
>>> maybe that's a reason to retain role names in the core model, but that
>>> does not help explain why roles should be mandatory in all provenance.
>>> Using "unspecified0, unspecified1, unspecified2, ..." seems ugly,
>>> unintuitive and restrictive, and so could dissuade people from using
>>> the standard.
>>>
>>> Moreover, I don't find the current model's rationale for this has
>>> enough context to make sense:
>>>     "Roles are mandatory since they allow for uniform data structures."
>>> What data structures, who wants them to be uniform, and why?
>>>
>>> Should (can?) I re-open the issue?
>>>
>>> Thanks,
>>> Simon
>>>
>>> On 22 August 2011 22:12, Luc Moreau<L.Moreau@ecs.soton.ac.uk>    wrote:
>>>
>>>        
>>>> Hi Graham,
>>>> This issue was closed, pending review.
>>>> Are you satisfied with the changes? Can we
>>>> close it? Alternatively, you can reopen it,
>>>> or create a more specific issue.
>>>> Thanks,
>>>> Luc
>>>>
>>>> PS See note on this issue's page
>>>>
>>>>
>>>>
>>>> On 29/07/11 10:13, Provenance Working Group Issue Tracker wrote:
>>>>
>>>>          
>>>>> PROV-ISSUE-64 (definition-use): definition of use [Conceptual Model]
>>>>>
>>>>> http://www.w3.org/2011/prov/track/issues/64
>>>>>
>>>>> Raised by: Graham Klyne
>>>>> On product: Conceptual Model
>>>>>
>>>>> 5.4 Use
>>>>>
>>>>> Same problem with 'role' as above.
>>>>>
>>>>> [[
>>>>> A reference to a given BOB may appear in multiple use assertions that refer to a given process execution, but each of those use assertions must have a distinct role.
>>>>> ]]
>>>>> In light of the above, this seems nonsensical to me.
>>>>>
>>>>> [[
>>>>> Given an assertion uses(pe,x,r) or uses(pe,x,r,t), at least one value of x's attributes is a pre-condition for the activity denoted by pe to terminate.
>>>>> ]]
>>>>> As written this doesn't make sense - a value of an attribute being a precondition seems like a type error to me.  I think you mean something like availability of an attribute value.  But even that is hard to follow.  Suggest simplifying this to just:
>>>>> [[
>>>>> Given an assertion uses(pe,x,r) or uses(pe,x,r,t), existence of x is a pre-condition for the activity denoted by pe to terminate.
>>>>> ]]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>            
>>>>
>>>>
>>>>          
>>>
>>>
>>>        
>>      
>
>
>    

-- 
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Thursday, 1 September 2011 13:52:14 UTC