Re: PROV-ISSUE-64 (definition-use): definition of use [Conceptual Model] from Simon Miles on 2011-09-03 (public-prov-wg@w3.org from September 2011)

From: Simon Miles <simon.miles@kcl.ac.uk>
Date: Sat, 3 Sep 2011 17:20:54 +0100
To: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <CAKc1nHd=xqYGSE=oFtbzd7EWSQnj27Y4uN5W0A8iF0o75SgcDA@mail.gmail.com>
Hi Luc,

> I see provenance as a description of how information flows across systems....

OK, I see that as part of provenance also, but the vision you describe
in your email is more specifically one of the information flowing into
and out of well-defined ports through these systems. This may be
appropriate to get unambiguous provenance, but is quite a distance
from the intuitive answer to, for example, "what is the history of my
document?" While it is clear that roles are played in that history,
e.g. the prior version of the document, the new additions, or the
editor, it is not so clear why those roles are mandatory (not merely
informative if present), nor why they are uniquely named or ordered.

I still don't think points 2 and 3 below are resolved by your replies:

"Roles are mandatory since they allow for uniform data structures." is
not adequate to clarify the position you are coming from, as described
in your mail. The once-used term "data structures" is ambiguous, the
need for "uniformity" is unclear in itself and in its relation to
roles being mandatory. [Apologies if this is fixed in a newer version
of the document, the W3C site seems to be down at the moment]

I agree there is no way to derive role names, but there is also no way
to derive locations of entities. The issue is of explaining why the
mandatory parts are mandatory.

Thanks,
Simon

> Without roles, I am missing some crucial piece of knowledge to describe
> this information flow.
>
> In programming languages, when you describe the invocation of a procedure
> on arguments, you typically have ordered your arguments, so that you can
> match them to procedure parameters. In some languages (e.g. Common Lisp),
> optional keyword arguments require the parameter names to be listed
> explicitly.
>
> Likewise, in process algebras, you tend to list the communication
> channels over which send/receive operations took place.
>
> Likewise, in some workflow languages, you identify which port values
> were sent to.
>
> Order/parameter names/channel names/ports correspond to roles in the
> provenance model.
> They are an integral part of explaining a use/generation.
>
> Of course, there may be cases where the roles cannot be asserted, and the
> model allows a syntactic shortcut, which we defined in terms of so-called
> unspecified roles.
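
To make the analogy above concrete, here is a minimal Python sketch, not taken from the model document. It treats a use assertion uses(pe, x, r) like a call with named parameters, and fills in a placeholder when no role is asserted. The Use and UseAsserter names, the example role names, and the exact "unspecifiedN" numbering scheme are assumptions for illustration only.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class Use:
    """Hypothetical record for a use assertion uses(pe, x, r)."""
    process_execution: str
    entity: str
    role: str


class UseAsserter:
    """Builds Use records, filling in a placeholder when no role is asserted."""

    def __init__(self) -> None:
        self._unspecified = count()  # numbers the placeholder roles

    def uses(self, pe: str, x: str, role: Optional[str] = None) -> Use:
        if role is None:
            # Assumed shortcut: generate unspecified0, unspecified1, ...
            role = f"unspecified{next(self._unspecified)}"
        return Use(pe, x, role)


asserter = UseAsserter()
edit = [
    asserter.uses("edit1", "doc_v1", role="priorVersion"),
    asserter.uses("edit1", "additions", role="newText"),
    asserter.uses("edit1", "style_guide"),  # role not asserted
]
```

In this sketch every record has the same three fields, whether or not the asserter supplied a role; the cost is that the placeholder names carry no meaning of their own.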
>
> Further comments interleaved.
>
>
> On 08/31/2011 03:05 PM, Simon Miles wrote:
>> Luc,
>>
>> I've reopened this issue (and closed issue 66). I don't have an answer
>> to the issue, but I can  try to help by breaking down my concerns.
>>
>> 1. Part of the lack of clarity seems to be about the reason for
>> mandating role names. Putting myself in the shoes of a general user, I
>> would interpret something being declared mandatory as meaning it was
>> necessary to be included otherwise something critical (e.g. reasoning
>> on provenance) would not work. However, the argument below seems to be
>> that PIDM expresses assertions by fixed-length tuples ['tuple' might
>> not be quite the right term], so no element of that tuple can be
>> excluded, and so every piece of data mentioned in every definition is
>> mandatory.
>>
>> 2. The statement, "Roles are mandatory since they allow for uniform
>> data structures.", which might clarify point 1, is itself not clear.
>> "Data structures" are mentioned nowhere else in the model document, so
>> it is not clear we mean the assertion tuples. And if we are to say
>> that uniformity is critical, we have to say to whom and/or why.
>> Uniformly including role names is not obviously important to those
>> wishing to assert what has occurred or to those querying provenance,
>> except in the case where they want to replay executions.
>>
>
> Now, in the latest version of the document,
> derivation relies on use/generation roles. As said above,
> I think it's crucial for describing actual information flow.
>
>
>> 3. There seems an underlying assumption that the conceptual model is
>> not extensible. There is an infinite amount of things which could be
>> included or excluded from any given assertion (i.e. are optional). For
>> example, why is role type not mandatory for used links, or location or
>> authorship not mandatory for entities, etc., when making them so
>> would also help the data structures be more uniform? You say "If you
>> have relations without roles, you will have to define their meaning",
>> but you could equally say "If you have entities without location, you
>> will have to define their meaning" - it doesn't have to be that there
>> is an "unspecifiedLocation", just that location is one of many things
>> you have not asserted.
>>
>>
> Where is this underlying assumption coming from? What is the evidence
> that the model is not extensible? I don't see how this relates to this issue.
>
> What do you mean by "why is role type not mandatory for used links"?
> Do you really mean "role type" (as opposed to role name)?
>
> Assuming so, then it's simple. Given a process specification (which I assume
> would include role types), then given a role name in a Use statement, we can
> find the corresponding role type in the process specification.
>
> So, there is a mechanism to obtain role type from role name and process
> specification
> (like you can find a parameter type in a procedure definition, given a
> parameter name).
> There is no way of deriving a role name, I believe.
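
The lookup described above can be sketched in Python, using a function signature as a stand-in for a process specification. This is only an illustration of the parameter-name analogy: edit_document, its parameter names, and role_type are hypothetical and do not come from the model document.

```python
import inspect


def edit_document(prior_version: str, additions: str) -> str:
    """Hypothetical process specification; parameter names act as role names."""
    return prior_version + additions


def role_type(process_spec, role_name: str):
    """Look up the declared type for a role name in a process specification."""
    return inspect.signature(process_spec).parameters[role_name].annotation


# The role name is the input to the lookup; the role type is what comes out.
prior_type = role_type(edit_document, "prior_version")
role_names = list(inspect.signature(edit_document).parameters)
```

The lookup only runs in one direction: given the role name, the specification yields the type, but nothing in a Use statement without a role says which parameter was meant.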
>
> Cheers,
> Luc
>
>
>
>
>> Thanks,
>> Simon
>>
>>
>>> My view here is that we define a *conceptual* model.
>>> A given serialization should make sure that shortcuts are provided.
>>> The model itself provides one, so you don't need to write unspecified0, ...
>>>
>>> Data structure? PIDM is a data model.
>>>
>>> Surely, you know the implication of optional columns in
>>> relational tables, and their implications on queries.
>>> It's the same in the semantics. If you have relations without
>>> roles, you will have to define their meaning.
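
One reading of the remark about optional columns can be shown with a small Python sketch using the standard sqlite3 module. The "used" table, its columns, and the example rows are assumptions for illustration; the point is only how a missing (NULL) role behaves in queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The role column is optional (nullable) in this hypothetical table.
conn.execute("CREATE TABLE used (pe TEXT, entity TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO used VALUES (?, ?, ?)",
    [
        ("edit1", "doc_v1", "priorVersion"),
        ("edit1", "additions", "newText"),
        ("edit1", "style_guide", None),  # role not asserted
    ],
)


def entities(where: str) -> list:
    """Return entity names matching a WHERE clause, sorted by name."""
    rows = conn.execute(f"SELECT entity FROM used WHERE {where} ORDER BY entity")
    return [row[0] for row in rows]


with_role = entities("role = 'priorVersion'")
other_role = entities("role <> 'priorVersion'")
missing_role = entities("role IS NULL")
```

Here the first two queries do not partition the table: the row with no role matches neither comparison and is only found with IS NULL, so every query has to decide explicitly what an absent role means.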
>>>
>>> Luc
>>>
>>> On 25/08/11 15:18, Simon Miles wrote:
>>>
>>>> Hi Luc,
>>>>
>>>> It was not me that originally raised it, but I feel that the original
>>>> issue (whether it will make sense to a reader why we require roles to
>>>> be mandatory) is not fully resolved.
>>>>
>>>> I don't dispute that replay comes up as a use case occasionally, and
>>>> maybe that's a reason to retain role names in the core model, but that
>>>> does not help explain why roles should be mandatory in all provenance.
>>>> Using "unspecified0, unspecified1, unspecified2, ..." seems ugly,
>>>> unintuitive and restrictive, and so could dissuade people from using
>>>> the standard.
>>>>
>>>> Moreover, I don't find the current model's rationale for this has
>>>> enough context to make sense:
>>>>     "Roles are mandatory since they allow for uniform data structures."
>>>> What data structures, who wants them to be uniform, and why?
>>>>
>>>> Should (can?) I re-open the issue?
>>>>
>>>> Thanks,
>>>> Simon
>>>>
>>>> On 22 August 2011 22:12, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>>>
>>>>
>>>>> Hi Graham,
>>>>> This issue was closed, pending review.
>>>>> Are you satisfied with the changes? Can we
>>>>> close it? Alternatively, you can reopen it,
>>>>> or create a more specific issue.
>>>>> Thanks,
>>>>> Luc
>>>>>
>>>>> PS See note on this issue's page
>>>>>
>>>>>
>>>>>
>>>>> On 29/07/11 10:13, Provenance Working Group Issue Tracker wrote:
>>>>>
>>>>>
>>>>>> PROV-ISSUE-64 (definition-use): definition of use [Conceptual Model]
>>>>>>
>>>>>> http://www.w3.org/2011/prov/track/issues/64
>>>>>>
>>>>>> Raised by: Graham Klyne
>>>>>> On product: Conceptual Model
>>>>>>
>>>>>> 5.4 Use
>>>>>>
>>>>>> Same problem with 'role' as above.
>>>>>>
>>>>>> [[
>>>>>> A reference to a given BOB may appear in multiple use assertions that refer to a given process execution, but each of those use assertions must have a distinct role.
>>>>>> ]]
>>>>>> In light of the above, this seems nonsensical to me.
>>>>>>
>>>>>> [[
>>>>>> Given an assertion uses(pe,x,r) or uses(pe,x,r,t), at least one value of x's attributes is a pre-condition for the activity denoted by pe to terminate.
>>>>>> ]]
>>>>>> As written this doesn't make sense - a value of an attribute being a precondition seems like a type error to me.  I think you mean something like availability of an attribute value.  But even that is hard to follow.  Suggest simplifying this to just:
>>>>>> [[
>>>>>> Given an assertion uses(pe,x,r) or uses(pe,x,r,t), existence of x is a pre-condition for the activity denoted by pe to terminate.
>>>>>> ]]
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
> --
> Professor Luc Moreau
> Electronics and Computer Science   tel:   +44 23 8059 4487
> University of Southampton          fax:   +44 23 8059 2865
> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>
>
>



-- 
Dr Simon Miles
Lecturer, Department of Informatics
Kings College London, WC2R 2LS, UK
+44 (0)20 7848 1166
Received on Saturday, 3 September 2011 16:22:13 UTC