Re: PROV-ISSUE-7 (define-derivation): Definition for Concept 'Derivation' [Provenance Terminology] from Simon Miles on 2011-06-07 (public-prov-wg@w3.org from June 2011)

From: Simon Miles <simon.miles@kcl.ac.uk>
Date: Tue, 7 Jun 2011 20:46:06 +0100
To: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <BANLkTimsFiFTUNgWPnErVKkBPmLHgi0xUw@mail.gmail.com>

Hello Graham,

As Paul says, perspective is not explicitly mentioned in OPM, but it
might be implied. The definition in the OPM spec is: "An account
represents a description at some level of detail as provided by one or
more observers" and, earlier in the document, accounts are said to be
"offering different levels of explanation for [a process] execution"
and that "overlapping accounts are intended to allow various
descriptions of a same execution". I would intuitively interpret the
distinction between accounts to be about perspective, particularly by
being from different "observers" or at different "levels".

With regard to comparing accounts of the same process, I would assume
they are from different perspectives, else why have multiple accounts?
I don't think there's any reason to require perspectives to be
incomparable. One OPM graph can express both a coarse-grained account
and fine-grained account of the same process, which means you could
express a query (graph traversal) using details from both accounts.
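
As a rough illustration of that last point (a sketch only, not OPM's
own serialisation; the node names, account names and the ancestors()
helper are all made up for this example), a single graph can carry
edges tagged with the accounts they belong to, and a traversal can
follow edges from more than one account:

```python
from collections import deque

# Each edge is (effect, cause, accounts): "effect depended on cause",
# asserted within the listed accounts. All names are hypothetical.
EDGES = [
    ("report", "analysis", {"coarse"}),
    ("analysis", "raw_data", {"coarse"}),
    ("report", "chart", {"fine"}),
    ("chart", "cleaned_data", {"fine"}),
    ("cleaned_data", "raw_data", {"fine"}),
]


def ancestors(node, accounts, edges=EDGES):
    """Return everything `node` depends on, following only edges that
    belong to at least one of the given accounts."""
    seen = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for effect, cause, edge_accounts in edges:
            if effect == current and edge_accounts & accounts and cause not in seen:
                seen.add(cause)
                queue.append(cause)
    return seen


# Traversal restricted to one account, then using both together.
coarse_only = ancestors("report", {"coarse"})
both = ancestors("report", {"coarse", "fine"})
```

Restricting the traversal to the coarse account reaches only the nodes
that account mentions, while passing both accounts lets one query draw
on the detail of each.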

Thanks,
Simon

On 6 June 2011 11:24, Paul Groth <pgroth@gmail.com> wrote:
> Hi Graham,
>
> From my understanding, OPM doesn't say anything about the perspective.
> An account is a coloring of the graph with some operation on that
> coloring.
>
> It doesn't say who or what an account is from.
>
> cheers,
> Paul
>
>
> On Mon, Jun 6, 2011 at 9:01 AM, Graham Klyne <GK@ninebynine.org> wrote:
>> I'm wondering if the use of "account" here is exactly the same as the use of
>> "account" in OPM.  I guess Luc would know best.
>>
>> Specifically, when we talk of or compare multiple accounts of some process
>> of information production, do we require them to all be from the same
>> perspective? I think that may be what OPM assumes.  Maybe it doesn't matter,
>> but if there's scope for confusion I figure we should at least be aware of
>> it.
>>
>> #g
>> --
>>
>> Simon Miles wrote:
>>>
>>> I think "invariant" is good too.
>>>
>>> I was unclear, regarding the proposal to focus on "values/things that
>>> are immutable according to some perspective or viewpoint", whether it
>>> is the latter "values" for which we determine provenance or state
>>> derivation relationships, or whether the "values" are properties of
>>> the entities which have provenance and there are other mutable (variant)
>>> values?
>>>
>>> If only for my own understanding, I tried looking across the different
>>> threads on this list. Here's my interpretation of what has been
>>> implied in terms of definitions (but I might well be misinterpreting).
>>>
>>> An entity is something identifiable.
>>> An account is a record of something that has occurred from a
>>> particular perspective.
>>> An invariant property of an entity is a property of that entity which
>>> is invariant according to a particular perspective.
>>> An abstraction of an entity is another entity with a subset of its
>>> invariant properties, according to a particular perspective.
>>> B derives from A if some of B's invariant properties are due to A's
>>> invariant properties.
>>>
>>> An example trying to capture all the above:
>>>
>>> Entities:
>>>  - E1: A government data set with UK government identifier GOVID-12345
>>>  - E2: The data set with a data value for row 2012 being £7,500
>>>  - E3: The corrected data set with the value for row 2012 being £9,000
>>>  - E4: An Excel 2010 spreadsheet containing the corrected data set
>>>
>>> Accounts:
>>>  - A1: An account from a perspective in which any government data set
>>> will always retain the same UK government identifier (a new identifier
>>> means a new data set)
>>>  - A2: An account from a perspective in which any change of value in a
>>> data set means it is a new version of that data set
>>>  - A3: An account from a perspective in which any changes to a file by
>>> writing create a new data set, while any changes due to reading do not
>>>
>>> Invariant properties:
>>>  - P1: Identifier GOVID-12345 is invariant for E1, E2, E3, E4 with
>>> respect to account A1
>>>  - P2: All the data values (including £7,500 for 2012) are invariant
>>> for E2 with respect to account A2
>>>  - P3: All the data values (including £9,000 for 2012) are invariant
>>> for E3, E4 with respect to account A2
>>>  - P4: All bytes of the spreadsheet are invariant for E4 except those
>>> changed on reading (e.g. Excel saves the current open worksheet,
>>> cursor position etc. even without editing) with respect to account A3
>>>  - P5: The data set (E1) having existed is invariant for E1, E2, E3,
>>> E4 with respect to any account
>>>  - P6: The first version of the data set (E2) having existed is
>>> invariant for E2 with respect to any account
>>>  - P7: The corrected version of the data set (E3) having existed is
>>> invariant for E3, E4 with respect to any account
>>>  - P8: The Excel data set (E4) having existed is invariant for E4 with
>>> respect to any account
>>>
>>> Abstractions:
>>>  - E1 abstracts E2, E3, E4
>>>  - E3 abstracts E4
>>>
>>> Derivation:
>>>  - E3 derives from E2 because, aside from the corrected value, all
>>> other values are copied directly from it (P3 is partly due to P2)
>>>  - E3 also derives from the correction made to the data set, changing
>>> £7,500 to £9,000 (could be called E5, omitted above for brevity)
>>>
>>> We could then say that the provenance of an entity is/includes a
>>> record of how that entity came to have its invariant properties.
>>>
>>> Provenance:
>>>  - Provenance of E1 is how it came to be generated (P5) and came to
>>> have its ID (P1)
>>>  - Provenance of E2 is how it came to be generated (P5, P6), given its
>>> ID (P1), and populated with the data it has (P2)
>>>  - Provenance of E3 is how it came to be generated (P5, P7), given its
>>> ID (P1), and populated with the data it has (P3)
>>>  - Provenance of E4 is how it came to be generated (P5, P7, P8), given
>>> its ID (P1), populated with the data it has (P3), and serialised to
>>> its given bytes (P4)
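
The example above can be checked mechanically. The following is only a
sketch under simplifying assumptions: it transcribes P1-P8 into a
table, ignores which account each property is relative to, and reads
"abstraction" as "has a strict subset of the other entity's invariant
properties". The names HOLDS_FOR, invariants() and abstracts() are
invented for the illustration.

```python
# Which entities each invariant property holds for, transcribed from
# P1-P8 in the example. The account dimension is deliberately ignored.
HOLDS_FOR = {
    "P1": {"E1", "E2", "E3", "E4"},
    "P2": {"E2"},
    "P3": {"E3", "E4"},
    "P4": {"E4"},
    "P5": {"E1", "E2", "E3", "E4"},
    "P6": {"E2"},
    "P7": {"E3", "E4"},
    "P8": {"E4"},
}
ENTITIES = ["E1", "E2", "E3", "E4"]


def invariants(entity):
    """Set of invariant properties listed for the given entity."""
    return {p for p, entities in HOLDS_FOR.items() if entity in entities}


def abstracts(a, b):
    """a abstracts b if a's invariant properties are a strict subset of b's."""
    return invariants(a) < invariants(b)


# All (abstraction, abstracted) pairs implied by the property table.
abstractions = sorted(
    (a, b) for a in ENTITIES for b in ENTITIES if abstracts(a, b)
)
```

Under those assumptions the computed pairs match the "Abstractions"
list in the example (E1 abstracts E2, E3, E4; E3 abstracts E4), and
invariants() returns, for each entity, the properties named in its
provenance entry.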
>>>
>>> It would be good to know if others are interpreting the consensus in
>>> the same way!
>>>
>>> Thanks,
>>> Simon
>>>
>>> On 3 June 2011 21:36, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>>>
>>>> I think I am also comfortable with using the term "invariant", if it
>>>> helps gain consensus.
>>>>
>>>>
>>>>
>>>> Professor Luc Moreau
>>>> Electronics and Computer Science
>>>> University of Southampton
>>>> Southampton SO17 1BJ
>>>> United Kingdom
>>>>
>>>> On 3 Jun 2011, at 15:06, "Graham Klyne" <GK@ninebynine.org> wrote:
>>>>
>>>>> Luc,
>>>>> Jim,
>>>>> Khalid,
>>>>>
>>>>> I'm responding to all of you at once.
>>>>>
>>>>> Short answer: what Luc says.
>>>>>
>>>>> I find myself preferring the term "invariant" to "immutable" for just
>>>>> this reason.
>>>>>
>>>>> ...
>>>>>
>>>>> Longer answer:  there's not a specific thing I want to capture through
>>>>> derivation of mutable resources.  I'm just concerned that insisting on
>>>>> immutability may prevent useful expression.
>>>>>
>>>>> I'll illustrate with an example from a completely different field.  For
>>>>> some years, I have been involved peripherally in definition and registration
>>>>> of URI schemes, and remain IANA's designated reviewer for new URI schemes.
>>>>> Several years ago, there was much discussion about registering new URI
>>>>> schemes vs registering new URN namespaces [2] vs using http URIs for
>>>>> everything.  A specific example is the info: URI scheme [3].  I argued at
>>>>> the time that this could equally be served by a URN namespace.  But the
>>>>> original definition of URN requirements [4] made some apparently strong
>>>>> assertions about persistence and permanence of URNs which the community
>>>>> behind info felt were too constraining, so we ended up with an arguably
>>>>> unnecessary new URI scheme. Some further history at [5].
>>>>>
>>>>> Looking back, I now think the original language in [4] was
>>>>> over-interpreted, and many people didn't fully recognize that permanence of
>>>>> identity didn't constrain the identified thing itself possibly changing or
>>>>> going away. There was an expectation of immutability, not even explicitly
>>>>> stated, but also not dispelled.
>>>>>
>>>>> This is the kind of concern I have with insisting on immutability in
>>>>> subjects of provenance at the outset.
>>>>>
>>>>> [1] http://www.ietf.org/rfc/rfc2141.txt
>>>>> [2] http://www.ietf.org/rfc/rfc2611.txt,
>>>>> http://tools.ietf.org/html/rfc3406
>>>>> [3] http://www.ietf.org/rfc/rfc4452.txt
>>>>> [4] http://tools.ietf.org/html/rfc1737
>>>>> [5] http://www.w3.org/TR/uri-clarification/
>>>>>
>>>>> #g
>>>>> --
>>>>>
>>>>> Luc Moreau wrote:
>>>>>>
>>>>>> Hi Jim, Graham, Klyne,
>>>>>> Following yesterday's call, and seeing this thread, it seems that
>>>>>>  "Immutable value" is too restrictive because too absolute.
>>>>>> What about saying we focus on "/values/things that are immutable
>>>>>> according to some perspective or viewpoint/"?
>>>>>> It seems to offer the necessary trade-off and flexibility, with
>>>>>> - a stable property required for provenance
>>>>>> - change being allowed according to other viewpoints.
>>>>>> Cheers,
>>>>>> Luc
>>>>>> On 06/03/2011 02:03 AM, Myers, Jim wrote:
>>>>>>>
>>>>>>> What do you want to capture with derivation of mutable resources?
>>>>>>> Simply that one mutable resource can be used in a process and produce
>>>>>>> another different mutable resource? If so, I'd ask why we should consider
>>>>>>> this case any different than immutable? (Does the fact that most of what we
>>>>>>> want to call immutable resources are undergoing constant change (bits
>>>>>>> getting refresh charges, files moving about in memory caches, etc.) cause
>>>>>>> any issue with the basic OPM-style model?) I think all of these cases are
>>>>>>> handled just fine by OPM-style constructs and I'd argue further that the key
>>>>>>> concept about artifacts was not complete immutability with respect to any
>>>>>>> process we can think of but immutability with respect to the processes
>>>>>>> involved in the provenance. (Eggs used in cake baking do not come out as
>>>>>>> modified eggs (they become a new cake), but an egg in the fridge and the
>>>>>>> warmer egg waiting to be mixed are considered the same egg only because we
>>>>>>> don't want to discuss/report on the warming process that occurred. The fact
>>>>>>> that an egg has mutability in its temperature doesn't make it a bad artifact
>>>>>>> in OPM or cause trouble in reporting a baking process...)
>>>>>>>
>>>>>>> The mutable case that presents a question is: should we provide a
>>>>>>> second mechanism to allow one to describe a process that changes the state
>>>>>>> of a mutable resource, i.e. to say that the egg with temperature cold is the
>>>>>>> same egg with temperature warm after a heating process? I suspect that we
>>>>>>> can't avoid this use case completely but we might not have to create a
>>>>>>> separate mechanism: if we allow a resource egg to be associated with cold-egg
>>>>>>> and warm-egg resources, we can use the OPM-like mechanism (cold-egg <-- heating
>>>>>>> <-- warm-egg) while adding that cold-egg and warm-egg are 'aspects of' the same
>>>>>>> mutable egg which 'participates' in a heating process. I think this is
>>>>>>> general and minimally disruptive. One could say that an egg participated in
>>>>>>> heating without creating other resources, but one could not directly
>>>>>>> describe the temperature of the egg before and after heating without
>>>>>>> creating the cold and warm egg artifacts. I think this also covers what we
>>>>>>> want from agents and sources - we want to convey that they participate in a
>>>>>>> process and, while their state changes as they do so, we don't want to
>>>>>>> document their state changes. But as Simon says we may still want to treat
>>>>>>> them (e.g. the Royal Society) as resources and talk about their creation so
>>>>>>> it would be valuable if they could just be artifacts in the context of
>>>>>>> creation/founding type events. Today, we have agents and sources as
>>>>>>> different types than artifact so there is no way to talk about their
>>>>>>> founding, etc.
>>>>>>>
>>>>>>> --  Jim
>>>>>>>
>>>>>>> ________________________________
>>>>>>>
>>>>>>> From: public-prov-wg-request@w3.org on behalf of Graham Klyne
>>>>>>> Sent: Thu 6/2/2011 3:45 PM
>>>>>>> To: Khalid Belhajjame
>>>>>>> Cc: Luc Moreau; public-prov-wg@w3.org
>>>>>>> Subject: Re: PROV-ISSUE-7 (define-derivation): Definition for Concept
>>>>>>> 'Derivation' [Provenance Terminology]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Khalid Belhajjame wrote:
>>>>>>>
>>>>>>>> Hi Graham,
>>>>>>>>
>>>>>>>>> I agree that many of the examples of derivation we have raised relate
>>>>>>>>> to resource states.  But if, as has been suggested by myself and others,
>>>>>>>>> resource states are themselves resources (especially when named for the
>>>>>>>>> purposes of expressing a derivation), then such derivations can equally
>>>>>>>>> be regarded as relating resources.  I think that's more a difference of
>>>>>>>>> terminology than fundamental.
>>>>>>>>
>>>>>>>> Would it be fair then to say that in that view resources are immutable
>>>>>>>> resources?
>>>>>>>>
>>>>>>> In the case of resources representing a snapshot of state, yes.
>>>>>>>
>>>>>>>
>>>>>>>> Which brings me to the question, do we want to express derivations
>>>>>>>> between mutable resources, or is that just something that we should
>>>>>>>> avoid at this point?
>>>>>>>>
>>>>>>> (I'm finishing this email after today's telecon, so it's a bit of a
>>>>>>> re-run.)
>>>>>>>
>>>>>>> I think that many of our use-cases are based on invariant values, and the
>>>>>>> near-term goal is to find expression for these.  So we definitely do want to
>>>>>>> express derivations between non-varying values.  But in so doing, it's not
>>>>>>> clear to me (yet) that we need to exclude mutable resources, so I say let's
>>>>>>> keep our options open and not close off any possibilities that we don't
>>>>>>> have to.
>>>>>>>
>>>>>>> So my answer to avoiding mutable resources is: "yes and no".
>>>>>>>
>>>>>>> #g
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Thanks, khalid
>>>>>>>>
>>>>>>>>
>>>>>>>>> Where I think I may diverge from what you say is that I would not
>>>>>>>>> limit such expressions of derivation to resources that happen to be a
>>>>>>>>> state (or snapshot of state) of some resource.  I think defining that
>>>>>>>>> distinction in a hard-and-fast way, that also aligns with various
>>>>>>>>> intuitions we may have about derivation, may prove difficult to
>>>>>>>>> achieve (e.g. as I think is suggested by Jim Myers in
>>>>>>>>> http://lists.w3.org/Archives/Public/public-prov-wg/2011Jun/0015.html
>>>>>>>>> (*)).
>>>>>>>>>
>>>>>>>>> #g
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> (*) I just love the W3C mailing list archives - so easy to find links
>>>>>>>>> to messages, and thus capture provenance!
>>>>>>>>>
>>>>>>>>> Khalid Belhajjame wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> From the discussion so far on derivation it seems that most people
>>>>>>>>>> tend to define derivation between resource states or resource state
>>>>>>>>>> representations, but not for resources.
>>>>>>>>>>
>>>>>>>>>> My take on this is that in a context where a resource is mutable,
>>>>>>>>>> derivations will mainly be used to associate resource states and
>>>>>>>>>> resource state representations.
>>>>>>>>>>
>>>>>>>>>> That said, based on derivations connecting resource states and
>>>>>>>>>> resource state representations, one can infer new derivations
>>>>>>>>>> between resources. For example, consider the resource r_1 and the
>>>>>>>>>> associated resource state r_1_s, and consider that r_1_s was used to
>>>>>>>>>> construct a new resource state r_2_s, actually the first state, of
>>>>>>>>>> the resource r_2. We can state that r_2_s is derived from r_1_s, i.e.,
>>>>>>>>>> r_1_s -> r_2_s. We can also state that the resource r_2 is derived
>>>>>>>>>> from the resource r_1, i.e., r_1 -> r_2.
>>>>>>>>>>
>>>>>>>>>> PS: I added a definition of derivation along these lines to the
>>>>>>>>>> wiki:
>>>>>>>>>> http://www.w3.org/2011/prov/wiki/ConceptDerivation
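
The inference described in that message, from state-level derivations
to resource-level ones, might be sketched as follows. This is an
illustration only, not the wiki definition: the STATE_OF mapping and
the lift() helper are invented here, and skipping derivations between
two states of the same resource is an added assumption.

```python
# Which resource each state belongs to (names follow the example above).
STATE_OF = {"r_1_s": "r_1", "r_2_s": "r_2"}

# State-level derivations as (source_state, derived_state) pairs,
# matching the arrow direction used in the message: r_1_s -> r_2_s.
STATE_DERIVATIONS = {("r_1_s", "r_2_s")}


def lift(state_derivations, state_of):
    """Infer resource-level derivations from state-level ones."""
    lifted = set()
    for source_state, derived_state in state_derivations:
        source = state_of[source_state]
        derived = state_of[derived_state]
        # Assumption: successive states of one resource do not make
        # the resource "derived from itself", so such pairs are skipped.
        if source != derived:
            lifted.add((source, derived))
    return lifted


resource_derivations = lift(STATE_DERIVATIONS, STATE_OF)
```

With the single state-level derivation r_1_s -> r_2_s, the lifted set
contains the one resource-level derivation r_1 -> r_2.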
>>>>>>>>>>
>>>>>>>>>> Thanks, khalid
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 01/06/2011 07:49, Luc Moreau wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Graham,
>>>>>>>>>>>
>>>>>>>>>>> Isn't it that you used the duri scheme to name the two resource
>>>>>>>>>>> states that exist in this scenario?
>>>>>>>>>>>
>>>>>>>>>>> In your view of the web, is there a notion of stateful resource?
>>>>>>>>>>> Does it apply here?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Luc
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 31/05/11 23:57, Graham Klyne wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Luc Moreau wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Graham,
>>>>>>>>>>>>>
>>>>>>>>>>>>> In my example, I really mean for the two versions of the chart
>>>>>>>>>>>>> to be available at the same URI. (So, definitely, an uncool URI!)
>>>>>>>>>>>>>
>>>>>>>>>>>>> In that case, there is a *single* resource, but it is stateful.
>>>>>>>>>>>>> Hence, there are two *resource states*, one generated using
>>>>>>>>>>>>> (stats2), and the other using (stats3).
>>>>>>>>>>>>>
>>>>>>>>>>>> Luc,
>>>>>>>>>>>>
>>>>>>>>>>>> I had interpreted your scenario as using a common URI as you
>>>>>>>>>>>> explain.
>>>>>>>>>>>>
>>>>>>>>>>>> But there are still several resources here, but they are not all
>>>>>>>>>>>> exposed on the web or assigned URIs.  I'm appealing here to
>>>>>>>>>>>> anything that *might* be identified as opposed to things that
>>>>>>>>>>>> actually are assigned URIs.   (For example, the proposed duri:
>>>>>>>>>>>> scheme might be used -
>>>>>>>>>>>> http://tools.ietf.org/id/draft-masinter-dated-uri-07.html)
>>>>>>>>>>>>
>>>>>>>>>>>> (And the URI is perfectly "cool" if it is specifically intended
>>>>>>>>>>>> to denote a dynamic resource.  A URI used to access the current
>>>>>>>>>>>> weather in London can be stable if properly managed.)
>>>>>>>>>>>>
>>>>>>>>>>>> (I think this is all entirely consistent with my earlier stated
>>>>>>>>>>>> positions.)
>>>>>>>>>>>>
>>>>>>>>>>>> #g
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Of course, if blogger had used cool uris, then, c2s2 and c2s3
>>>>>>>>>>>>> would be different resources.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Luc
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 05/31/2011 02:25 PM, Graham Klyne wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see (at least) two resources associated with (c2):  one
>>>>>>>>>>>>>> generated using (stats2), and the other using (stats3).  We
>>>>>>>>>>>>>> might call these (c2s2) and (c2s3).
>>>>>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Professor Luc Moreau
>>>>>> Electronics and Computer Science
>>>>>> University of Southampton
>>>>>> Southampton SO17 1BJ
>>>>>> United Kingdom
>>>>>> tel: +44 23 8059 4487
>>>>>> fax: +44 23 8059 2865
>>>>>> email: l.moreau@ecs.soton.ac.uk
>>>>>> http://www.ecs.soton.ac.uk/~lavm
>>>>
>>>> ______________________________________________________________________
>>>> This email has been scanned by the MessageLabs Email Security System.
>>>> For more information please visit http://www.messagelabs.com/email
>>>> ______________________________________________________________________
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>



-- 
Dr Simon Miles
Lecturer, Department of Informatics
King's College London, WC2R 2LS, UK
+44 (0)20 7848 1166
Received on Tuesday, 7 June 2011 19:46:35 UTC