Re: PROV-ISSUE-7 (define-derivation): Definition for Concept 'Derivation' [Provenance Terminology] from Paul Groth on 2011-06-06 (public-prov-wg@w3.org from June 2011)

From: Paul Groth <pgroth@gmail.com>
Date: Mon, 6 Jun 2011 12:21:47 +0200
To: Graham Klyne <GK@ninebynine.org>
Cc: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <BANLkTim+O0EeXk9q_9gptdcUxOe9i0v45A@mail.gmail.com>
Hi Graham,

From my understanding, OPM doesn't say anything about the perspective.
An account is a coloring of the graph with some operation on that
coloring.

It doesn't say who or what an account is from.

cheers,
Paul


On Mon, Jun 6, 2011 at 9:01 AM, Graham Klyne <GK@ninebynine.org> wrote:
> I'm wondering if the use of "account" here is exactly the same as the use of
> "account" in OPM.  I guess Luc would know best.
>
> Specifically, when we talk of or compare multiple accounts of some process
> of information production, do we require them to all be from the same
> perspective? I think that may be what OPM assumes.  Maybe it doesn't matter,
> but if there's scope for confusion I figure we should at least be aware of
> it.
>
> #g
> --
>
> Simon Miles wrote:
>>
>> I think "invariant" is good too.
>>
>> I was unclear, regarding the proposal to focus on "values/things that
>> are immutable according to some perspective or viewpoint", whether it
>> is the latter "values" for which we determine provenance or state
>> derivation relationships, or whether the "values" are properties of
>> the entities which have provenance, and there are other mutable (variant)
>> values?
>>
>> If only for my own understanding, I tried looking across the different
>> threads on this list. Here's my interpretation of what has been
>> implied in terms of definitions (but I might well be misinterpreting).
>>
>> An entity is something identifiable.
>> An account is a record of something that has occurred from a
>> particular perspective.
>> An invariant property of an entity is a property of that entity which
>> is invariant according to a particular perspective.
>> An abstraction of an entity is another entity with a subset of its
>> invariant properties, according to a particular perspective.
>> B derives from A if some of B's invariant properties are due to A's
>> invariant properties.
>>
>> An example trying to capture all the above:
>>
>> Entities:
>>  - E1: A government data set with UK government identifier GOVID-12345
>>  - E2: The data set with a data value for row 2012 being £7,500
>>  - E3: The corrected data set with the value for row 2012 being £9,000
>>  - E4: An Excel 2010 spreadsheet containing the corrected data set
>>
>> Accounts:
>>  - A1: An account from a perspective in which any government data set
>> will always retain the same UK government identifier (a new identifier
>> means a new data set)
>>  - A2: An account from a perspective in which any change of value in a
>> data set means it is a new version of that data set
>>  - A3: An account from a perspective in which any changes to a file by
>> writing create a new data set, while any changes due to reading do not
>>
>> Invariant properties:
>>  - P1: Identifier GOVID-12345 is invariant for E1, E2, E3, E4 with
>> respect to account A1
>>  - P2: All the data values (including £7,500 for 2012) are invariant
>> for E2 with respect to account A2
>>  - P3: All the data values (including £9,000 for 2012) are invariant
>> for E3, E4 with respect to account A2
>>  - P4: All bytes of the spreadsheet are invariant for E4 except those
>> changed on reading (e.g. Excel saves the current open worksheet,
>> cursor position etc. even without editing) with respect to account A3
>>  - P5: The data set (E1) having existed is invariant for E1, E2, E3,
>> E4 with respect to any account
>>  - P6: The first version of the data set (E2) having existed is
>> invariant for E2 with respect to any account
>>  - P7: The corrected version of the data set (E3) having existed is
>> invariant for E3, E4 with respect to any account
>>  - P8: The Excel data set (E4) having existed is invariant for E4 with
>> respect to any account
>>
>> Abstractions:
>>  - E1 abstracts E2, E3, E4
>>  - E3 abstracts E4
>>
>> Derivation:
>>  - E3 derives from E2 because, aside from the corrected value, all
>> other values are copied directly from it (P3 is partly due to P2)
>>  - E3 also derives from the correction made to the data set, changing
>> £7,500 to £9,000 (could be called E5, omitted above for brevity)
>>
>> We could then say that the provenance of an entity is/includes a
>> record of how that entity came to have its invariant properties.
>>
>> Provenance:
>>  - Provenance of E1 is how it came to be generated (P5) and came to
>> have its ID (P1)
>>  - Provenance of E2 is how it came to be generated (P5, P6), given its
>> ID (P1), and populated with the data it has (P2)
>>  - Provenance of E3 is how it came to be generated (P5, P7), given its
>> ID (P1), and populated with the data it has (P3)
>>  - Provenance of E4 is how it came to be generated (P5, P7, P8), given
>> its ID (P1), populated with the data it has (P3), and serialised to
>> its given bytes (P4)
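
One way to read the definitions above is as set containment: an abstraction
is a different entity whose invariant properties are a subset of another
entity's. The following is only a minimal sketch under that reading, not part
of the proposal itself. The property sets are transcribed from the P1..P8
list; the names INVARIANTS, abstracts and all_abstractions are hypothetical,
and the sketch deliberately ignores which account each property is invariant
with respect to.

```python
# Invariant properties per entity, transcribed from the P1..P8 list.
# Assumption: the per-account qualification ("with respect to A1", etc.)
# is flattened away, so every property is treated as plainly invariant.
INVARIANTS = {
    "E1": {"P1", "P5"},
    "E2": {"P1", "P2", "P5", "P6"},
    "E3": {"P1", "P3", "P5", "P7"},
    "E4": {"P1", "P3", "P4", "P5", "P7", "P8"},
}


def abstracts(a, b, invariants=INVARIANTS):
    """Return True if a is another entity whose invariants are a subset of b's."""
    return a != b and invariants[a] <= invariants[b]


def all_abstractions(invariants=INVARIANTS):
    """List every (abstraction, abstracted entity) pair, in sorted order."""
    return sorted(
        (a, b)
        for a in invariants
        for b in invariants
        if abstracts(a, b, invariants)
    )


if __name__ == "__main__":
    for a, b in all_abstractions():
        print(f"{a} abstracts {b}")
```

Under these assumptions, all_abstractions() yields exactly the pairs listed
under "Abstractions": E1 abstracts E2, E3 and E4, and E3 abstracts E4.
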
>>
>> It would be good to know if others are interpreting the consensus in
>> the same way!
>>
>> Thanks,
>> Simon
>>
>> On 3 June 2011 21:36, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>>
>>> I think I am also comfortable with using the term "invariant", if it
>>> helps gain consensus.
>>>
>>>
>>>
>>> Professor Luc Moreau
>>> Electronics and Computer Science
>>> University of Southampton
>>> Southampton SO17 1BJ
>>> United Kingdom
>>>
>>> On 3 Jun 2011, at 15:06, "Graham Klyne" <GK@ninebynine.org> wrote:
>>>
>>>> Luc,
>>>> Jim,
>>>> Khalid,
>>>>
>>>> I'm responding to all of you at once.
>>>>
>>>> Short answer: what Luc says.
>>>>
>>>> I find myself preferring the term "invariant" to "immutable" for just
>>>> this reason.
>>>>
>>>> ...
>>>>
>>>> Longer answer:  there's not a specific thing I want to capture through
>>>> derivation of mutable resources.  I'm just concerned that insisting on
>>>> immutability may prevent useful expression.
>>>>
>>>> I'll illustrate with an example from a completely different field.  For
>>>> some years, I have been involved peripherally in definition and registration
>>>> of URI schemes, and remain IANA's designated reviewer for new URI schemes.
>>>>  Several years ago, there was much discussion about registering new URI
>>>> schemes vs registering new URN namespaces [2] vs using http URIs for
>>>> everything.  A specific example is the info: URI scheme [3].  I argued at
>>>> the time that this could equally be served by a URN namespace.  But the
>>>> original definition of URN requirements [4] made some apparently strong
>>>> assertions about persistence and permanence of URNs which the community
>>>> behind info felt were too constraining, so we ended up with an arguably
>>>> unnecessary new URI scheme. Some further history at [5].
>>>>
>>>> Looking back, I now think the original language in [4] was
>>>> over-interpreted, and many people didn't fully recognize that permanence of
>>>> identity didn't constrain the identified thing itself possibly changing or
>>>> going away. There was an expectation of immutability, not even explicitly
>>>> stated, but also not dispelled.
>>>>
>>>> This is the kind of concern I have with insisting on immutability in
>>>> subjects of provenance at the outset.
>>>>
>>>> [1] http://www.ietf.org/rfc/rfc2141.txt
>>>> [2] http://www.ietf.org/rfc/rfc2611.txt,
>>>> http://tools.ietf.org/html/rfc3406
>>>> [3] http://www.ietf.org/rfc/rfc4452.txt
>>>> [4] http://tools.ietf.org/html/rfc1737
>>>> [5] http://www.w3.org/TR/uri-clarification/
>>>>
>>>> #g
>>>> --
>>>>
>>>> Luc Moreau wrote:
>>>>>
>>>>> Hi Jim, Graham, Klyne,
>>>>> Following yesterday's call, and seeing this thread, it seems that
>>>>>  "Immutable value" is too restrictive because too absolute.
>>>>> What about saying we focus on "/values/things that are immutable
>>>>> according to some perspective or viewpoint/"?
>>>>> It seems to offer the necessary trade-off and flexibility, with
>>>>> - a stable property required for provenance
>>>>> - change being allowed according to other viewpoints.
>>>>> Cheers,
>>>>> Luc
>>>>> On 06/03/2011 02:03 AM, Myers, Jim wrote:
>>>>>>
>>>>>> What do you want to capture with derivation of mutable resources?
>>>>>> Simply that one mutable resource can be used in a process and produce
>>>>>> another different mutable resource? If so, I'd ask why we should consider
>>>>>> this case any different than immutable? (Does the fact that most of what we
>>>>>> want to call immutable resources are undergoing constant change (bits
>>>>>> getting refresh charges, files moving about in memory caches, etc.) cause
>>>>>> any issue with the basic OPM-style model? I think all of these cases are
>>>>>> handled just fine by OPM-style constructs and I'd argue further that the key
>>>>>> concept about artifacts was not complete immutability with respect to any
>>>>>> process we can think of but immutability with respect to the processes
>>>>>> involved in the provenance (Eggs used in cake baking do not come out as
>>>>>> modified eggs (they become a new cake), but an egg in the fridge and the
>>>>>> warmer egg waiting to be mixed are considered the same egg only because we
>>>>>> don't want to discuss/report on the warming process that occurred. The
>>>>>> fact that an egg has mutability in its temperature doesn't make it a bad
>>>>>> artifact in OPM or cause trouble in reporting a baking process...)
>>>>>>
>>>>>> The mutable case that presents a question is should we provide a
>>>>>> second mechanism to allow one to describe a process that changes the state
>>>>>> of a mutable resource? - to say that egg with temperature cold is the same egg
>>>>>> with temperature warm after a heating process. I suspect that we can't avoid
>>>>>> this use case completely but we might not have to create a separate
>>>>>> mechanism: If we allow a resource egg to be associated with cold-egg and
>>>>>> warm-egg resources, we can use the OPM-like mechanism (cold-egg <-- heating
>>>>>> <-- warm-egg) while adding that cold-egg and warm-egg are 'aspectsof' the same
>>>>>> mutable egg which 'participates' in a heating process. I think this is
>>>>>> general and minimally disruptive. One could say that an egg participated in
>>>>>> heating without creating other resources, but one could not directly
>>>>>> describe the temperature of the egg before and after heating without
>>>>>> creating the cold and warm egg artifacts.   I think this also covers what we
>>>>>> want from agents and sources - we want to convey that they participate in a
>>>>>> process and, while their state changes as they do so, we don't want to
>>>>>> document their state changes. But as Simon says we may still want to treat
>>>>>> them (e.g. the Royal Society) as resources and talk about their creation so
>>>>>> it would be valuable if they could just be artifacts in the context of
>>>>>> creation/founding type events. Today, we have agents and sources as
>>>>>> different types than artifact so there is no way to talk about their
>>>>>> founding, etc.
>>>>>>
>>>>>> --  Jim
>>>>>>
>>>>>> ________________________________
>>>>>>
>>>>>> From: public-prov-wg-request@w3.org on behalf of Graham Klyne
>>>>>> Sent: Thu 6/2/2011 3:45 PM
>>>>>> To: Khalid Belhajjame
>>>>>> Cc: Luc Moreau; public-prov-wg@w3.org
>>>>>> Subject: Re: PROV-ISSUE-7 (define-derivation): Definition for Concept
>>>>>> 'Derivation' [Provenance Terminology]
>>>>>>
>>>>>>
>>>>>>
>>>>>> Khalid Belhajjame wrote:
>>>>>>
>>>>>>> Hi Graham,
>>>>>>>
>>>>>>>> I agree that many of the examples of derivation we have raised relate
>>>>>>>> to resource states.  But if, as has been suggested by myself and others,
>>>>>>>> resource states are themselves resources (especially when named for the
>>>>>>>> purposes of expressing a derivation), then such derivations can equally
>>>>>>>> be regarded as relating resources.  I think that's more a difference of
>>>>>>>> terminology than fundamental.
>>>>>>>
>>>>>>> Would it be fair then to say that in that view resources are
>>>>>>> immutable
>>>>>>> resources?
>>>>>>>
>>>>>> In the case of resources representing a snapshot of state, yes.
>>>>>>
>>>>>>
>>>>>>> Which brings me to the question, do we want to express derivations
>>>>>>> between mutable resources, or is that just something that we should
>>>>>>> avoid at this point?
>>>>>>>
>>>>>> (I'm finishing this email after today's telecon, so it's a bit of a
>>>>>> re-run.)
>>>>>>
>>>>>> I think that many of our use-cases are based on invariant values, and
>>>>>> the
>>>>>> near-term goal is to find expression for these.  So we definitely do
>>>>>> want to
>>>>>> express derivations between non-varying values.  But in so doing, it's
>>>>>> not clear
>>>>>> to me (yet) that we need to exclude mutable resources, so I say let's
>>>>>> keep our
>>>>>> options open and not close off any possibilities that we don't have
>>>>>> to.
>>>>>>
>>>>>> So my answer to avoiding mutable resources is: "yes and no".
>>>>>>
>>>>>> #g
>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thanks, khalid
>>>>>>>
>>>>>>>
>>>>>>>> Where I think I may diverge from what you say is that I would not
>>>>>>>> limit such expressions of derivation to resources that happen to be
>>>>>>>> a
>>>>>>>> state (or snapshot of state) of some resource.  I think defining
>>>>>>>> that
>>>>>>>> distinction in a hard-and-fast way, that also aligns with various
>>>>>>>> intuitions we may have about derivation, may prove difficult to
>>>>>>>> achieve (e.g. as I think is suggested by Jim Myers in
>>>>>>>> http://lists.w3.org/Archives/Public/public-prov-wg/2011Jun/0015.html
>>>>>>>> (*)).
>>>>>>>>
>>>>>>>> #g
>>>>>>>> --
>>>>>>>>
>>>>>>>> (*) I just love the W3C mailing list archives - so easy to find
>>>>>>>> links
>>>>>>>> to messages, and thus capture provenance!
>>>>>>>>
>>>>>>>> Khalid Belhajjame wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> From the discussion so far on derivation it seems that most people
>>>>>>>>> tend to define derivation between resource states or resource
>>>>>>>>> state
>>>>>>>>> representations, but not for resources.
>>>>>>>>>
>>>>>>>>> My take on this is that in a context where a resource is mutable,
>>>>>>>>> derivations will mainly be used to associate resource states and
>>>>>>>>> resource state representations.
>>>>>>>>>
>>>>>>>>> That said, based on derivations connecting resource states and
>>>>>>>>> resource state representations, one can infer new derivations
>>>>>>>>> between resources. For example, consider the resource r_1 and the
>>>>>>>>> associated resource state r_1_s, and consider that r_1_s was used
>>>>>>>>> to
>>>>>>>>> construct a new resource state r_2_s, actually the first state, of
>>>>>>>>> the resource r_2. We can state that r_2_s is derived from r_1_s,
>>>>>>>>> i.e.,
>>>>>>>>> r_1_s -> r_2_s. We can also state that the resource r_2 is derived
>>>>>>>>> from the resource r_1, i.e., r_1 -> r_2
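
The inference described above can be sketched as a small lifting step from
state-level derivations to resource-level ones. This is only an illustration
under stated assumptions: the function name lift_derivations and the state_of
mapping are hypothetical, pairs are written (source, derived) to match the
r_1 -> r_2 notation, and derivations between two states of the same resource
are skipped, which is a choice of this sketch rather than something the
message specifies.

```python
def lift_derivations(state_derivations, state_of):
    """Infer resource-level derivations from state-level derivations.

    state_derivations: iterable of (source_state, derived_state) pairs.
    state_of: dict mapping each resource state to the resource it belongs to.
    Returns a set of (source_resource, derived_resource) pairs.
    """
    lifted = set()
    for source_state, derived_state in state_derivations:
        source = state_of[source_state]
        derived = state_of[derived_state]
        # Assumption: a resource is not reported as derived from itself.
        if source != derived:
            lifted.add((source, derived))
    return lifted


if __name__ == "__main__":
    state_of = {"r_1_s": "r_1", "r_2_s": "r_2"}
    # r_1_s -> r_2_s: r_2_s was constructed using r_1_s.
    print(lift_derivations([("r_1_s", "r_2_s")], state_of))
```

With the single state derivation r_1_s -> r_2_s, the sketch returns the one
resource-level pair r_1 -> r_2, matching the example in the text.
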
>>>>>>>>>
>>>>>>>>> PS: I added a definition of derivation along these lines to the
>>>>>>>>> wiki:
>>>>>>>>> http://www.w3.org/2011/prov/wiki/ConceptDerivation
>>>>>>>>>
>>>>>>>>> Thanks, khalid
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 01/06/2011 07:49, Luc Moreau wrote:
>>>>>>>>>
>>>>>>>>>> Hi Graham,
>>>>>>>>>>
>>>>>>>>>> Isn't it that you used the duri scheme to name the two resource
>>>>>>>>>> states that exist in
>>>>>>>>>> this scenario?
>>>>>>>>>>
>>>>>>>>>> In your view of the web, is there a notion of stateful resource?
>>>>>>>>>> Does it apply here?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Luc
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 31/05/11 23:57, Graham Klyne wrote:
>>>>>>>>>>
>>>>>>>>>>> Luc Moreau wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Graham,
>>>>>>>>>>>>
>>>>>>>>>>>> In my example, I really mean for the two versions of the chart
>>>>>>>>>>>> to
>>>>>>>>>>>> be available at
>>>>>>>>>>>> the same URI. (So, definitely, an uncool URI!)
>>>>>>>>>>>>
>>>>>>>>>>>> In that case, there is a *single* resource, but it is stateful.
>>>>>>>>>>>> Hence, there
>>>>>>>>>>>> are two *resource states*, one generated using (stats2), and the
>>>>>>>>>>>> other using (stats3).
>>>>>>>>>>>>
>>>>>>>>>>> Luc,
>>>>>>>>>>>
>>>>>>>>>>> I had interpreted your scenario as using a common URI as you
>>>>>>>>>>> explain.
>>>>>>>>>>>
>>>>>>>>>>> But there are still several resources here, but they are not all
>>>>>>>>>>> exposed on the web or assigned URIs.  I'm appealing here to
>>>>>>>>>>> anything that *might* be identified as opposed to things that
>>>>>>>>>>> actually are assigned URIs.   (For example, the proposed duri:
>>>>>>>>>>> scheme might be used -
>>>>>>>>>>> http://tools.ietf.org/id/draft-masinter-dated-uri-07.html)
>>>>>>>>>>>
>>>>>>>>>>> (And the URI is perfectly "cool" if it is specifically intended
>>>>>>>>>>> to
>>>>>>>>>>> denote a dynamic resource.  A URI used to access the current
>>>>>>>>>>> weather in London can be stable if properly managed.)
>>>>>>>>>>>
>>>>>>>>>>> (I think this is all entirely consistent with my earlier stated
>>>>>>>>>>> positions.)
>>>>>>>>>>>
>>>>>>>>>>> #g
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Of course, if blogger had used cool uris, then, c2s2 and c2s3
>>>>>>>>>>>> would be different resources.
>>>>>>>>>>>>
>>>>>>>>>>>> Luc
>>>>>>>>>>>>
>>>>>>>>>>>> On 05/31/2011 02:25 PM, Graham Klyne wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I see (at least) two resources associated with (c2):  one
>>>>>>>>>>>>> generated using (stats2), and other using (stats3).  We might
>>>>>>>>>>>>> call these (c2s2) and (c2s3).
>>>>>>>>>>>>>
>>>>>>
>>>>>>
>>>>> --
>>>>> Professor Luc Moreau
>>>>> Electronics and Computer Science   tel:   +44 23 8059 4487
>>>>> University of Southampton          fax:   +44 23 8059 2865
>>>>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>>>>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>>
>>>
>>
>>
>>
>
>
>
Received on Monday, 6 June 2011 10:22:17 UTC