
Re: PROV-ISSUE-7 (define-derivation): Definition for Concept 'Derivation' [Provenance Terminology]

From: Graham Klyne <GK@ninebynine.org>
Date: Wed, 08 Jun 2011 08:19:21 +0100
Message-ID: <4DEF2279.5090507@ninebynine.org>
To: Simon Miles <simon.miles@kcl.ac.uk>
CC: Provenance Working Group WG <public-prov-wg@w3.org>

Hi Simon,

Simon Miles wrote:
> Hello Graham,
> 
> As Paul says, perspective is not explicitly mentioned in OPM, but it
> might be implied. The definition in the OPM spec is: "An account
> represents a description at some level of detail as provided by one or
> more observers" and, earlier in the document, accounts are said to be
> "offering different levels of explanation for [a process] execution"
> and that "overlapping accounts are intended to allow various
> descriptions of a same execution". I would intuitively interpret the
> distinction between accounts to be about perspective, particularly by
> being from different "observers" or at different "levels".

Yes, this is what I was thinking.  It does not explicitly allow for accounts of 
different "perspectives" (*) - the implication being different levels of 
granularity on the same perspective (i.e. the same underlying information, or 
the same "invariant").  All the examples I recall were likewise directed.

(*) by my understanding of "perspective", which is a consideration of what 
information is considered relevant for some particular purpose.

> With regards to comparing accounts of the same process, I would assume
> they are from different perspectives, else why have multiple accounts?
> I don't think there's any reason to require perspectives to be
> incomparable. One OPM graph can express both a coarse-grained account
> and fine-grained account of the same process, which means you could
> express a query (graph traversal) using details from both accounts.

Suppose we're talking about the provenance of a document - when it was authored, 
by whom, etc.

If the author edits the document, from an authorship perspective the provenance 
does not change, but from a temporal perspective it does.  This kind of 
difference is not covered simply by different levels of detail.

This is why I say that I think that OPM implicitly assumes a common perspective.
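
To make the distinction concrete, here is a minimal sketch in Python. It is not part of OPM: the Revision record, the two view functions, and the sample author and dates are all hypothetical names and values chosen for illustration. A perspective is modelled as a function that selects the information considered relevant for some purpose, and two revisions count as having the same provenance under a perspective when that function selects the same information from both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """One state of a document (hypothetical record, for illustration only)."""
    author: str
    modified: str  # ISO date of the last edit
    text: str


def authorship_view(rev: Revision) -> tuple:
    # Authorship perspective: only who wrote the document is relevant.
    return (rev.author,)


def temporal_view(rev: Revision) -> tuple:
    # Temporal perspective: only when the document last changed is relevant.
    return (rev.modified,)


def same_provenance(a: Revision, b: Revision, view) -> bool:
    # Two revisions are indistinguishable under a perspective when the
    # perspective selects the same information from both of them.
    return view(a) == view(b)


before = Revision("alice", "2011-06-01", "first draft")
after = Revision("alice", "2011-06-07", "edited draft")

print(same_provenance(before, after, authorship_view))  # True
print(same_provenance(before, after, temporal_view))    # False
```

The two views select disjoint information, so neither is a coarser or finer version of the other, which is the sense in which this differs from accounts at different levels of detail.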

This is mainly to test my understanding of the OPM concept - I'm not sure that 
this is especially important for the current work items (though it may become so later).

#g
--
> On 6 June 2011 11:24, Paul Groth <pgroth@gmail.com> wrote:
>> Hi Graham,
>>
>> From my understanding OPM doesn't say anything about perspective.
>> An account is a coloring of the graph with some operation on that
>> coloring.
>>
>> It doesn't say who or what an account is from.
>>
>> cheers,
>> Paul
>>
>>
>> On Mon, Jun 6, 2011 at 9:01 AM, Graham Klyne <GK@ninebynine.org> wrote:
>>> I'm wondering if the use of "account" here is exactly the same as the use of
>>> "account" in OPM.  I guess Luc would know best.
>>>
>>> Specifically, when we talk of or compare multiple accounts of some process
>>> of information production, do we require them to all be from the same
>>> perspective? I think that may be what OPM assumes.  Maybe it doesn't matter,
>>> but if there's scope for confusion I figure we should at least be aware of
>>> it.
>>>
>>> #g
>>> --
>>>
>>> Simon Miles wrote:
>>>> I think "invariant" is good too.
>>>>
>>>> I was unclear, regarding the proposal to focus on "values/things that
>>>> are immutable according to some perspective or viewpoint", whether it
>>>> is the latter "values" for which we determine provenance or state
>>>> derivation relationships, or whether the "values" are properties of
>>>> the entities which have provenance, and there are other mutable (variant)
>>>> values?
>>>>
>>>> If only for my own understanding, I tried looking across the different
>>>> threads on this list. Here's my interpretation of what has been
>>>> implied in terms of definitions (but I might well be misinterpreting).
>>>>
>>>> An entity is something identifiable.
>>>> An account is a record of something that has occurred from a
>>>> particular perspective.
>>>> An invariant property of an entity is a property of that entity which
>>>> is invariant according to a particular perspective.
>>>> An abstraction of an entity is another entity with a subset of its
>>>> invariant properties, according to a particular perspective.
>>>> B derives from A if some of B's invariant properties are due to A's
>>>> invariant properties.
>>>>
>>>> An example trying to capture all the above:
>>>>
>>>> Entities:
>>>>  - E1: A government data set with UK government identifier GOVID-12345
>>>>  - E2: The data set with a data value for row 2012 being 7,500
>>>>  - E3: The corrected data set with the value for row 2012 being 9,000
>>>>  - E4: An Excel 2010 spreadsheet containing the corrected data set
>>>>
>>>> Accounts:
>>>>  - A1: An account from a perspective in which any government data set
>>>> will always retain the same UK government identifier (a new identifier
>>>> means a new data set)
>>>>  - A2: An account from a perspective in which any change of value in a
>>>> data set means it is a new version of that data set
>>>>  - A3: An account from a perspective in which any changes to a file by
>>>> writing create a new data set, while any changes due to reading do not
>>>>
>>>> Invariant properties:
>>>>  - P1: Identifier GOVID-12345 is invariant for E1, E2, E3, E4 with
>>>> respect to account A1
>>>>  - P2: All the data values (including 7,500 for 2012) are invariant
>>>> for E2 with respect to account A2
>>>>  - P3: All the data values (including 9,000 for 2012) are invariant
>>>> for E3, E4 with respect to account A2
>>>>  - P4: All bytes of the spreadsheet are invariant for E4 except those
>>>> changed on reading (e.g. Excel saves the current open worksheet,
>>>> cursor position etc. even without editing) with respect to account A3
>>>>  - P5: The data set (E1) having existed is invariant for E1, E2, E3,
>>>> E4 with respect to any account
>>>>  - P6: The first version of the data set (E2) having existed is
>>>> invariant for E2 with respect to any account
>>>>  - P7: The corrected version of the data set (E3) having existed is
>>>> invariant for E3, E4 with respect to any account
>>>>  - P8: The Excel data set (E4) having existed is invariant for E4 with
>>>> respect to any account
>>>>
>>>> Abstractions:
>>>>  - E1 abstracts E2, E3, E4
>>>>  - E3 abstracts E4
>>>>
>>>> Derivation:
>>>>  - E3 derives from E2 because, aside from the corrected value, all
>>>> other values are copied directly from it (P3 is partly due to P2)
>>>>  - E3 also derives from the correction made to the data set, changing
>>>> 7,500 to 9,000 (could be called E5, omitted above for brevity)
>>>>
>>>> We could then say that the provenance of an entity is/includes a
>>>> record of how that entity came to have its invariant properties.
>>>>
>>>> Provenance:
>>>>  - Provenance of E1 is how it came to be generated (P5) and came to
>>>> have its ID (P1)
>>>>  - Provenance of E2 is how it came to be generated (P5, P6), given its
>>>> ID (P1), and populated with the data it has (P2)
>>>>  - Provenance of E3 is how it came to be generated (P5, P7), given its
>>>> ID (P1), and populated with the data it has (P3)
>>>>  - Provenance of E4 is how it came to be generated (P5, P7, P8), given
>>>> its ID (P1), populated with the data it has (P3), and serialised to
>>>> its given bytes (P4)
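
The example above can be written down as data, which makes the listed abstractions checkable. This is a minimal sketch in Python, not an agreed model: the INVARIANTS table simply transcribes P1 to P8, the function names are hypothetical, and reading "a subset" as a strict subset (so that no entity abstracts itself) is an assumption.

```python
# Invariant properties of each entity, mapped to the account under which they
# are invariant. "any" stands for "with respect to any account".
INVARIANTS = {
    "E1": {"P1": "A1", "P5": "any"},
    "E2": {"P1": "A1", "P2": "A2", "P5": "any", "P6": "any"},
    "E3": {"P1": "A1", "P3": "A2", "P5": "any", "P7": "any"},
    "E4": {"P1": "A1", "P3": "A2", "P4": "A3",
           "P5": "any", "P7": "any", "P8": "any"},
}


def abstracts(a: str, b: str) -> bool:
    # Assumption: "a subset" is read as a strict subset, so that no entity
    # abstracts itself.
    return set(INVARIANTS[a]) < set(INVARIANTS[b])


def invariant_under(entity: str, account: str) -> list:
    # Properties that are invariant for the entity from the given perspective,
    # including those that hold with respect to any account.
    return sorted(p for p, acc in INVARIANTS[entity].items()
                  if acc in (account, "any"))


abstractions = [(a, b) for a in INVARIANTS for b in INVARIANTS
                if abstracts(a, b)]
print(abstractions)
# [('E1', 'E2'), ('E1', 'E3'), ('E1', 'E4'), ('E3', 'E4')]
print(invariant_under("E4", "A2"))
# ['P3', 'P5', 'P7', 'P8']
```

Under these assumptions the computed pairs are exactly the abstractions listed in the example (E1 abstracts E2, E3, E4; E3 abstracts E4).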
>>>>
>>>> It would be good to know if others are interpreting the consensus in
>>>> the same way!
>>>>
>>>> Thanks,
>>>> Simon
>>>>
>>>> On 3 June 2011 21:36, Luc Moreau <L.Moreau@ecs.soton.ac.uk> wrote:
>>>>> I think I am also comfortable with using the term "invariant", if it
>>>>> helps gain consensus.
>>>>>
>>>>>
>>>>>
>>>>> Professor Luc Moreau
>>>>> Electronics and Computer Science
>>>>> University of Southampton
>>>>> Southampton SO17 1BJ
>>>>> United Kingdom
>>>>>
>>>>> On 3 Jun 2011, at 15:06, "Graham Klyne" <GK@ninebynine.org> wrote:
>>>>>
>>>>>> Luc,
>>>>>> Jim,
>>>>>> Khalid,
>>>>>>
>>>>>> I'm responding to all of you at once.
>>>>>>
>>>>>> Short answer: what Luc says.
>>>>>>
>>>>>> I find myself preferring the term "invariant" to "immutable" for just
>>>>>> this reason.
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> Longer answer:  there's not a specific thing I want to capture through
>>>>>> derivation of mutable resources.  I'm just concerned that insisting on
>>>>>> immutability may prevent useful expression.
>>>>>>
>>>>>> I'll illustrate with an example from a completely different field.  For
>>>>>> some years, I have been involved peripherally in definition and registration
>>>>>> of URI schemes, and remain IANA's designated reviewer for new URI schemes.
>>>>>> Several years ago, there was much discussion about registering new URI
>>>>>> schemes vs registering new URN namespaces [2] vs using http URIs for
>>>>>> everything.  A specific example is the info: URI scheme [3].  I argued at
>>>>>> the time that this could equally be served by a URN namespace.  But the
>>>>>> original definition of URN requirements [4] made some apparently strong
>>>>>> assertions about persistence and permanence of URNs which the community
>>>>>> behind info felt were too constraining, so we ended up with an arguably
>>>>>> unnecessary new URI scheme. Some further history at [5].
>>>>>>
>>>>>> Looking back, I now think the original language in [4] was
>>>>>> over-interpreted, and many people didn't fully recognize that permanence of
>>>>>> identity didn't constrain the identified thing itself possibly changing or
>>>>>> going away. There was an expectation of immutability, not even explicitly
>>>>>> stated, but also not dispelled.
>>>>>>
>>>>>> This is the kind of concern I have with insisting on immutability in
>>>>>> subjects of provenance at the outset.
>>>>>>
>>>>>> [1] http://www.ietf.org/rfc/rfc2141.txt
>>>>>> [2] http://www.ietf.org/rfc/rfc2611.txt,
>>>>>> http://tools.ietf.org/html/rfc3406
>>>>>> [3] http://www.ietf.org/rfc/rfc4452.txt
>>>>>> [4] http://tools.ietf.org/html/rfc1737
>>>>>> [5] http://www.w3.org/TR/uri-clarification/
>>>>>>
>>>>>> #g
>>>>>> --
>>>>>>
>>>>>> Luc Moreau wrote:
>>>>>>> Hi Jim, Graham, Klyne,
>>>>>>> Following yesterday's call, and seeing this thread, it seems that
>>>>>>>  "Immutable value" is too restrictive because too absolute.
>>>>>>> What about saying we focus on "/values/things that are immutable
>>>>>>> according to some perspective or viewpoint/"?
>>>>>>> It seems to offer the necessary trade-off and flexibility, with
>>>>>>> - a stable property required for provenance
>>>>>>> - change being allowed according to other viewpoints.
>>>>>>> Cheers,
>>>>>>> Luc
>>>>>>> On 06/03/2011 02:03 AM, Myers, Jim wrote:
>>>>>>>> What do you want to capture with derivation of mutable resources?
>>>>>>>> Simply that one mutable resource can be used in a process and produce
>>>>>>>> another different mutable resource? If so, I'd ask why we should consider
>>>>>>>> this case any different than immutable? (Does the fact that most of what we
>>>>>>>> want to call immutable resources are undergoing constant change (bits
>>>>>>>> getting refresh charges, files moving about in memory caches, etc.) cause
>>>>>>>> any issue with the basic OPM-style model?) I think all of these cases are
>>>>>>>> handled just fine by OPM-style constructs and I'd argue further that the key
>>>>>>>> concept about artifacts was not complete immutability with respect to any
>>>>>>>> process we can think of but immutability with respect to the processes
>>>>>>>> involved in the provenance. (Eggs used in cake baking do not come out as
>>>>>>>> modified eggs (they become a new cake), but an egg in the fridge and the
>>>>>>>> warmer egg waiting to be mixed are considered the same egg only because we
>>>>>>>> don't want to discuss/report on the warming process that occurred. The fact
>>>>>>>> that an egg has mutability in its temperature doesn't make it a bad artifact
>>>>>>>> in OPM or cause trouble in reporting a baking process...)
>>>>>>>> The mutable case that presents a question is: should we provide a
>>>>>>>> second mechanism to allow one to describe a process that changes the state
>>>>>>>> of a mutable resource? That is, to say that the egg with temperature cold is
>>>>>>>> the same egg as the one with temperature warm after a heating process. I
>>>>>>>> suspect that we can't avoid this use case completely but we might not have
>>>>>>>> to create a separate mechanism: If we allow a resource egg to be associated
>>>>>>>> with cold-egg and warm-egg resources, we can use the OPM-like mechanism
>>>>>>>> (cold-egg <-- heating <-- warm-egg) while adding that cold-egg and warm-egg
>>>>>>>> are 'aspects of' the same mutable egg, which 'participates' in a heating
>>>>>>>> process. I think this is general and minimally disruptive. One could say
>>>>>>>> that an egg participated in heating without creating other resources, but
>>>>>>>> one could not directly describe the temperature of the egg before and after
>>>>>>>> heating without creating the cold and warm egg artifacts.   I think this
>>>>>>>> also covers what we want from agents and sources - we want to convey that
>>>>>>>> they participate in a process and, while their state changes as they do so,
>>>>>>>> we don't want to document their state changes. But as Simon says we may
>>>>>>>> still want to treat them (e.g. the Royal Society) as resources and talk
>>>>>>>> about their creation so it would be valuable if they could just be
>>>>>>>> artifacts in the context of creation/founding type events. Today, we have
>>>>>>>> agents and sources as different types than artifact so there is no way to
>>>>>>>> talk about their founding, etc.
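
The egg proposal can be sketched as a small edge list. This is only an illustration in Python: "used" and "wasGeneratedBy" are OPM edge names, while "aspectOf" and "participatedIn" are hypothetical labels for the two relations suggested above, and state_changes is a made-up helper.

```python
# Edges point from effect to cause, as in OPM-style graphs.
EDGES = [
    ("warm-egg", "wasGeneratedBy", "heating"),
    ("heating", "used", "cold-egg"),
    ("cold-egg", "aspectOf", "egg"),         # hypothetical relation
    ("warm-egg", "aspectOf", "egg"),         # hypothetical relation
    ("egg", "participatedIn", "heating"),    # hypothetical relation
]


def state_changes(resource: str) -> list:
    """Return (before, process, after) triples between aspects of a resource."""
    aspects = {s for s, p, o in EDGES if p == "aspectOf" and o == resource}
    changes = []
    for after, p1, process in EDGES:
        if p1 != "wasGeneratedBy" or after not in aspects:
            continue
        # Look for an aspect of the same resource that the process used.
        for proc, p2, before in EDGES:
            if proc == process and p2 == "used" and before in aspects:
                changes.append((before, process, after))
    return changes


print(state_changes("egg"))  # [('cold-egg', 'heating', 'warm-egg')]
```

Without the two aspect resources the graph would still record that the egg participated in heating, but there would be nothing to hang the before and after temperatures on.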
>>>>>>>> --  Jim
>>>>>>>>
>>>>>>>> ________________________________
>>>>>>>>
>>>>>>>> From: public-prov-wg-request@w3.org
>>>>>>>> <mailto:public-prov-wg-request@w3.org> on behalf of Graham Klyne
>>>>>>>> Sent: Thu 6/2/2011 3:45 PM
>>>>>>>> To: Khalid Belhajjame
>>>>>>>> Cc: Luc Moreau; public-prov-wg@w3.org <mailto:public-prov-wg@w3.org>
>>>>>>>> Subject: Re: PROV-ISSUE-7 (define-derivation): Definition for Concept
>>>>>>>> 'Derivation' [Provenance Terminology]
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Khalid Belhajjame wrote:
>>>>>>>>
>>>>>>>>> Hi Graham,
>>>>>>>>>
>>>>>>>>>> I agree that many of the examples of derivation we have raised
>>>>>>>>>> relate to resource states.  But if, as has been suggested by myself
>>>>>>>>>> and others, resource states are themselves resources (especially
>>>>>>>>>> when named for the purposes of expressing a derivation), then such
>>>>>>>>>> derivations can equally be regarded as relating resources.  I think
>>>>>>>>>> that's more a difference of terminology than fundamental.
>>>>>>>>>
>>>>>>>>> Would it be fair then to say that in that view resources are
>>>>>>>>> immutable resources?
>>>>>>>>>
>>>>>>>> In the case of resources representing a snapshot of state, yes.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Which brings me to the question: do we want to express derivations
>>>>>>>>> between mutable resources, or is that just something that we should
>>>>>>>>> avoid at this point?
>>>>>>>>>
>>>>>>>> (I'm finishing this email after today's telecon, so it's a bit of a
>>>>>>>> re-run.)
>>>>>>>>
>>>>>>>> I think that many of our use-cases are based on invariant values, and the
>>>>>>>> near-term goal is to find expression for these.  So we definitely do want to
>>>>>>>> express derivations between non-varying values.  But in so doing, it's not
>>>>>>>> clear to me (yet) that we need to exclude mutable resources, so I say let's
>>>>>>>> keep our options open and not close off any possibilities that we don't
>>>>>>>> have to.
>>>>>>>>
>>>>>>>> So my answer to avoiding mutable resources is: "yes and no".
>>>>>>>>
>>>>>>>> #g
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks, khalid
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Where I think I may diverge from what you say is that I would not
>>>>>>>>>> limit such expressions of derivation to resources that happen to be a
>>>>>>>>>> state (or snapshot of state) of some resource.  I think defining that
>>>>>>>>>> distinction in a hard-and-fast way, that also aligns with various
>>>>>>>>>> intuitions we may have about derivation, may prove difficult to
>>>>>>>>>> achieve (e.g. as I think is suggested by Jim Myers in
>>>>>>>>>> http://lists.w3.org/Archives/Public/public-prov-wg/2011Jun/0015.html
>>>>>>>>>> (*)).
>>>>>>>>>>
>>>>>>>>>> #g
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> (*) I just love the W3C mailing list archives - so easy to find
>>>>>>>>>> links to messages, and thus capture provenance!
>>>>>>>>>>
>>>>>>>>>> Khalid Belhajjame wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> From the discussion so far on derivation it seems that most people
>>>>>>>>>>> tend to define derivation between resource states or resource state
>>>>>>>>>>> representations, but not for resources.
>>>>>>>>>>>
>>>>>>>>>>> My take on this is that in a context where a resource is mutable,
>>>>>>>>>>> derivations will mainly be used to associate resource states and
>>>>>>>>>>> resource state representations.
>>>>>>>>>>>
>>>>>>>>>>> That said, based on derivations connecting resource states and
>>>>>>>>>>> resource state representations, one can infer new derivations
>>>>>>>>>>> between resources. For example, consider the resource r_1 and the
>>>>>>>>>>> associated resource state r_1_s, and consider that r_1_s was used to
>>>>>>>>>>> construct a new resource state r_2_s, actually the first state, of
>>>>>>>>>>> the resource r_2. We can state that r_2_s is derived from r_1_s, i.e.,
>>>>>>>>>>> r_1_s -> r_2_s. We can also state that the resource r_2 is derived
>>>>>>>>>>> from the resource r_1, i.e., r_1 -> r_2.
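
The inference described above, lifting a derivation between resource states to a derivation between their resources, can be sketched as follows. This is a minimal illustration in Python under stated assumptions: STATE_OF and resource_derivations are hypothetical names, pairs are written (source, derived) to match the "r_1_s -> r_2_s" notation, and skipping derivations between two states of the same resource is a choice made for this sketch, not something the text specifies.

```python
# Which resource each resource state belongs to.
STATE_OF = {"r_1_s": "r_1", "r_2_s": "r_2"}

# Asserted derivations between resource states, written (source, derived).
STATE_DERIVATIONS = [("r_1_s", "r_2_s")]


def resource_derivations(state_derivs, state_of) -> set:
    """Infer resource-level derivations from state-level derivations."""
    lifted = set()
    for src_state, derived_state in state_derivs:
        src, derived = state_of[src_state], state_of[derived_state]
        # Assumption: successive states of one resource do not make the
        # resource derive from itself.
        if src != derived:
            lifted.add((src, derived))
    return lifted


print(resource_derivations(STATE_DERIVATIONS, STATE_OF))  # {('r_1', 'r_2')}
```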
>>>>>>>>>>>
>>>>>>>>>>> PS: I added a definition of derivation along these lines to the
>>>>>>>>>>> wiki:
>>>>>>>>>>> http://www.w3.org/2011/prov/wiki/ConceptDerivation
>>>>>>>>>>>
>>>>>>>>>>> Thanks, khalid
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 01/06/2011 07:49, Luc Moreau wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Graham,
>>>>>>>>>>>>
>>>>>>>>>>>> Isn't it that you used the duri scheme to name the two resource
>>>>>>>>>>>> states that exist in
>>>>>>>>>>>> this scenario?
>>>>>>>>>>>>
>>>>>>>>>>>> In your view of the web, is there a notion of stateful resource?
>>>>>>>>>>>> Does it apply here?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Luc
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 31/05/11 23:57, Graham Klyne wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Luc Moreau wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Graham,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In my example, I really mean for the two versions of the chart
>>>>>>>>>>>>>> to be available at the same URI. (So, definitely, an uncool URI!)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In that case, there is a *single* resource, but it is stateful.
>>>>>>>>>>>>>> Hence, there are two *resource states*, one generated using
>>>>>>>>>>>>>> (stats2), and the other using (stats3).
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Luc,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I had interpreted your scenario as using a common URI as you
>>>>>>>>>>>>> explain.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But there are still several resources here, but they are not all
>>>>>>>>>>>>> exposed on the web or assigned URIs.  I'm appealing here to
>>>>>>>>>>>>> anything that *might* be identified as opposed to things that
>>>>>>>>>>>>> actually are assigned URIs.   (For example, the proposed duri:
>>>>>>>>>>>>> scheme might be used -
>>>>>>>>>>>>> http://tools.ietf.org/id/draft-masinter-dated-uri-07.html)
>>>>>>>>>>>>>
>>>>>>>>>>>>> (And the URI is perfectly "cool" if it is specifically intended
>>>>>>>>>>>>> to denote a dynamic resource.  A URI used to access the current
>>>>>>>>>>>>> weather in London can be stable if properly managed.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> (I think this is all entirely consistent with my earlier stated
>>>>>>>>>>>>> positions.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> #g
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Of course, if blogger had used cool uris, then, c2s2 and c2s3
>>>>>>>>>>>>>> would be different resources.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Luc
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 05/31/2011 02:25 PM, Graham Klyne wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see (at least) two resources associated with (c2):  one
>>>>>>>>>>>>>>> generated using (stats2), and other using (stats3).  We might
>>>>>>>>>>>>>>> call these (c2s2) and (c2s3).
>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>>> Professor Luc Moreau
>>>>>>> Electronics and Computer Science   tel:   +44 23 8059 4487
>>>>>>> University of Southampton          fax:   +44 23 8059 2865
>>>>>>> Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>>>>>>> United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 
> 
Received on Wednesday, 8 June 2011 11:12:54 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Thursday, 26 April 2012 13:06:31 GMT