Re: PROV-DM derivation concerns arising from my primer review from Simon Miles on 2011-11-25 (public-prov-wg@w3.org from November 2011)

From: Simon Miles <simon.miles@kcl.ac.uk>
Date: Fri, 25 Nov 2011 11:01:18 +0000
To: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <CAKc1nHeDkbmfju=-5L=pYCeccPFfnBUmkv38u-yuukWBBCjq+g@mail.gmail.com>

Hi Graham,

> directlyDerivedFrom is assertable based on the observation of a process that
> consumes one and generates the other.  (We don't know if the one materially
> affects the value of the other, but that's not what I thought direct derivation
> would be saying.)

I'm pretty sure the intent was to say more than this. Why would we
want to assert this? As I believe Luc has argued in the past, if an
activity generates B and later uses A, it is not informative to say
that B derives from A, but this is just an extreme example where lack
of effect is most evident.
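
To make the extreme case concrete, here is a minimal sketch. It assumes
hypothetical record names (Event, naive_derivations,
time_aware_derivations) and integer logical timestamps, none of which
come from PROV-DM. It contrasts the weak reading ("same activity
consumed one and generated the other") with a check that usage at
least preceded generation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A usage or generation event (hypothetical structure, not PROV-DM syntax)."""
    activity: str
    entity: str
    time: int  # assumed logical timestamp


def naive_derivations(generated, used):
    """Weak reading: pair each generated entity with every entity used by the same activity."""
    return {(g.entity, u.entity)
            for g in generated for u in used
            if g.activity == u.activity}


def time_aware_derivations(generated, used):
    """Keep only pairs where the usage happened before the generation."""
    return {(g.entity, u.entity)
            for g in generated for u in used
            if g.activity == u.activity and u.time < g.time}


# Activity a1 generates B at time 1 and only later uses A at time 2.
generated = [Event("a1", "B", 1)]
used = [Event("a1", "A", 2)]

print(naive_derivations(generated, used))       # {('B', 'A')}
print(time_aware_derivations(generated, used))  # set()
```

Note that the ordering check is only a necessary condition: usage
before generation still does not show that A affected B, which is why
the stronger relation has to be asserted rather than inferred.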

> I think it's the nature of the beast that there is little that can be inferred
> from a very generic framework.

I completely agree, but I'm not arguing for inference, just assertion
of derivation.

> So while I agree with your comment about the formal nature of a weak derivation
> relationship, I see its value is in the intent that it signals (informally), and
> also that it's a base from which more meaningful derivations can be, er, derived
> - through specialization.  And each meaningful derivation property (e.g.
> quotation) may have different formal properties.

I understand, but I'm not sure why we would have any general
derivation relation at all in that case. It seems to say so little
(unlike, say, wasGeneratedBy) that saying containsQuoteFrom
specialises wasDerivedFrom seems to say nothing at all. The weak
definition means the two entities related by wasDerivedFrom may have
no more connection than two arbitrary entities, surely?

> <aside>
> A similar argument could be made for first order logic.
>
> In isolation, it expresses nothing - just mathematical structures, albeit more
> complex than transitive closures, but fundamentally no more meaningful.
>
> Its value comes from being a framework in which terms can be associated with
> concepts (or things, or values) by extra-logical means, and hence to show how
> such things must be related if they conform to certain logically expressed patterns.
> </aside>

Well, yes, but those patterns place constraints, which wasDerivedFrom
does not seem to. Moreover, PROV-DM in all other cases goes beyond
expressing nothing, e.g. an activity and an entity have careful
definitions in terms of what is in the world, as do wasGeneratedBy and
used.

> I'm not seeing how dependedOn is really adding any expressive power.

I sympathise with that.

However, I still don't see why we would stop people asserting an
actual, affecting relation between entities if they see that to exist
in the world, just as they assert the existence of entities,
activities, generation events etc. based on what exists. In my mind,
derivation seems core to what is meant by provenance: e.g. to
understand why a bottle of wine is as it is, I would like to know
from what grapes it was made, on what land they were grown, what else
was added, etc.

Thanks,
Simon

>> On 19 November 2011 09:58, Graham Klyne <graham.klyne@zoo.ox.ac.uk> wrote:
>>> On 17/11/2011 11:55, Luc Moreau wrote:
>>>> Hi Graham,
>>>>
>>>> The derivation section is indeed complex and needs simplification.
>>>>
>>>> I recently made this proposal
>>>> http://lists.w3.org/Archives/Public/public-prov-wg/2011Nov/0263.html
>>>>
>>>> It differs from yours as follows.
>>>>
>>>> Two derivations:
>>>> - wasDerivedFrom: activity linked
>>>> - wasEventuallyDerivedFrom (replaced by an adequate name)
>>>>
>>>> Simon has made the case that wasEventuallyDerivedFrom is not transitive. I think
>>>> it's reasonable.
>>>
>>> I assume you refer to
>>> http://lists.w3.org/Archives/Public/public-prov-wg/2011Nov/0196.html?
>>>
>>> I suppose it depends on what the intended meaning of "wasEventuallyDerivedFrom" is.
>>>
>>> I think a distinction here is between "necessarilyDerivedFrom" and
>>> "possiblyDerivedFrom" (modal logic, anyone? :) ).  (I'm introducing these terms
>>> for discussion only, I'm not proposing them for use.)
>>>
>>> For me, "e2 possiblyDerivedFrom e1" would be a statement that e2 has e1 somewhere
>>> in its derivation history; it is easy to understand, and I think this is something
>>> that we would reasonably expect provenance to express.  This would be transitive.
>>>
>>> OTOH, "e2 necessarilyDerivedFrom e1" is telling us that the value of e2 is in
>>> some sense materially affected by e1, which I think is taking us into the
>>> territory of what we actually mean by materially affected by - it's a variation
>>> of the problem that concerned me in the first place.  I see this as the case
>>> that Simon shows is not transitive.
>>>
>>>> So, what's the difference? wasDerivedFrom is associated with one and only one
>>>> activity.
>>>> wasEventuallyDerivedFrom is unspecific about activities behind this derivation
>>>> (but I believe there are some activities; we just don't know them, or their number).
>>>> So, wasDerivedFrom would be a special case of wasEventuallyDerivedFrom.
>>>
>>> I'm fine with this, as far as it goes.
>>>
>>>> Several of us have indicated it is useful to have a transitive version. Stian
>>>> has a good idea that the transitive version could also include control and
>>>> wasComplementOf (a bit like participation was defined, but transitive).
>>>>
>>>> This is a much weaker relation, which states that one entity was in the
>>>> provenance of another, essentially. It's not a derivation.
>>>> I would define this in the "Common Relation" section. Not sure how we name
>>>> this, though.
>>>
>>> I agree with the above - this corresponds to my "possiblyDerivedFrom".  See
>>> below for naming.
>>>
>>> Your above proposal seems to have "eventuallyDerivedFrom" meaning something more
>>> like my "necessarilyDerivedFrom".
>>>
>>> I would choose the weaker form and the direct form as the "built-in" relations
>>> as they are easier to define in a generic fashion; e.g. directlyDerivedFrom
>>> (strong form) and possiblyDerivedFrom (weak form).  Again I'm choosing these
>>> names to emphasize my discussion point, not proposing them here.
>>>
>>> The weaker form can be specialized by applications where there is a need for a
>>> stronger notion of derivation;  I don't think we're currently in a position to
>>> say what such a stronger form might be right now.  I don't think there is a
>>> single such form that is always applicable.
>>>
>>> ...
>>>
>>> Which brings us to naming.
>>>
>>> I don't want to get too hung up on this, but if the logic above is accepted, I
>>> think it becomes natural to use "derivedFrom" to cover the general (weakest)
>>> case, since that would be the generalization of all other forms of derivation.
>>> For example, it is quite intuitive that "directlyDerivedFrom" (currently just
>>> "derivedFrom") is a specialization of "derivedFrom", and it suggests a naming
>>> pattern that might be useful for other specializations.
>>>
>>> #g
>>> --
>>>
>>>
>>>>
>>>> On 11/17/2011 11:31 AM, Graham Klyne wrote:
>>>>> I'm reposting and slightly expanding a couple of PROV-DM issues that came up
>>>>> in my review of the primer under a separate subject line. They are related to
>>>>> derivation:
>>>>>
>>>>> http://dvcs.w3.org/hg/prov/raw-file/tip/model/ProvenanceModel.html#Derivation-Relation
>>>>>
>>>>>
>>>>> My understanding of what PROV-DM defines:
>>>>> (a) wasDerivedFrom - activity-linked direct derivation
>>>>> (b) eventuallyDerivedFrom - activity-independent derivation relation with
>>>>> explicit impact on result
>>>>> (c) dependedOn - activity-independent derivation relation possibly without
>>>>> impact on result
>>>>>
>>>>>
>>>>> == Two or three kinds of derivation? ==
>>>>>
>>>>> "PROV-DM offers two different forms of derivation records."
>>>>>
>>>>> "The three kinds of derivation records are successively introduced."
>>>>>
>>>>>
>>>>> == eventuallyDerivedFrom vs dependedOn ==
>>>>>
>>>>> I have never been particularly comfortable with this attempt to capture the
>>>>> distinction between something that was merely involved and something that
>>>>> actively informed the resulting entity. Philosophically, I think it's a very
>>>>> tricky distinction to draw. Also, it draws us into discussion of what might
>>>>> have been, which is something I understand that provenance is not intended to
>>>>> capture.
>>>>>
>>>>> In the primer example given about "DRAFT FOR REVIEW", maybe its presence does
>>>>> have an effect on the eventual document; if it were not present, the document
>>>>> might have been published without further revision. Who knows? I think there
>>>>> may be cases where the form of contribution is clearer and testable (e.g.
>>>>> becamePartOf), but to simply distinguish between contributory and
>>>>> non-contributory derivation is, I think, rather hard to do.
>>>>>
>>>>> My suggestion would be to drop the distinction, but to allow applications to
>>>>> specialize the property in ways that make sense for the application.
>>>>>
>>>>>
>>>>> == Direct derivation with unspecified action ==
>>>>>
>>>>> Is it possible to state that there is a direct derivation relation between two
>>>>> entities by some unspecified (existentially quantified) process execution?
>>>>>
>>>>> I think this is possible using expressions like "wasDerivedFrom(e2,e1)". It is
>>>>> stated, but I found it took some digging out of the text.
>>>>>
>>>>> ...
>>>>>
>>>>> My preference would be to have just two derivation properties:
>>>>>
>>>>> (1) wasDerivedFrom - transitive, activity-independent, account-independent.
>>>>> This would effectively be a superproperty of all derivation relations.
>>>>> (2) wasDirectlyDerivedFrom - non-transitive, activity-dependent (though the
>>>>> activity may be existentially inferred if not specified), and account-dependent.
>>>>>
>>>>> Other application-specific subproperties of wasDerivedFrom could be introduced
>>>>> as needed to capture more directly traceable notions of (esp. multi-step)
>>>>> derivation.
>>>>>
>>>>> (I think this is closer to the original OPM model, which made more sense to me).
>>>>>
>>>>> #g
>>>>> --
>>>>>
>>>>
>>>
>>>
>>
>>
>>
>



-- 
Dr Simon Miles
Lecturer, Department of Informatics
King's College London, WC2R 2LS, UK
+44 (0)20 7848 1166

Provenance in Agent-mediated Healthcare Systems:
http://eprints.dcs.kcl.ac.uk/1273/
Received on Friday, 25 November 2011 11:01:57 UTC