Re: PROV-DM derivation concerns arising from my primer review

Simon,

This is ultimately about modelling choices, and I think there are reasonably 
differing positions one might take.  Having said my piece, which I think you 
have understood, I don't feel deeply enough about this to continue arguing the case.

But there's one thing you say that I'd like to respond to, just in case it 
underlies a misunderstanding:

 > However, I still don't see why we would stop people asserting an
 > actual, affecting relation between entities if they see that to exist
 > in the world, just as they assert the existence of entities,
 > activities, generation events etc. based on what exists. In my mind,
 > derivation seems core to what is meant by provenance, e.g.
 > understanding why a bottle of wine is as it is, I would like to know
 > from what grapes it was made, in what land it was grown, what else was
 > added, etc.

My position is not about trying to "stop people asserting [a] ... relation".

I'd like users to be able to express whatever forms of derivation they find 
useful to express.  The value of the weak derivation relation I propose is that 
it acts as a superproperty, a kind of grouping for any or all of these, so that 
a provenance processor can potentially know that an unknown relation between 
entities is a kind of derivation, even if it does not know anything more about 
the nature of such derivation.
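
To make that concrete, here is a rough sketch (Python, purely illustrative, not
anything PROV-DM defines) of a processor using declared specializations to
recognize derivations it has never seen before.  The table, the function names,
and the relations wasFermentedFrom and wasStoredIn are assumptions invented for
this example; only wasDerivedFrom, wasDirectlyDerivedFrom and containsQuoteFrom
come from this thread.

```python
# Hypothetical specialization declarations: relation -> more general relation.
# wasFermentedFrom is invented for illustration only.
SUBPROPERTY_OF = {
    "wasDirectlyDerivedFrom": "wasDerivedFrom",
    "containsQuoteFrom": "wasDerivedFrom",
    "wasFermentedFrom": "wasDirectlyDerivedFrom",
}


def is_derivation(relation, root="wasDerivedFrom"):
    """Return True if `relation` is `root` or a declared specialization of it."""
    seen = set()  # guards against accidental cycles in the declarations
    while relation is not None and relation not in seen:
        if relation == root:
            return True
        seen.add(relation)
        relation = SUBPROPERTY_OF.get(relation)
    return False


def derivation_sources(statements, entity):
    """Entities that `entity` is related to by any kind of derivation."""
    return {obj for subj, rel, obj in statements
            if subj == entity and is_derivation(rel)}


# (subject, relation, object) statements; wasStoredIn is also invented and is
# deliberately not declared as a kind of derivation.
statements = [
    ("wine", "wasFermentedFrom", "grapes"),
    ("review", "containsQuoteFrom", "label"),
    ("wine", "wasStoredIn", "cellar"),
]
```

The processor needs no knowledge of what fermentation means: it only follows the
declared specializations up to wasDerivedFrom, so asking for the derivation
sources of "wine" picks out "grapes" and ignores the storage relation.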

#g
--

On 25/11/2011 11:01, Simon Miles wrote:
> Hi Graham,
>
>> directlyDerivedFrom is assertable based on the observation of a process that
>> consumes one and generates the other.  (We don't know if the one materially
>> affects the value of the other, but that's not what I thought direct derivation
>> would be saying.)
>
> I'm pretty sure the intent was to say more than this. Why would we
> want to assert this? As I believe Luc has argued in the past, if an
> activity generates B and later uses A, it is not informative to say
> that B derives from A, but this is just an extreme example where lack
> of effect is most evident.
>
>> I think it's the nature of the beast that there is little that can be inferred
>> from a very generic framework.
>
> I completely agree, but I'm not arguing for inference, just assertion
> of derivation.
>
>> So while I agree with your comment about the formal nature of a weak derivation
>> relationship, I see its value is in the intent that it signals (informally), and
>> also that it's a base from which more meaningful derivations can be, er, derived
>> - through specialization.  And each meaningful derivation property (e.g.
>> quotation) may have different formal properties.
>
> I understand, but I'm not sure why we would have any general
> derivation relation at all in that case. It seems to say so little
> (unlike, say, wasGeneratedBy) that saying containsQuoteFrom
> specialises wasDerivedFrom seems to say nothing at all. The weak
> definition means the two entities related by wasDerivedFrom may have
> no more connection than two arbitrary entities, surely?
>
>> <aside>
>> A similar argument could be made for first order logic.
>>
>> In isolation, it expresses nothing - just mathematical structures, albeit more
>> complex than transitive closures, but fundamentally no more meaningful.
>>
>> Its value comes from being a framework in which terms can be associated with concepts
>> (or things, or values) by extra-logical means, and hence to show how such things
>> must be related if they conform to certain logically expressed patterns.
>> </aside>
>
> Well, yes, but those patterns place constraints, which wasDerivedFrom
> does not seem to. Moreover, PROV-DM in all other cases goes beyond
> expressing nothing, e.g. an activity and an entity have careful
> definitions in terms of what is in the world, as do wasGeneratedBy and
> used.
>
>> I'm not seeing how dependedOn is really adding any expressive power.
>
> I sympathise with that.
>
> However, I still don't see why we would stop people asserting an
> actual, affecting relation between entities if they see that to exist
> in the world, just as they assert the existence of entities,
> activities, generation events etc. based on what exists. In my mind,
> derivation seems core to what is meant by provenance, e.g.
> understanding why a bottle of wine is as it is, I would like to know
> from what grapes it was made, in what land it was grown, what else was
> added, etc.
>
> Thanks,
> Simon
>
>>> On 19 November 2011 09:58, Graham Klyne <graham.klyne@zoo.ox.ac.uk> wrote:
>>>> On 17/11/2011 11:55, Luc Moreau wrote:
>>>>> Hi Graham,
>>>>>
>>>>> The derivation section is indeed complex and needs simplification.
>>>>>
>>>>> I recently made this proposal
>>>>> http://lists.w3.org/Archives/Public/public-prov-wg/2011Nov/0263.html
>>>>>
>>>>> It differs from yours as follows.
>>>>>
>>>>> Two derivations:
>>>>> - wasDerivedFrom: activity linked
>>>>> - wasEventuallyDerivedFrom (replaced by an adequate name)
>>>>>
>>>>> Simon has made the case that wasEventuallyDerivedFrom is not transitive. I think
>>>>> it's reasonable.
>>>>
>>>> I assume you refer to
>>>> http://lists.w3.org/Archives/Public/public-prov-wg/2011Nov/0196.html?
>>>>
>>>> I suppose it depends on the intended meaning of "wasEventuallyDerivedFrom".
>>>>
>>>> I think a distinction here is between "necessarilyDerivedFrom" and
>>>> "possiblyDerivedFrom" (modal logic, anyone? :) ).  (I'm introducing these terms
>>>> for discussion only, I'm not proposing them for use.)
>>>>
>>>> For me, "e2 possiblyDerivedFrom e1" would be a statement that e2 has e1 somewhere
>>>> in its derivation history.  It is easy to understand, and I think this is something
>>>> that we would reasonably expect provenance to express.  This would be transitive.
>>>>
>>>> OTOH, "e2 necessarilyDerivedFrom e1" is telling us that the value of e2 is in
>>>> some sense materially affected by e1, which I think is taking us into the
>>>> territory of what we actually mean by materially affected by - it's a variation
>>>> of the problem that concerned me in the first place.  I see this is the case
>>>> that Simon shows is not transitive.
>>>>
>>>>> So, what's the difference? wasDerivedFrom is associated with one and only one
>>>>> activity.
>>>>> wasEventuallyDerivedFrom is unspecific about activities behind this derivation
>>>>> (but I believe there are some activities; we just don't know them, nor their number).
>>>>> So, wasDerivedFrom would be a special case of wasEventuallyDerivedFrom.
>>>>
>>>> I'm fine with this, as far as it goes.
>>>>
>>>>> Several of us have indicated it is useful to have a transitive version. Stian
>>>>> has a good
>>>>> idea that the transitive version could also include control and wasComplementOf
>>>>> (a bit
>>>>> like participation was defined, but transitive).
>>>>>
>>>>> This is a much weaker relation, which states that one entity was in the
>>>>> provenance of
>>>>> another, essentially. It's not a derivation.
>>>>> I would define this in the "Common Relation" section. Not sure how we name this,
>>>>> though.
>>>>
>>>> I agree with the above - this corresponds to my "possiblyDerivedFrom".  See
>>>> below for naming.
>>>>
>>>> Your above proposal seems to have "eventuallyDerivedFrom" meaning something more
>>>> like my "necessarilyDerivedFrom".
>>>>
>>>> I would choose the weaker form and the direct form as the "built-in" relations
>>>> as they are easier to define in a generic fashion; e.g. directlyDerivedFrom
>>>> (strong form) and possiblyDerivedFrom (weak form).  Again I'm choosing these
>>>> names to emphasize my discussion point, not proposing them here.
>>>>
>>>> The weaker form can be specialized by applications where there is a need for a
>>>> stronger notion of derivation;  I don't think we're currently in a position to
>>>> say what such a stronger form might be right now.  I don't think there is a
>>>> single such form that is always applicable.
>>>>
>>>> ...
>>>>
>>>> Which brings us to naming.
>>>>
>>>> I don't want to get too hung up on this, but if the logic above is accepted, I
>>>> think it becomes natural to use "derivedFrom" to cover the general (weakest)
>>>> case, since that would be the generalization of all other forms of derivation.
>>>> For example, it is quite intuitive that "directlyDerivedFrom" (currently just
>>>> "derivedFrom") is a specialization of "derivedFrom", and it suggests a naming
>>>> pattern that might be useful for other specializations.
>>>>
>>>> #g
>>>> --
>>>>
>>>>
>>>>>
>>>>> On 11/17/2011 11:31 AM, Graham Klyne wrote:
>>>>>> I'm reposting and slightly expanding a couple of PROV-DM issues that came up
>>>>>> in my review of the primer under a separate subject line. They are related to
>>>>>> derivation:
>>>>>>
>>>>>> http://dvcs.w3.org/hg/prov/raw-file/tip/model/ProvenanceModel.html#Derivation-Relation
>>>>>>
>>>>>>
>>>>>> My understanding of what PROV-DM defines:
>>>>>> (a) wasDerivedFrom - activity-linked direct derivation
>>>>>> (b) eventuallyDerivedFrom - activity-independent derivation relation with
>>>>>> explicit impact on result
>>>>>> (c) dependedOn - activity-independent derivation relation possibly without
>>>>>> impact on result
>>>>>>
>>>>>>
>>>>>> == Two or three kinds of derivation? ==
>>>>>>
>>>>>> "PROV-DM offers two different forms of derivation records."
>>>>>>
>>>>>> "The three kinds of derivation records are successively introduced."
>>>>>>
>>>>>>
>>>>>> == eventuallyDerivedFrom vs dependedOn ==
>>>>>>
>>>>>> I have never been particularly comfortable with this attempt to capture the
>>>>>> distinction between something that was merely involved and something that
>>>>>> actively informed the resulting entity. Philosophically, I think it's a very
>>>>>> tricky distinction to draw. Also, it draws us into discussion of what might
>>>>>> have been, which is something I understand that provenance is not intended to
>>>>>> capture.
>>>>>>
>>>>>> In the primer example given about "DRAFT FOR REVIEW", maybe its presence does
>>>>>> have an effect on the eventual document; if it were not present, the document
>>>>>> might have been published without further revision. Who knows? I think there
>>>>>> may be cases where the form of contribution is clearer and testable (e.g.
>>>>>> becamePartOf), but to simply distinguish between contributory and
>>>>>> non-contributory derivation is, I think, rather hard to do.
>>>>>>
>>>>>> My suggestion would be to drop the distinction, but to allow applications to
>>>>>> specialize the property in ways that make sense for the application.
>>>>>>
>>>>>>
>>>>>> == Direct derivation with unspecified action ==
>>>>>>
>>>>>> Is it possible to state that there is a direct derivation relation between two
>>>>>> entities by some unspecified (existentially quantified) process execution?
>>>>>>
>>>>>> I think this is possible using expressions like "wasDerivedFrom(e2,e1)". It is
>>>>>> stated, but I found it took some digging out of the text.
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> My preference would be to have just two derivation properties:
>>>>>>
>>>>>> (1) wasDerivedFrom - transitive, activity-independent, account-independent.
>>>>>> This would effectively be a superproperty of all derivation relations.
>>>>>> (2) wasDirectlyDerivedFrom - non-transitive, activity-dependent (though the
>>>>>> activity may be existentially inferred if not specified), and account-dependent.
>>>>>>
>>>>>> Other application-specific subproperties of wasDerivedFrom could be introduced
>>>>>> as needed to capture more directly traceable notions of (esp. multi-step)
>>>>>> derivation.
>>>>>>
>>>>>> (I think this is closer to the original OPM model, which made more sense to me).
>>>>>>
>>>>>> #g
>>>>>> --
>>>>>>

Received on Friday, 25 November 2011 12:37:32 UTC