Re: PROV-DM derivation concerns arising from my primer review

Hi Simon,

On 23/11/2011 16:33, Simon Miles wrote:
> I think your argument seems to push us to a position where we cannot
> assert anything about derivation at all.

Yes and no...

> If you argue that a "necessarilyDerivedFrom" is not possible to
> assert, I can't see why directlyDerivedFrom (your 'strong form') would
> be possible either. Either we can know it is "materially affected by"
> or we can't. If we can't then the strong form is impossible to assert,
> as all we can say is that two entities were somehow involved in a
> single activity.

directlyDerivedFrom is assertable based on the observation of a process that 
consumes one and generates the other.  (We don't know if the one materially 
affects the value of the other, but that's not what I thought direct derivation 
would be saying.)

> The weak form is not really about derivation at all, it's just a
> transitive closure on a directed graph that happens to contain only
> links pointing from future to past. So I find it uncomfortable to
> specialise it for expressing derivation.   If the relation I wanted to
> express was "containsQuotationFrom", which seems to be a derivation
> relation to me, then this is not transitive.

I agree, but I'm not seeing the problem here.

> Aren't we left being able to express nothing, just do a transitive closure?

I think it's the nature of the beast that there is little that can be inferred 
from a very generic framework.

So while I agree with your comment about the formal nature of a weak derivation 
relationship, I see its value is in the intent that it signals (informally), and 
also that it's a base from which more meaningful derivations can be, er, derived 
- through specialization.  And each meaningful derivation property (e.g. 
quotation) may have different formal properties.
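
To make that concrete, here is a rough sketch of the weak form as nothing more 
than a transitive closure, with a non-transitive specialization feeding into it.  
The entity names and the helper function are made up for illustration, and I'm 
borrowing "possiblyDerivedFrom" from my earlier message purely as a discussion 
term; none of this is PROV-DM vocabulary.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return every (later, earlier) pair reachable by following edges."""
    succ = defaultdict(set)
    for later, earlier in edges:
        succ[later].add(earlier)
    closure = set()
    for start in list(succ):
        stack = list(succ[start])
        while stack:
            node = stack.pop()
            if (start, node) in closure:
                continue  # already visited from this start (also guards cycles)
            closure.add((start, node))
            stack.extend(succ.get(node, ()))
    return closure


# Hypothetical assertions: (e2, e1) means "e2 containsQuotationFrom e1".
# This specialized relation is NOT closed transitively: C quoting B and
# B quoting A does not mean that C quotes A.
contains_quotation_from = {("C", "B"), ("B", "A")}

# Each specialized assertion is also an assertion of the weak form, and
# only the weak form is closed transitively.
possibly_derived_from = transitive_closure(contains_quotation_from)
```

Here the weak relation gains the pair (C, A) while the quotation relation does 
not, which is all I mean by the specializations having different formal 
properties from the generic relation.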

Another example: GPL licensed code.  If C includes portions of B and B includes 
portions of A and A was GPL licensed, then the GPL licence requirements apply to 
C, even if C does not contain any actual code from A (assuming B was released by 
its developer at any point).  This example is transitive, due to the particular 
nature of the GPL licence.
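
A minimal sketch of how an application might apply this domain-specific rule, 
assuming a made-up "includes portions of" mapping and a set of GPL licensed 
starting points (both hypothetical names).  It deliberately ignores the 
"was released" condition mentioned above, and it is not a statement of what the 
licence actually requires.

```python
def gpl_applies(entity, includes_portions_of, gpl_licensed):
    """Return True if entity, or anything it transitively includes, is GPL licensed."""
    seen = set()
    stack = [entity]
    while stack:
        node = stack.pop()
        if node in gpl_licensed:
            return True
        if node in seen:
            continue  # avoid revisiting nodes (and looping on cycles)
        seen.add(node)
        stack.extend(includes_portions_of.get(node, ()))
    return False


# Hypothetical data matching the example: C includes portions of B,
# B includes portions of A, and A was GPL licensed.
includes_portions_of = {"C": ["B"], "B": ["A"]}
gpl_licensed = {"A"}

c_is_covered = gpl_applies("C", includes_portions_of, gpl_licensed)
```

The transitivity lives in this licence-specific rule, not in the generic 
derivation relation it is layered on.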

The point I'm trying to make is that I don't think there is much we can do in the 
way of expressing interesting conclusions until we start to consider the 
specific domain-dependent nature of the derivation being considered.

My understanding is that PROV is intended to be a domain-neutral framework for 
assembling provenance traces, the interesting aspects of which may be quite 
domain-specific.  As such, I don't expect PROV alone to express very much, but 
to be used in conjunction with other more semantically rich concepts to capture 
complex and important relationships between things.

So, I don't see it as a fault if PROV alone expresses nothing.  What counts is 
the part it can play in expressing things that are important.

<aside>
A similar argument could be made for first order logic.

In isolation, it expresses nothing - just mathematical structures, albeit more 
complex than transitive closures, but fundamentally no more meaningful.

Its value comes from being a framework in which terms can be associated with 
concepts (or things, or values) by extra-logical means, and hence to show how 
such things must be related if they conform to certain logically expressed 
patterns.
</aside>

I feel a notion like "dependedOn" has no formal properties; it merely appeals to 
an ill-described notion of value propagation (I think).  Lacking both formal 
properties *and* a clear explanation of what it is *intended* to mean, I don't 
see it's adding any value.  What's the interoperability story here: if one 
application says A dependedOn B, what can a second application do with this 
information?  Why not just introduce a domain-specific relationship for which a 
specific meaning can be given informally, and maybe for which formal 
consequences can be expressed?

Hence my argument to exclude dependedOn (as I understand it), even if the 
framework thereby expresses nothing, because it (a) helps to keep it simpler, 
and (b) maintains a separation between inferential machinery and informally 
described intended interpretations.

I'm not seeing how dependedOn is really adding any expressive power.

#g
--


> On 19 November 2011 09:58, Graham Klyne<graham.klyne@zoo.ox.ac.uk>  wrote:
>> On 17/11/2011 11:55, Luc Moreau wrote:
>>> Hi Graham,
>>>
>>> The derivation section is indeed complex and needs simplification.
>>>
>>> I recently made this proposal
>>> http://lists.w3.org/Archives/Public/public-prov-wg/2011Nov/0263.html
>>>
>>> It differs from yours as follows.
>>>
>>> Two derivations:
>>> - wasDerivedFrom: activity linked
>>> - wasEventuallyDerivedFrom (replaced by an adequate name)
>>>
>>> Simon has made the case that wasEventuallyDerivedFrom is not transitive. I think
>>> it's reasonable.
>>
>> I assume you refer to
>> http://lists.w3.org/Archives/Public/public-prov-wg/2011Nov/0196.html?
>>
>> I suppose it depends on what is the intent of "wasEventuallyDerivedFrom".
>>
>> I think a distinction here is between "necessarilyDerivedFrom" and
>> "possiblyDerivedFrom" (modal logic, anyone? :) ).  (I'm introducing these terms
>> for discussion only, I'm not proposing them for use.)
>>
>> For me, "e2 possiblyDerivedFrom e1" would be a statement that e2 has e1 somewhere
>> in its derivation history, is easy to understand, and I think this is something
>> that we would reasonably expect provenance to express.  This would be transitive.
>>
>> OTOH, "e2 necessarilyDerivedFrom e1" is telling us that the value of e2 is in
>> some sense materially affected by e1, which I think is taking us into the
>> territory of what we actually mean by materially affected by - it's a variation
>> of the problem that concerned me in the first place.  I see this is the case
>> that Simon shows is not transitive.
>>
>>> So, what's the difference? wasDerivedFrom is associated with one and only one
>>> activity.
>>> wasEventuallyDerivedFrom is unspecific about activities behind this derivation
>>> (but I believe there is some activity, we just don't know them, nor their number).
>>> So, wasDerivedFrom would be a special case of wasEventuallyDerivedFrom.
>>
>> I'm fine with this, as far as it goes.
>>
>>> Several of us have indicated it is useful to have a transitive version. Stian
>>> has a good
>>> idea that the transitive version could also include control and wasComplementOf
>>> (a bit
>>> like participation was defined, but transitive).
>>>
>>> This is a much weaker relation, which states that one entity was in the
>>> provenance of
>>> another, essentially. It's not a derivation.
>>> I would define this in the "Common Relation" section. Not sure how we name this,
>>> though.
>>
>> I agree with the above - this corresponds to my "possiblyDerivedFrom".  See
>> below for naming.
>>
>> Your above proposal seems to have "eventuallyDerivedFrom" meaning something more
>> like my "necessarilyDerivedFrom".
>>
>> I would choose the weaker form and the direct form as the "built-in" relations
>> as they are easier to define in a generic fashion; e.g. directlyDerivedFrom
>> (strong form) and possiblyDerivedFrom (weak form).  Again I'm choosing these
>> names to emphasize my discussion point, not proposing them here.
>>
>> The weaker form can be specialized by applications where there is a need for a
>> stronger notion of derivation;  I don't think we're currently in a position to
>> say what such a stronger form might be right now.  I don't think there is a
>> single such form that is always applicable.
>>
>> ...
>>
>> Which brings us to naming.
>>
>> I don't want to get too hung up on this, but if the logic above is accepted, I
>> think it becomes natural to use "derivedFrom" to cover the general (weakest)
>> case, since that would be the generalization of all other forms of derivation.
>> For example, it is quite intuitive that "directlyDerivedFrom" (currently just
>> "derivedFrom") is a specialization of "derivedFrom", and it suggests a naming
>> pattern that might be useful for other specializations.
>>
>> #g
>> --
>>
>>
>>>
>>> On 11/17/2011 11:31 AM, Graham Klyne wrote:
>>>> I'm reposting and slightly expanding a couple of PROV-DM issues that came up
>>>> in my review of the primer under a separate subject line. They are related to
>>>> derivation:
>>>>
>>>> http://dvcs.w3.org/hg/prov/raw-file/tip/model/ProvenanceModel.html#Derivation-Relation
>>>>
>>>>
>>>> My understanding of what PROV-DM defines:
>>>> (a) wasDerivedFrom - activity-linked direct derivation
>>>> (b) eventuallyDerivedFrom - activity-independent derivation relation with
>>>> explicit impact on result
>>>> (c) dependedOn - activity-independent derivation relation possibly without
>>>> impact on result
>>>>
>>>>
>>>> == Two or three kinds of derivation? ==
>>>>
>>>> "PROV-DM offers two different forms of derivation records."
>>>>
>>>> "The three kinds of derivation records are successively introduced."
>>>>
>>>>
>>>> == eventuallyDerivedFrom vs dependedOn ==
>>>>
>>>> I have never been particularly comfortable with this attempt to capture the
>>>> distinction between something that was merely involved and something that
>>>> actively informed the resulting entity. Philosophically, I think it's a very
>>>> tricky distinction to draw. Also, it draws us into discussion of what might
>>>> have been, which is something I understand that provenance is not intended to
>>>> capture.
>>>>
>>>> In the primer example given about "DRAFT FOR REVIEW", maybe its presence does
>>>> have an effect on the eventual document; if it were not present, the document
>>>> might have been published without further revision. Who knows? I think there
>>>> may be cases where the form of contribution is clearer and testable (e.g.
>>>> becamePartOf), but to simply distinguish between contributory and
>>>> non-contributory derivation is, I think, rather hard to do.
>>>>
>>>> My suggestion would be to drop the distinction, but to allow applications to
>>>> specialize the property in ways that make sense for the application.
>>>>
>>>>
>>>> == Direct derivation with unspecified action ==
>>>>
>>>> Is it possible to state that there is a direct derivation relation between two
>>>> entities by some unspecified (existentially quantified) process execution?
>>>>
>>>> I think this is possible using expressions like "wasDerivedFrom(e2,e1)". It is
>>>> stated, but I found it took some digging out of the text.
>>>>
>>>> ...
>>>>
>>>> My preference would be to have just two derivation properties:
>>>>
>>>> (1) wasDerivedFrom - transitive, activity-independent, account-independent.
>>>> This would effectively be a superproperty of all derivation relations.
>>>> (2) wasDirectlyDerivedFrom - non-transitive, activity-dependent (though the
>>>> activity may be existentially inferred if not specified), and account-dependent.
>>>>
>>>> Other application-specific subproperties of wasDerivedFrom could be introduced
>>>> as needed to capture more directly traceable notions of (esp. multi-step)
>>>> derivation.
>>>>
>>>> (I think this is closer to the original OPM model, which made more sense to me).
>>>>
>>>> #g
>>>> --
>>>>
>>>
>>
>>
>
>
>

Received on Friday, 25 November 2011 10:11:33 UTC