Re: PROV-DM derivation concerns arising from my primer review from Simon Miles on 2011-11-23 (public-prov-wg@w3.org from November 2011)

From: Simon Miles <simon.miles@kcl.ac.uk>
Date: Wed, 23 Nov 2011 16:33:59 +0000
To: Provenance Working Group WG <public-prov-wg@w3.org>
Message-ID: <CAKc1nHeGu9dmQMP-aDxgWGqTyUAAK8JEXE7GrB81XZGV=gujbQ@mail.gmail.com>

Hello Graham,

I think your argument pushes us to a position where we cannot
assert anything about derivation at all.

If you argue that a "necessarilyDerivedFrom" is not possible to
assert, I can't see why directlyDerivedFrom (your 'strong form') would
be possible either. Either we can know it is "materially affected by"
or we can't. If we can't then the strong form is impossible to assert,
as all we can say is that two entities were somehow involved in a
single activity.

The weak form is not really about derivation at all. It is just a
transitive closure on a directed graph that happens to contain only
links pointing from future to past, so I am uncomfortable with
specialising it to express derivation. Suppose the relation I wanted
to express was "containsQuotationFrom", which seems to me to be a
derivation relation: that relation is not transitive.

Aren't we then left able to express nothing about derivation, and only
able to compute a transitive closure?
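
To make the transitivity point concrete, here is a minimal Python
sketch. It is only an illustration: the entity names e1, e2, e3 and the
helper transitive_closure are hypothetical and not part of PROV-DM. It
assumes e3 quotes e2 and e2 quotes e1, and shows that the closure adds a
pair (e3, e1) that was never asserted as a quotation.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return every (later, earlier) pair reachable by following edges.

    `edges` is a set of (later, earlier) pairs, i.e. links pointing
    from future to past. This helper is hypothetical, for illustration.
    """
    graph = defaultdict(set)
    for later, earlier in edges:
        graph[later].add(earlier)

    closure = set()
    for start in list(graph):
        stack = list(graph[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            # .get avoids creating new keys while we iterate
            stack.extend(graph.get(node, ()))
    return closure


# Assumed example: e3 quotes e2, and e2 quotes e1, but e3 contains
# no text taken from e1.
contains_quotation_from = {("e3", "e2"), ("e2", "e1")}

closure = transitive_closure(contains_quotation_from)

# The closure links e3 to e1 although no such quotation was asserted.
inferred_only = closure - contains_quotation_from
```

The pair left in inferred_only is exactly what the weak form would
claim, which is why I hesitate to read the closure as derivation.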

Thanks,
Simon

On 19 November 2011 09:58, Graham Klyne <graham.klyne@zoo.ox.ac.uk> wrote:
> On 17/11/2011 11:55, Luc Moreau wrote:
>> Hi Graham,
>>
>> The derivation section is indeed complex and needs simplification.
>>
>> I recently made this proposal
>> http://lists.w3.org/Archives/Public/public-prov-wg/2011Nov/0263.html
>>
>> It differs from yours as follows.
>>
>> Two derivations:
>> - wasDerivedFrom: activity linked
>> - wasEventuallyDerivedFrom (replaced by an adequate name)
>>
>> Simon has made the case that wasEventuallyDerivedFrom is not transitive. I think
>> it's reasonable.
>
> I assume you refer to
> http://lists.w3.org/Archives/Public/public-prov-wg/2011Nov/0196.html?
>
> I suppose it depends on the intended meaning of "wasEventuallyDerivedFrom".
>
> I think a distinction here is between "necessarilyDerivedFrom" and
> "possiblyDerivedFrom" (modal logic, anyone? :) ).  (I'm introducing these terms
> for discussion only, I'm not proposing them for use.)
>
> For me, "e2 possiblyDerivedFrom e1" would be a statement that e2 has e1
> somewhere in its derivation history. It is easy to understand, and I think this
> is something that we would reasonably expect provenance to express.  This would
> be transitive.
>
> OTOH, "e2 necessarilyDerivedFrom e1" is telling us that the value of e2 is in
> some sense materially affected by e1, which I think is taking us into the
> territory of what we actually mean by materially affected by - it's a variation
> of the problem that concerned me in the first place.  I see this as the case
> that Simon shows is not transitive.
>
>> So, what's the difference? wasDerivedFrom is associated with one and only one
>> activity.
>> wasEventuallyDerivedFrom is unspecific about activities behind this derivation
>> (but I believe there are some activities; we just don't know them, nor their number).
>> So, wasDerivedFrom would be a special case of wasEventuallyDerivedFrom.
>
> I'm fine with this, as far as it goes.
>
>> Several of us have indicated it is useful to have a transitive version. Stian
>> has a good
>> idea that the transitive version could also include control and wasComplementOf
>> (a bit
>> like participation was defined, but transitive).
>>
>> This is a much weaker relation, which states that one entity was in the
>> provenance of
>> another, essentially. It's not a derivation.
>> I would define this in the "Common Relation" section. Not sure how we name this,
>> though.
>
> I agree with the above - this corresponds to my "possiblyDerivedFrom".  See
> below for naming.
>
> Your above proposal seems to have "eventuallyDerivedFrom" meaning something more
> like my "necessarilyDerivedFrom".
>
> I would choose the weaker form and the direct form as the "built-in" relations
> as they are easier to define in a generic fashion; e.g. directlyDerivedFrom
> (strong form) and possiblyDerivedFrom (weak form).  Again I'm choosing these
> names to emphasize my discussion point, not proposing them here.
>
> The weaker form can be specialized by applications where there is a need for a
> stronger notion of derivation;  I don't think we're currently in a position to
> say what such a stronger form might be right now.  I don't think there is a
> single such form that is always applicable.
>
> ...
>
> Which brings us to naming.
>
> I don't want to get too hung up on this, but if the logic above is accepted, I
> think it becomes natural to use "derivedFrom" to cover the general (weakest)
> case, since that would be the generalization of all other forms of derivation.
> For example, it is quite intuitive that "directlyDerivedFrom" (currently just
> "derivedFrom") is a specialization of "derivedFrom", and it suggests a naming
> pattern that might be useful for other specializations.
>
> #g
> --
>
>
>>
>> On 11/17/2011 11:31 AM, Graham Klyne wrote:
>>> I'm reposting and slightly expanding a couple of PROV-DM issues that came up
>>> in my review of the primer under a separate subject line. They are related to
>>> derivation:
>>>
>>> http://dvcs.w3.org/hg/prov/raw-file/tip/model/ProvenanceModel.html#Derivation-Relation
>>>
>>>
>>> My understanding of what PROV-DM defines:
>>> (a) wasDerivedFrom - activity-linked direct derivation
>>> (b) eventuallyDerivedFrom - activity-independent derivation relation with
>>> explicit impact on result
>>> (c) dependedOn - activity-independent derivation relation possibly without
>>> impact on result
>>>
>>>
>>> == Two or three kinds of derivation? ==
>>>
>>> "PROV-DM offers two different forms of derivation records."
>>>
>>> "The three kinds of derivation records are successively introduced."
>>>
>>>
>>> == eventuallyDerivedFrom vs dependedOn ==
>>>
>>> I have never been particularly comfortable with this attempt to capture the
>>> distinction between something that was merely involved and something that
>>> actively informed the resulting entity. Philosophically, I think it's a very
>>> tricky distinction to draw. Also, it draws us into discussion of what might
>>> have been, which is something I understand that provenance is not intended to
>>> capture.
>>>
>>> In the primer example given about "DRAFT FOR REVIEW", maybe its presence does
>>> have an effect on the eventual document; if it were not present, the document
>>> might have been published without further revision. Who knows? I think there
>>> may be cases where the form of contribution is clearer and testable (e.g.
>>> becamePartOf), but to simply distinguish between contributory and
>>> non-contributory derivation is, I think, rather hard to do.
>>>
>>> My suggestion would be to drop the distinction, but to allow applications to
>>> specialize the property in ways that make sense for the application.
>>>
>>>
>>> == Direct derivation with unspecified action ==
>>>
>>> Is it possible to state that there is a direct derivation relation between two
>>> entities by some unspecified (existentially quantified) process execution?
>>>
>>> I think this is possible using expressions like "wasDerivedFrom(e2,e1)". It is
>>> stated, but I found it took some digging out of the text.
>>>
>>> ...
>>>
>>> My preference would be to have just two derivation properties:
>>>
>>> (1) wasDerivedFrom - transitive, activity-independent, account-independent.
>>> This would effectively be a superproperty of all derivation relations.
>>> (2) wasDirectlyDerivedFrom - non-transitive, activity-dependent (though the
>>> activity may be existentially inferred if not specified), and account-dependent.
>>>
>>> Other application-specific subproperties of wasDerivedFrom could be introduced
>>> as needed to capture more directly traceable notions of (esp. multi-step)
>>> derivation.
>>>
>>> (I think this is closer to the original OPM model, which made more sense to me).
>>>
>>> #g
>>> --
>>>
>>
>
>



-- 
Dr Simon Miles
Lecturer, Department of Informatics
King's College London, WC2R 2LS, UK
+44 (0)20 7848 1166

Provenance in Agent-mediated Healthcare Systems:
http://eprints.dcs.kcl.ac.uk/1273/
Received on Wednesday, 23 November 2011 16:34:31 UTC