Re: PROV-DM derivation concerns arising from my primer review from Graham Klyne on 2011-11-19 (public-prov-wg@w3.org from November 2011)

From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Date: Sat, 19 Nov 2011 08:17:54 +0000
To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
CC: public-prov-wg@w3.org
Message-ID: <4EC76632.7050409@zoo.ox.ac.uk>

On 17/11/2011 11:55, Luc Moreau wrote:
> Hi Graham,
>
> The derivation section is indeed complex and needs simplification.
>
> I recently made this proposal
> http://lists.w3.org/Archives/Public/public-prov-wg/2011Nov/0263.html
>
> It differs from yours as follows.
>
> Two derivations:
> - wasDerivedFrom: activity linked
> - wasEventuallyDerivedFrom (replaced by an adequate name)
>
> Simon has made the case that wasEventuallyDerivedFrom is not transitive. I think
> it's reasonable.

I assume you refer to 
http://lists.w3.org/Archives/Public/public-prov-wg/2011Nov/0196.html?

I suppose it depends on the intended meaning of "wasEventuallyDerivedFrom".

I think a distinction here is between "necessarilyDerivedFrom" and 
"possiblyDerivedFrom" (modal logic, anyone? :) ).  (I'm introducing these terms 
for discussion only, I'm not proposing them for use.)

For me, "e2 possiblyDerivedFrom e1" would be a statement that e2 has e1
somewhere in its derivation history.  It is easy to understand, and I think it
is something that we would reasonably expect provenance to express.  This would
be transitive.
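
To make the transitivity point concrete, here is a minimal sketch in Python that
treats the weak relation as reachability over recorded direct derivation steps.
The entity names (e1, e2, e3) and the helper possibly_derived_from are invented
for illustration only; none of this is PROV-DM syntax, and it assumes the direct
steps are simply given as (later, earlier) pairs.

```python
from collections import defaultdict


def possibly_derived_from(direct_steps):
    """Return every (later, earlier) pair linked by one or more direct steps.

    direct_steps is an iterable of (later, earlier) pairs; this encoding is an
    assumption made for the sketch, not something PROV-DM prescribes.
    """
    sources = defaultdict(set)
    for later, earlier in direct_steps:
        sources[later].add(earlier)

    closure = set()
    for start in list(sources):
        stack = list(sources[start])
        seen = set()
        while stack:
            earlier = stack.pop()
            if earlier in seen:
                continue  # already visited, also guards against cycles
            seen.add(earlier)
            closure.add((start, earlier))
            # Walk further back through the derivation history.
            stack.extend(sources.get(earlier, ()))
    return closure


# Hypothetical history: e3 came directly from e2, and e2 directly from e1.
direct = {("e3", "e2"), ("e2", "e1")}
weak = possibly_derived_from(direct)
```

With this input, weak contains both direct steps plus the pair ("e3", "e1"),
which is what "somewhere in its derivation history" amounts to.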

OTOH, "e2 necessarilyDerivedFrom e1" is telling us that the value of e2 is in
some sense materially affected by e1, which I think is taking us into the
territory of what we actually mean by "materially affected" - it's a variation
of the problem that concerned me in the first place.  I see this as the case
that Simon shows is not transitive.

> So, what's the difference? wasDerivedFrom is associated with one and only one
> activity.
> wasEventuallyDerivedFrom is unspecific about activities behind this derivation
> (but I believe there is some activity, we just don't know them, nor their number).
> So, wasDerivedFrom would be a special case of wasEventuallyDerivedFrom.

I'm fine with this, as far as it goes.

> Several of us have indicated it is useful to have a transitive version. Stian
> has a good
> idea that the transitive version could also include control and wasComplementOf
> (a bit
> like participation was defined, but transitive).
>
> This is a much weaker relation, which states that one entity was in the
> provenance of
> another, essentially. It's not a derivation.
> I would define this in the "Common Relation" section. Not sure how we name this,
> though.

I agree with the above - this corresponds to my "possiblyDerivedFrom".  See 
below for naming.

Your above proposal seems to have "eventuallyDerivedFrom" meaning something more 
like my "necessarilyDerivedFrom".

I would choose the weaker form and the direct form as the "built-in" relations 
as they are easier to define in a generic fashion; e.g. directlyDerivedFrom 
(strong form) and possiblyDerivedFrom (weak form).  Again I'm choosing these 
names to emphasize my discussion point, not proposing them here.

The weaker form can be specialized by applications where there is a need for a
stronger notion of derivation;  I don't think we're currently in a position to
say what such a stronger form might be.  I don't think there is a single such
form that is always applicable.

...

Which brings us to naming.

I don't want to get too hung up on this, but if the logic above is accepted, I 
think it becomes natural to use "derivedFrom" to cover the general (weakest) 
case, since that would be the generalization of all other forms of derivation. 
For example, it is quite intuitive that "directlyDerivedFrom" (currently just 
"derivedFrom") is a specialization of "derivedFrom", and it suggests a naming 
pattern that might be useful for other specializations.
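
The naming pattern can be sketched as a one-way inference: each specialized
statement entails the general "derivedFrom" statement, but not the reverse.
This is only an illustration in Python; the table GENERALIZES_TO, the helper
entailed_statements, and the (relation, later, earlier) triple encoding are all
assumptions of the sketch, and "becamePartOf" is included purely as a
hypothetical application-specific specialization.

```python
# Hypothetical table: each specialized relation names its generalization.
GENERALIZES_TO = {
    "directlyDerivedFrom": "derivedFrom",
    "becamePartOf": "derivedFrom",  # assumed application-specific example
}


def entailed_statements(statements):
    """Return the statements plus the general form each specialized one entails.

    Each statement is a (relation, later, earlier) triple; this encoding is an
    assumption made for the sketch.
    """
    result = set(statements)
    for relation, later, earlier in statements:
        general = GENERALIZES_TO.get(relation)
        if general is not None:
            result.add((general, later, earlier))
    return result


specific = {("directlyDerivedFrom", "e2", "e1")}
general_only = {("derivedFrom", "e2", "e1")}

from_specific = entailed_statements(specific)
from_general = entailed_statements(general_only)
```

Here from_specific gains the general "derivedFrom" statement, while
from_general is unchanged: knowing only the weak form tells us nothing about
which specialization, if any, applies.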

#g
--


>
> On 11/17/2011 11:31 AM, Graham Klyne wrote:
>> I'm reposting and slightly expanding a couple of PROV-DM issues that came up
>> in my review of the primer under a separate subject line. They are related to
>> derivation:
>>
>> http://dvcs.w3.org/hg/prov/raw-file/tip/model/ProvenanceModel.html#Derivation-Relation
>>
>>
>> My understanding of what PROV-DM defines:
>> (a) wasDerivedFrom - activity-linked direct derivation
>> (b) eventuallyDerivedFrom - activity-independent derivation relation with
>> explicit impact on result
>> (c) dependedOn - activity-independent derivation relation possibly without
>> impact on result
>>
>>
>> == Two or three kinds of derivation? ==
>>
>> "PROV-DM offers two different forms of derivation records."
>>
>> "The three kinds of derivation records are successively introduced."
>>
>>
>> == eventuallyDerivedFrom vs dependedOn ==
>>
>> I have never been particularly comfortable with this attempt to capture the
>> distinction between something that was merely involved and something that
>> actively informed the resulting entity. Philosophically, I think it's a very
>> tricky distinction to draw. Also, it draws us into discussion of what might
>> have been, which is something I understand that provenance is not intended to
>> capture.
>>
>> In the primer example given about "DRAFT FOR REVIEW", maybe its presence does
>> have an effect on the eventual document; if it were not present, the document
>> might have been published without further revision. Who knows? I think there
>> may be cases where the form of contribution is clearer and testable (e.g.
>> becamePartOf), but to simply distinguish between contributory and
>> non-contributory derivation is, I think, rather hard to do.
>>
>> My suggestion would be to drop the distinction, but to allow applications to
>> specialize the property in ways that make sense for the application.
>>
>>
>> == Direct derivation with unspecified action ==
>>
>> Is it possible to state that there is a direct derivation relation between two
>> entities by some unspecified (existentially quantified) process execution?
>>
>> I think this is possible using expressions like "wasDerivedFrom(e2,e1)". It is
>> stated, but I found it took some digging out of the text.
>>
>> ...
>>
>> My preference would be to have just two derivation properties:
>>
>> (1) wasDerivedFrom - transitive, activity-independent, account-independent.
>> This would effectively be a superproperty of all derivation relations.
>> (2) wasDirectlyDerivedFrom - non-transitive, activity-dependent (though the
>> activity may be existentially inferred if not specified), and account-dependent.
>>
>> Other application-specific subproperties of wasDerivedFrom could be introduced
>> as needed to capture more directly traceable notions of (esp. multi-step)
>> derivation.
>>
>> (I think this is closer to the original OPM model, which made more sense to me).
>>
>> #g
>> --
>>
>
Received on Saturday, 19 November 2011 09:59:40 UTC