PROV-DM derivation concerns arising from my primer review

Under a separate subject line, I'm reposting and slightly expanding a couple of 
PROV-DM issues that came up in my review of the primer.  They relate to 
derivation:

http://dvcs.w3.org/hg/prov/raw-file/tip/model/ProvenanceModel.html#Derivation-Relation

My understanding of what PROV-DM defines:
(a) wasDerivedFrom - activity-linked direct derivation
(b) eventuallyDerivedFrom - activity-independent derivation relation with 
explicit impact on result
(c) dependedOn - activity-independent derivation relation possibly without 
impact on result


== Two or three kinds of derivation? ==

The text makes both of the following statements, which disagree on the count:

"PROV-DM offers two different forms of derivation records."

"The three kinds of derivation records are successively introduced."


== eventuallyDerivedFrom vs dependedOn ==

I have never been particularly comfortable with this attempt to capture the 
distinction between something that was merely involved and something that 
actively informed the resulting entity.  Philosophically, I think it's a very 
tricky distinction to draw.  Also, it draws us into discussion of what might 
have been, which, as I understand it, provenance is not intended to capture.

In the primer example given about "DRAFT FOR REVIEW", maybe its presence does 
have an effect on the eventual document; if it were not present, the document 
might have been published without further revision.  Who knows?  I think there 
may be cases where the form of contribution is clearer and testable (e.g. 
becamePartOf), but to simply distinguish between contributory and 
non-contributory derivation is, I think, rather hard to do.

My suggestion would be to drop the distinction, but to allow applications to 
specialize the property in ways that make sense for the application.


== Direct derivation with unspecified action ==

Is it possible to state that there is a direct derivation relation between two 
entities by some unspecified (existentially quantified) process execution?

I think this is possible using expressions like "wasDerivedFrom(e2,e1)".  This 
is stated, but it took some digging to find it in the text.
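
As a rough illustration of what I mean (a Python sketch only; the class and 
field names are my own invention and not anything defined by PROV-DM), a direct 
derivation could be recorded with the activity left unnamed:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Derivation:
    """Hypothetical record: `derived` was directly derived from `source`."""
    derived: str
    source: str
    # None is read here as "some activity exists, but it is not named"
    # (my assumed encoding of the existentially quantified case).
    activity: Optional[str] = None


def has_named_activity(d: Derivation) -> bool:
    """True if the derivation names the activity responsible for it."""
    return d.activity is not None


unspecified = Derivation("e2", "e1")               # like wasDerivedFrom(e2,e1)
specified = Derivation("e2", "e1", activity="a1")  # activity stated explicitly
```

Both records assert the same direct derivation; they differ only in whether 
the activity is identified.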

...

My preference would be to have just two derivation properties:

(1) wasDerivedFrom - transitive, activity-independent, account-independent. 
This would effectively be a superproperty of all derivation relations.
(2) wasDirectlyDerivedFrom - non-transitive, activity-dependent (though the 
activity may be existentially inferred if not specified), and account-dependent.
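
To illustrate the relationship I have in mind between these two properties, 
here is a minimal Python sketch.  The entity names and the function are made up 
for illustration, and it ignores activities and accounts entirely: the direct 
derivations are taken as given, and the transitive relation is computed as 
their closure.

```python
from collections import defaultdict

# Hypothetical direct (single-step) derivation assertions: (derived, source).
direct = {("e3", "e2"), ("e2", "e1")}


def derived_from(direct_pairs):
    """Return the transitive closure of the direct derivation pairs."""
    sources = defaultdict(set)
    for derived, source in direct_pairs:
        sources[derived].add(source)

    closure = set()
    for start in list(sources):
        stack = list(sources[start])
        seen = set()
        while stack:
            s = stack.pop()
            if s in seen:
                continue
            seen.add(s)
            closure.add((start, s))
            # .get() avoids creating new keys in the defaultdict
            stack.extend(sources.get(s, ()))
    return closure


all_derivations = derived_from(direct)
```

Here ("e3", "e1") appears in all_derivations but not in direct, which is the 
sense in which the transitive property would be a superproperty of the direct 
one.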

Other application-specific subproperties of wasDerivedFrom could be introduced 
as needed to capture more directly traceable notions of (esp. multi-step) 
derivation.

(I think this is closer to the original OPM model, which made more sense to me).

#g
--

Received on Thursday, 17 November 2011 11:31:49 UTC