Re: PROV-DM derivation concerns arising from my primer review

On Sat, Nov 19, 2011 at 08:17, Graham Klyne <graham.klyne@zoo.ox.ac.uk> wrote:

> I think a distinction here is between "necessarilyDerivedFrom" and
> "possiblyDerivedFrom" (modal logic, anyone? :) ).  (I'm introducing these
> terms for discussion only, I'm not proposing them for use.)
>
> For me, "e2 possiblyDerivedFrom e1" would be statement that e2 has e1
> somewhere in its derivation history, is easy to understand, and I think this
> is something that we would reasonably expect provenance to express.  This
> would be transitive.

Yes, this is the "e1 is in the provenance past of e2" statement. The
path to that past might or might not also be stated - but it can be
inferred to exist, with an undefined number of activities and possibly
complementaries in between.
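
To make the transitive reading concrete, here is a rough sketch of how
possiblyDerivedFrom could be answered from whatever steps happen to be
stated. The (later, earlier) pair representation, the function names and
the example node names are assumptions of mine for illustration; none of
this is from PROV-DM.

```python
from collections import defaultdict

def build_past(steps):
    """Map each node to the nodes stated as its immediate predecessors.

    `steps` is an iterable of (later, earlier) pairs: a hypothetical,
    simplified stand-in for stated use/generation links.
    """
    past = defaultdict(set)
    for later, earlier in steps:
        past[later].add(earlier)
    return past

def possibly_derived_from(past, e2, e1):
    """True if e1 can be reached by walking back from e2 over stated steps."""
    seen, stack = set(), [e2]
    while stack:
        node = stack.pop()
        for prev in past.get(node, ()):
            if prev == e1:
                return True
            if prev not in seen:
                seen.add(prev)
                stack.append(prev)
    return False

# Hypothetical history: notes -> draft -> edit2 -> final
steps = [("final", "edit2"), ("edit2", "draft"), ("draft", "notes")]
past = build_past(steps)
```

Here possibly_derived_from(past, "final", "notes") holds although no
single stated step links the two, and the reverse direction does not hold.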


> OTOH, "e2 necessarilyDerivedFrom e1" is telling us that the value of e2 is
> in some sense materially affected by e1, which I think is taking us into the
> territory of what we actually mean by materially affected by - it's a
> variation of the problem that concerned me in the first place.  I see this
> is the case that Simon shows is not transitive.

Yes, unlike possiblyDerivedFrom, such an assertion provides new
information which could not be inferred from the provenance path between
e2 and e1.

necessarilyDerivedFrom is a strong statement that e2 *was* affected by
e1. This implies also that e2 must have been possiblyDerivedFrom e1 as
well, because if e2 was affected by e1, then e1 must appear in its
provenance past (stated or not).

The nature of what 'affected' means is not up to us to define; that is
up to the asserter.

One asserter might think that "DRAFT FOR REVIEW" did actually affect
the final product (he has identified a pixel that has survived from
that draft) - he can state this with "necessarilyDerivedFrom".

Note that this does not imply that there was a single activity that
used e1 and generated e2 (I think if you know this, then simply state
that activity!) - just that possiblyDerivedFrom(e2,e1) (there was a
chain of use/generation/control/dependedOn from e2 leading back to e1)
and the semantic meaning that "e2 was influenced by e1". The nature of
that influence can be specified by subproperties/qualifiers.
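
A rough sketch of how the two relations could interact, under my
assumptions only: each necessarilyDerivedFrom assertion also counts as a
possiblyDerivedFrom one, possiblyDerivedFrom is closed transitively, and
necessarilyDerivedFrom is left exactly as asserted. The function name and
the example entities are hypothetical.

```python
def possibly_closure(possibly, necessarily):
    """Return every possiblyDerivedFrom pair entailed by the assertions.

    Both arguments are sets of (e2, e1) pairs. Each necessarily pair is
    treated as a possibly pair too, and the result is closed transitively.
    """
    pairs = set(possibly) | set(necessarily)
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs

# Hypothetical assertions: each step really did affect the next one.
necessarily = {("final", "edit"), ("edit", "draft")}
possibly = set()

entailed = possibly_closure(possibly, necessarily)
```

With these assertions ("final", "draft") is in entailed but not in
necessarily: the draft is in the provenance past of the final product,
yet nobody has claimed that it actually affected it.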


Another asserter does not know, or is not able to tell, if "DRAFT FOR
REVIEW" affected the final product, but he knows it was there
somewhere in the past, and can state "possiblyDerivedFrom". He does
this because
a) He does not know all the activities in between,
-or-
b) He works on the level of entities rather than activities (data
lineage perspective)
-or-
c) He wants to be 'complete' and has inferred this from stated activity
interactions.
-or-
d) Something I didn't think of - perhaps he made a subproperty that is
stronger than possiblyDerivedFrom but not as strong as
necessarilyDerivedFrom


> I don't want to get too hung up on this, but if the logic above is accepted,
> I think it becomes natural to use "derivedFrom" to cover the general
> (weakest) case, since that would be the generalization of all other forms of
> derivation. For example, it is quite intuitive that "directlyDerivedFrom"
> (currently just "derivedFrom") is a specialization of "derivedFrom", and it
> suggests a naming pattern that might be useful for other specializations.

But "derived from" implies that it *is* affected (by derivation). If I
saw "derivedFrom" and "directlyDerivedFrom" I would interpret both as
meaning affected - where directlyDerivedFrom is the non-transitive
one. I would still have no way to express that something was "in the
provenance past of" another entity.



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Thursday, 24 November 2011 09:50:29 UTC