Re: PROV-ISSUE-67 (single-execution): Why is there a difference in what is represented by one vs multiple executions? [Conceptual Model]

Breaking my reply into two parts...

 

Regarding the default - I understand the concern about inconsistent
knowledge, but I think we still need a consistent way for PIL
interpreters to work on provenance graphs that make no assertion
about inferability/transitivity. If we just have used-PE-generated
structures, should PIL engines infer isDerivedBy and assume transitivity
or not? If they do, the results could be incorrect (too many things
tagged as derived); if they don't, the results could also be incorrect
(too few tagged as derived). Either way, it would be nice if different
engines worked the same way and agreed on the results.
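
To make the two failure modes concrete, here is a minimal sketch. It
assumes a toy graph with one PE "P" that used A1 and A2 and generated B1
and B2, where only "B1 derived from A1" was explicitly asserted. The
names, the tuple encoding and the infer_by_default switch are all
hypothetical illustrations, not anything defined by PIL.

```python
# Hypothetical toy graph: one PE "P" with two inputs and two outputs.
used = {("P", "A1"), ("P", "A2")}            # (process, input)
generated = {("B1", "P"), ("B2", "P")}       # (output, process)
asserted = {("B1", "A1")}                    # (derived, source), explicitly asserted


def derivations(used, generated, asserted, infer_by_default):
    """Return the (derived, source) pairs an engine would report."""
    result = set(asserted)
    if infer_by_default:
        # "Fully connected" default: every output of a PE is treated as
        # derived from every input of that same PE.
        for output, pe_out in generated:
            for pe_in, source in used:
                if pe_in == pe_out:
                    result.add((output, source))
    return result


permissive = derivations(used, generated, asserted, infer_by_default=True)
conservative = derivations(used, generated, asserted, infer_by_default=False)
```

If the truth were "B1 from A1, B2 from A2", the permissive engine would
report two extra derivations (B1 from A2, B2 from A1) and the
conservative one would miss B2 from A2. Neither default is right, but two
engines that pick different defaults disagree on the same graph.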

 

Regarding subclassing isDerivedFrom to be transitive or not - the
problem is that whether or not derivation is transitive depends on the
nature of the Bobs, not on a quality of the derivedFrom relationship. I
could have a Bob B (Birthday Cake) derivedFrom some other one (A - an
egg) and then two forward derivations -  Bob C derivedFrom B where C is
crumbs and Bob D derivedFrom B where D is a used Candle. All of those
are valid, but I can't get the correct results (C derivedFrom A but D
notDerivedFrom A) by setting flags on the three original derivations. I
have to label Bob B.  The same applies for inferring derivation from
used-PE-generated structures - I would not be able to label the used and
generated relationships as allowing inference of derivation because the
connectivity is due to the structure of the PE itself.   
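
A small sketch of the cake example, under stated assumptions: it compares
a single relation-wide transitivity setting against one hypothetical
reading of "labeling Bob B", in which B carries the set of its direct
descendants that inherit B's own sources. The passes_through map and the
closure function are made up for illustration.

```python
# Direct derivations from the example, as (derived, source) pairs:
# B (cake) from A (egg), C (crumbs) from B, D (used candle) from B.
direct = {("B", "A"), ("C", "B"), ("D", "B")}


def closure(edges, passes_through=None):
    """Extend direct derivation edges transitively.

    passes_through (hypothetical) maps an intermediate Bob to the set of
    its direct descendants that inherit its sources. Bobs absent from the
    map are treated as fully connected.
    """
    result = set(edges)
    changed = True
    while changed:
        changed = False
        for derived, mid in edges:                # direct step derived -> mid
            for mid2, source in list(result):     # any known step mid -> source
                if mid != mid2:
                    continue
                allowed = (
                    passes_through is None
                    or mid not in passes_through
                    or derived in passes_through[mid]
                )
                if allowed and (derived, source) not in result:
                    result.add((derived, source))
                    changed = True
    return result


uniform = closure(direct)                        # relation-wide transitivity
labeled = closure(direct, {"B": {"C"}})          # label on Bob B: only C inherits
```

With relation-wide transitivity both C and D come out as derived from A;
with no transitivity neither does. Only the label on B gives C derivedFrom
A without D derivedFrom A.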

 

Re: provenance looking odd - I think this is easy to find. We email a
contract around and at the end you send me a check (J) that is for an
amount different from what I expect. If I assume you're honest, this
provenance is 'odd' and I would start to look back through the chain of
things that happened to our copies of the contract to find a missing PE
(a disk error changed the amount and we didn't know about that event) or
derivation assumption (we independently calculated the costs based on
different assumptions about which services the bill should be derivedFrom).
In either case, the result is an improved provenance trace that has
added info about processes or what should/should not be inferred about
derivation that can all be encoded in a new more authoritative account.
This type of analysis is clearly out of scope of the language - we just
record provenance and allow tracking of different accounts -  but it is
likely to be a common use case.

 

Jim

 

> Choosing the defaults to be fully connected would open the door for
> incorrect assumptions given an open world - if you're missing the
> 'composite' label on a PE or Bob, you may infer derivation where there
> isn't any - these defaults would return the largest potential set of
> derivations given current knowledge. I suspect that this is actually
> the right way to 'err' - I'd rather have false positives than to flip
> the situation and be unable to find everything that something truly was
> derived from.

I don't agree with this since this is likely to make the knowledge base
inconsistent (given that we are considering only monotonic assertions to
conform to the Semantic Web/RDF). Since we agreed that PIL is an
assertion language, it should require the provenance
application/asserter/user to explicitly assert isDerivedFrom to be
transitive or non-transitive (if they have enough information).

 

Further, if we consider isDerivedFrom to be transitive by default (and
encode it as such in our formal model), then provenance applications
cannot make it non-transitive subsequently. The solution is to not state
isDerivedFrom to be transitive (or non-transitive), following the open
world assumption, and leave it to applications to create specializations
of the property that are transitive (or non-transitive). For example,
fully_connected_isDerivedFrom is a sub property of isDerivedFrom and is
also transitive (similarly for
'less-than-fully-connected'/'composite'/'decomposable').
 

> If one inferred a derivation that looked odd, one would simply walk
> the chain of Bobs and PEs to see if there's evidence that one or more
> of them might not be fully connected (i.e. one could look in other
> accounts) along with checking to see if one or more of the provenance
> statements is simply wrong (A was not an input!).

I am not sure how we or an application would "know" that the derivation
looks odd, and we would have to assume that the necessary "Bobs and PEs"
exist along with a mechanism for the user or application to traverse that
chain.

 

Received on Saturday, 6 August 2011 17:38:07 UTC