- From: Myers, Jim <MYERSJ4@rpi.edu>
- Date: Sat, 6 Aug 2011 13:37:35 -0400
- To: Satya Sahoo <satya.sahoo@case.edu>
- CC: Luc Moreau <L.Moreau@ecs.soton.ac.uk>, <public-prov-wg@w3.org>
- Message-ID: <B7376F3FB29F7E42A510EB5026D99EF20552A990@troy-be-ex2.win.rpi.edu>
Breaking my reply into two parts...

Regarding the default - I understand the concern about inconsistent knowledge, but I think we still need a consistent way for PIL interpreters to work on provenance graphs where there is no assertion about inferability/transitivity. If we just have used-PE-generated structures, should PIL engines infer isDerivedBy and assume transitivity or not? If we do, the assumptions could be incorrect (too many things tagged as derived); if we don't, the assumptions could also be incorrect (too few tagged as derived). It would be nice, though, if different engines worked the same way and agreed on the results.

Regarding subclassing isDerivedFrom to be transitive or not - the problem is that whether or not derivation is transitive depends on the nature of the Bobs, not on a quality of the derivedFrom relationship. I could have a Bob B (a birthday cake) derivedFrom some other one (A, an egg) and then two forward derivations: Bob C derivedFrom B, where C is crumbs, and Bob D derivedFrom B, where D is a used candle. All of those are valid, but I can't get the correct results (C derivedFrom A but D notDerivedFrom A) by setting flags on the three original derivations. I have to label Bob B. The same applies to inferring derivation from used-PE-generated structures - I would not be able to label the used and generated relationships as allowing inference of derivation, because the connectivity is due to the structure of the PE itself.

Re: provenance looking odd - I think this is easy to find. We email a contract around and at the end you send me a check (J) that is for an amount different from what I expect.
If I assume you're honest, this provenance is 'odd' and I would start to look back through the chain of things that happened to our copies of the contract to find a missing PE (a disk error changed the amount and we didn't know about that event) or derivation assumption (we independently calculated the costs based on different assumptions about which services the bill should be derivedFrom). In either case, the result is an improved provenance trace with added info about processes or about what should/should not be inferred about derivation, all of which can be encoded in a new, more authoritative account. This type of analysis is clearly out of scope of the language - we just record provenance and allow tracking of different accounts - but it is likely to be a common use case.

Jim

> Choosing the defaults to be fully connected would open the door for
> incorrect assumptions given an open world - if you're missing the
> 'composite' label on a PE or Bob, you may infer derivation where there
> isn't any - these defaults would return the largest potential set of
> derivations given current knowledge. I suspect that this is actually
> the right way to 'err' - I'd rather have false positives than to flip
> the situation and be unable to find everything that something truly
> was derived from.

I don't agree with this, since it is likely to make the knowledge base inconsistent (given that we are considering only monotonic assertions to conform to the Semantic Web/RDF). Since we agreed that PIL is an assertion language, it should require the provenance application/asserter/user to explicitly assert isDerivedFrom to be transitive or non-transitive (if they have enough information). Further, if we consider isDerivedFrom to be transitive by default (and encode it as such in our formal model), then provenance applications cannot make it non-transitive subsequently.
The solution is to not state isDerivedFrom to be transitive (or non-transitive), following the open world assumption, and to leave it to applications to create specializations of the property that are transitive (or non-transitive). For example, fully_connected_isDerivedFrom is a sub-property of isDerivedFrom and is also transitive (similarly for 'less-than-fully-connected'/'composite'/'decomposable').

> If one inferred a derivation that looked odd, one would simply walk
> the chain of Bobs and PEs to see if there's evidence that one or more
> of them might not be fully connected (i.e. one could look in other
> accounts) along with checking to see if one or more of the provenance
> statements is simply wrong (A was not an input!).

I am not sure how we or an application would "know" that the derivation looks odd, and we would have to assume that the necessary "Bobs and PEs" exist, along with a mechanism for the user or application to traverse that chain.
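
The question of defaults raised at the top of this message can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not how any PIL engine works: the PE names (bake, serve), the dictionary layout, and both helper functions are hypothetical. It reuses the cake example and contrasts an engine that infers derivation from every used-PE-generated structure and then assumes transitivity with an engine that infers nothing.

```python
# Hypothetical used/generated assertions for the cake example.
# The PE names "bake" and "serve" are invented for this sketch.
used = {"bake": {"egg"}, "serve": {"cake"}}
generated = {"bake": {"cake"}, "serve": {"crumbs", "used_candle"}}


def direct_derivations(used, generated):
    """Liberal default: every output of a PE is derived from every input."""
    pairs = set()
    for pe, outputs in generated.items():
        for out in outputs:
            for inp in used.get(pe, ()):
                pairs.add((out, inp))
    return pairs


def transitive_closure(pairs):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are both present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Engine 1: infer derivation from used/generated and assume transitivity.
liberal = transitive_closure(direct_derivations(used, generated))

# Engine 2: infer nothing beyond what is explicitly asserted (nothing here).
conservative = set()

print(sorted(liberal))
print(sorted(conservative))
```

Under these assumptions the liberal engine reports both (crumbs, egg), which is wanted, and (used_candle, egg), which is a false positive, while the conservative engine reports neither and so misses the crumbs-from-egg derivation. That is the "too many" versus "too few" trade-off described above.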
Received on Saturday, 6 August 2011 17:38:07 UTC