- From: Curt Tilmes <Curt.Tilmes@nasa.gov>
- Date: Mon, 2 Apr 2012 08:57:41 -0400
- To: <public-prov-wg@w3.org>
On 04/02/2012 04:33 AM, Tom De Nies wrote:
> I agree with Jim, that option 2 would be the safer option here.
>
> Since we are discussing best practices, I would assume that the best
> practice would be to account for these "unexpected" events. If a
> document is able to change, even when it is not expected to, one should
> always provide the possibility to retain a correct provenance account.
>
> As you said, option 2 retains the correctness of the original account
> provided with :doc, and increments it with the version-specific provenance.
> I think it is indeed a good idea to include this in the primer.

We've been working on a related use case concerning equivalence
through reproducibility.

From some input data L0, using activity A1, I derive a new dataset
L1v1, then I do some work with L1v1, analyzing it, using it as model
input, whatever.

  entity(L0)      # The input level 0 data
  entity(L1v1)    # Version 1 of the level 1 data
  activity(A1)
  used(A1, L0)
  wasGeneratedBy(L1v1, A1)

Then we discover a better way to create L1, so we make a new dataset
L1v2 with a new activity A2. L1v1 was really big, so we delete it.

  entity(L1v2)    # Version 2 of the level 1 data
  activity(A2)
  used(A2, L0)
  wasGeneratedBy(L1v2, A2)

Some people like L1v2, but others question some of the published work
and models that used L1v1, so they reproduce it. They try to follow
all the inputs and remake it identically to the way it was made before
(not a trivial task), so we end up with L1v1r1:

  entity(L1v1r1)  # Reproduction 1 of version 1 of the level 1 data
  activity(A3)
  used(A3, L0)
  wasGeneratedBy(L1v1r1, A3)

While L1v2 is different from L1v1 by design (version 2 is a better way
of making it), L1v1r1 is intended to be equivalent to L1v1. That is
difficult to prove in the general case, but if we have represented and
conveyed sufficient information about A1, A3 should be our best
reproduction of the generation process.
While they are (should be) equivalent in content (assuming we got the
reproduction right), they are certainly distinct entities.

Now someone writes a paper describing work based on L1v1, and someone
else writes a paper describing work based on L1v1r1. I want to examine
assertions about the two papers to determine if they are writing about
the 'same' dataset.

In one sense, they are not. L1v1 is not L1v1r1. They were made at
different times by different people, and we might have screwed up
trying to reproduce A1 with A3, so they might actually be very
different. (Like a French translation of an English book might not be
equivalent if the translator screwed up.)

In another sense, L1v1r1 is intended to be equivalent to L1v1 (if we
are claiming a process is reproducible, it should be possible to
reproduce it).

Is L1v1r1 alternateOf L1v1?

Curt
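P.S. To make the question concrete, here is a minimal sketch of the
assertions above as plain Python dictionaries. The dictionary layout and
the helper names (source_inputs, share_inputs) are hypothetical, made up
for illustration only; they are not part of PROV or of any library.

```python
# Hypothetical in-memory encoding of the used / wasGeneratedBy assertions.
used = {"A1": {"L0"}, "A2": {"L0"}, "A3": {"L0"}}
was_generated_by = {"L1v1": "A1", "L1v2": "A2", "L1v1r1": "A3"}


def source_inputs(entity):
    """Return the entities used by the activity that generated `entity`."""
    activity = was_generated_by.get(entity)
    if activity is None:
        # No generating activity was asserted (e.g. the raw input L0).
        return set()
    return set(used.get(activity, set()))


def share_inputs(a, b):
    """True if both entities were generated from the same non-empty inputs."""
    inputs_a = source_inputs(a)
    return bool(inputs_a) and inputs_a == source_inputs(b)


print(share_inputs("L1v1", "L1v1r1"))
print(share_inputs("L1v1", "L1v2"))
```

With only these assertions, both checks succeed: L1v1r1 and L1v2 each
trace back to L0, just as L1v1 does. So the used/wasGeneratedBy
statements alone do not separate "reproduction of version 1" from "new
version 2"; whatever relation captures the intent would have to be
asserted in addition.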
Received on Monday, 2 April 2012 12:58:17 UTC