- From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
- Date: Tue, 03 Apr 2012 10:22:40 +0100
- To: Curt Tilmes <Curt.Tilmes@nasa.gov>
- CC: public-prov-wg@w3.org
> Is L1v1r1 alternateOf L1v1?

Short answer: No.

Rationale: "equivalent" is not necessarily "same as". I claim that alternateOf draws upon a strong notion of sameness (identity), but L1v1r1 and L1v1 are claimed to be equivalent, not same as. So we can't claim alternateOf on this basis.

Counter-example: suppose the data contains sensitive private information which is leaked. We want to know how this happened. (I think this is a reasonable provenance use-case.) Then it may be important to know if it was leaked from L1v1r1 or L1v1. For this purpose, they are very different entities.

#g
--

On 02/04/2012 13:57, Curt Tilmes wrote:
> On 04/02/2012 04:33 AM, Tom De Nies wrote:
>> I agree with Jim, that option 2 would be the safer option here.
>>
>> Since we are discussing best practices, I would assume that the best
>> practice would be to account for these "unexpected" events. If a
>> document is able to change, even when it is not expected to, one should
>> always provide the possibility to retain a correct provenance account.
>>
>> As you said, option 2 retains the correctness of the original account
>> provided with :doc, and increments it with the version-specific provenance.
>> I think it is indeed a good idea to include this in the primer.
>
> We've been working on a related use case concerning equivalence
> through reproducibility.
>
> From some input data L0, using activity A1, I derive a new dataset L1v1,
> then I do some work with L1v1, analyzing it, using it as model input,
> whatever.
>
> entity(L0) # The input level 0 data
> entity(L1v1) # Version 1 of the level 1 data
> activity(A1)
> used(A1, L0)
> wasGeneratedBy(L1v1, A1)
>
> Then we discover a better way to create L1, so we make a new dataset
> L1v2 with a new activity A2. L1v1 was really big, so we delete it.
>
> entity(L1v2) # Version 2 of the level 1 data
> activity(A2)
> used(A2, L0)
> wasGeneratedBy(L1v2, A2)
>
> Some people like L1v2, but others question some of the published work
> and models that used L1v1, so they reproduce it.
>
> They try to follow all the inputs and remake it identically to the
> way they did before (not a trivial task), so we end up with L1v1r1
>
> entity(L1v1r1) # Reproduction 1 of version 1 of the level 1 data
> activity(A3)
> used(A3, L0)
> wasGeneratedBy(L1v1r1, A3)
>
>
> While L1v2 is different from L1v1 by design (version 2 is a better way
> of making it), L1v1r1 is intended to be equivalent to L1v1 (difficult
> to prove in the general case, but if we have represented and conveyed
> sufficient information about A1, A3 should be our best reproduction of
> the generation process).
>
>
> While they are (should be) equivalent in content (assuming we got the
> reproduction right), they are certainly distinct entities.
>
>
> Now someone writes a paper describing work based on L1v1, and someone
> else writes a paper describing work based on L1v1r1.
>
>
> I want to examine assertions about the two papers to determine if they
> are writing about the 'same' dataset.
>
> In one sense, they are not. L1v1 is not L1v1r1. They were made at
> different times by different people, and we might have screwed up
> trying to reproduce A1 with A3 so they might actually be very
> different. (Like a French translation of an English book might not be
> equivalent if the translator screwed up.)
>
> In another sense, L1v1r1 is intended to be equivalent to L1v1 (if we
> are claiming a process is reproducible, it should be possible to
> reproduce it.)
>
>
> Is L1v1r1 alternateOf L1v1?
>
>
> Curt
>
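
As a rough illustration of the distinction discussed in this thread, the sketch below records Curt's three generation statements in plain Python. It is not a PROV library: `ProvRecords`, `derive`, `inputs_of`, `intended_equivalent`, and `build_example` are all hypothetical names, and `intended_equivalent` in particular is an assumed stand-in relation, not a PROV term. The sketch suggests that L1v1, L1v2, and L1v1r1 have structurally identical records (each generated by an activity that used L0), so the intent to reproduce would have to be asserted separately, and that such an assertion leaves the entities distinct.

```python
from dataclasses import dataclass, field


@dataclass
class ProvRecords:
    """Plain-Python stand-in for the statements in the thread (not a PROV library)."""

    used: set = field(default_factory=set)            # (activity, entity) pairs
    generated_by: dict = field(default_factory=dict)  # entity -> activity
    # Hypothetical relation for "intended to be equivalent"; not a PROV term.
    intended_equivalent: set = field(default_factory=set)

    def derive(self, output, activity, *inputs):
        """Record used(activity, input) per input and wasGeneratedBy(output, activity)."""
        for source in inputs:
            self.used.add((activity, source))
        self.generated_by[output] = activity

    def inputs_of(self, entity):
        """Return the entities used by the activity that generated `entity`."""
        activity = self.generated_by.get(entity)
        return frozenset(e for (a, e) in self.used if a == activity)

    def assert_equivalent(self, a, b):
        """Assert that two distinct entities are intended to be equivalent."""
        self.intended_equivalent.add(frozenset((a, b)))

    def is_intended_equivalent(self, a, b):
        return frozenset((a, b)) in self.intended_equivalent


def build_example():
    """Build the records from the thread: L1v1, L1v2 and the reproduction L1v1r1."""
    records = ProvRecords()
    records.derive("L1v1", "A1", "L0")
    records.derive("L1v2", "A2", "L0")
    records.derive("L1v1r1", "A3", "L0")
    # The reproduction intent is asserted explicitly; it cannot be read off
    # the generation records, which look the same for L1v2 and L1v1r1.
    records.assert_equivalent("L1v1", "L1v1r1")
    return records


if __name__ == "__main__":
    records = build_example()
    for name in ("L1v1", "L1v2", "L1v1r1"):
        print(name, records.generated_by[name], sorted(records.inputs_of(name)))
    print(records.is_intended_equivalent("L1v1", "L1v1r1"))
    print(records.is_intended_equivalent("L1v1", "L1v2"))
```

Keeping the equivalence assertion apart from identity means a question such as "which dataset was the data leaked from?" can still tell L1v1 and L1v1r1 apart, since each keeps its own generating activity.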
Received on Tuesday, 3 April 2012 13:23:00 UTC