- From: Timothy Lebo <lebot@rpi.edu>
- Date: Mon, 2 Apr 2012 09:25:48 -0400
- To: Curt Tilmes <Curt.Tilmes@nasa.gov>
- Cc: <public-prov-wg@w3.org>
Curt,

I listed this example at
http://www.w3.org/2011/prov/wiki/PROV_OWL_ontology_component_examples#NASA_reproducing_big_datasets
so that hopefully someday it can make it to
http://www.w3.org/2011/prov/wiki/PROV_examples

Regards,
Tim

On Apr 2, 2012, at 8:57 AM, Curt Tilmes wrote:

> On 04/02/2012 04:33 AM, Tom De Nies wrote:
>> I agree with Jim, that option 2 would be the safer option here.
>>
>> Since we are discussing best practices, I would assume that the best
>> practice would be to account for these "unexpected" events. If a
>> document is able to change, even when it is not expected to, one should
>> always provide the possibility to retain a correct provenance account.
>>
>> As you said, option 2 retains the correctness of the original account
>> provided with :doc, and increments it with the version-specific provenance.
>> I think it is indeed a good idea to include this in the primer.
>
> We've been working on a related use case concerning equivalence
> through reproducibility.
>
> From some input data L0, using activity A1, I derive a new dataset L1v1,
> then I do some work with L1v1, analyzing it, using it as model input,
> whatever.
>
> entity(L0) # The input level 0 data
> entity(L1v1) # Version 1 of the level 1 data
> activity(A1)
> used(A1, L0)
> wasGeneratedBy(L1v1, A1)
>
> Then we discover a better way to create L1, so we make a new dataset
> L1v2 with a new activity A2. L1v1 was really big, so we delete it.
>
> entity(L1v2) # Version 2 of the level 1 data
> activity(A2)
> used(A2, L0)
> wasGeneratedBy(L1v2, A2)
>
> Some people like L1v2, but others question some of the published work
> and models that used L1v1, so they reproduce it.
>
> They try to follow all the inputs and remake it identically to the
> way they did before (not a trivial task), so we end up with L1v1r1
>
> entity(L1v1r1) # Reproduction 1 of version 1 of the level 1 data
> activity(A3)
> used(A3, L0)
> wasGeneratedBy(L1v1r1, A3)
>
> While L1v2 is different from L1v1 by design (version 2 is a better way
> of making it), L1v1r1 is intended to be equivalent to L1v1 (difficult
> to prove in the general case, but if we have represented and conveyed
> sufficient information about A1, A3 should be our best reproduction of
> the generation process).
>
> While they are (should be) equivalent in content (assuming we got the
> reproduction right), they are certainly distinct entities.
>
> Now someone writes a paper describing work based on L1v1, and someone
> else writes a paper describing work based on L1v1r1.
>
> I want to examine assertions about the two papers to determine if they
> are writing about the 'same' dataset.
>
> In one sense, they are not. L1v1 is not L1v1r1. They were made at
> different times by different people, and we might have screwed up
> trying to reproduce A1 with A3 so they might actually be very
> different. (Like a French translation of an English book might not be
> equivalent if the translator screwed up.)
>
> In another sense, L1v1r1 is intended to be equivalent to L1v1 (if we
> are claiming a process is reproducible, it should be possible to
> reproduce it.)
>
> Is L1v1r1 alternateOf L1v1?
>
> Curt
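
As a rough illustration of the quoted use case, the used/wasGeneratedBy statements can be loaded into a small Python structure and queried. This is only a sketch: the tuple encoding and the names STATEMENTS, inputs_of and same_inputs are hypothetical and are not part of any PROV serialization or library. It suggests that the recorded statements alone trace L1v1, L1v2 and L1v1r1 back to the same input L0, so they do not by themselves separate a reproduction from a new version.

```python
# Provenance statements from the quoted message, encoded as
# (relation, subject, object) tuples. This encoding is an assumption
# made for illustration; it is not a PROV serialization.
STATEMENTS = [
    ("used", "A1", "L0"),
    ("wasGeneratedBy", "L1v1", "A1"),
    ("used", "A2", "L0"),
    ("wasGeneratedBy", "L1v2", "A2"),
    ("used", "A3", "L0"),
    ("wasGeneratedBy", "L1v1r1", "A3"),
]


def inputs_of(entity, statements):
    """Return the entities used by the activities that generated `entity`."""
    # wasGeneratedBy(entity, activity): collect the generating activities.
    activities = {
        obj for rel, subj, obj in statements
        if rel == "wasGeneratedBy" and subj == entity
    }
    # used(activity, input): collect what those activities consumed.
    return {
        obj for rel, subj, obj in statements
        if rel == "used" and subj in activities
    }


def same_inputs(a, b, statements):
    """True if both entities were generated from the same set of inputs."""
    return inputs_of(a, statements) == inputs_of(b, statements)


if __name__ == "__main__":
    for other in ("L1v2", "L1v1r1"):
        print(other, same_inputs("L1v1", other, STATEMENTS))
```

Both comparisons come out true under this encoding, which is why the intent to reproduce (as opposed to the intent to improve) would need to be stated by some additional relation, such as the alternateOf link asked about above, or by richer descriptions of A1 and A3.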
Received on Monday, 2 April 2012 13:26:21 UTC