- From: Tom De Nies <tom.denies@ugent.be>
- Date: Mon, 2 Apr 2012 16:00:23 +0200
- To: Curt Tilmes <Curt.Tilmes@nasa.gov>
- Cc: public-prov-wg@w3.org
- Message-ID: <CA+=hbbd5gjcr2=KykT=EC80KDWbFMQ-+sjpOWGMSRDDPRTcVQQ@mail.gmail.com>
> Is L1v1r1 alternateOf L1v1?

I could be wrong, but I would say no, it is not, since they are not a
*specialization* of the same entity. As you presented the provenance, they
are only *derived* from the same entity, since activities A1 and A3 use L0
and generate L1v1 and L1v1r1. So I think you can assert
wasDerivedFrom(L1v1, L0) and wasDerivedFrom(L1v1r1, L0), but not
alternateOf(L1v1, L1v1r1).

To support this further, Sam came up with a good counterexample to these
being alternates. If L0 consists of person-location information, and L1v1
retains only the persons while L1v1r1 retains only the locations, both
datasets are derived from L0, but they are by no means alternates of each
other.

However, one could imagine providing a sort of "conceptual view" by
introducing an entity:

entity(inputDataForAlgorithmA)

Then, together with

specializationOf(L1v1, inputDataForAlgorithmA)
specializationOf(L1v1r1, inputDataForAlgorithmA)

you could say that alternateOf(L1v1, L1v1r1), because they are both
specializations of the input data.

So to sum up, I guess it depends on the granularity and the angle from
which you view the provenance. In the literal case, they do not seem to be
alternates to me, yet in the conceptual representation, they are.

Or is this introducing too much semantics into the constraints?

Regards,
Tom

---
Tom De Nies
Ghent University - IBBT
Faculty of Engineering and Architecture
Department of Electronics and Information Systems - Multimedia Lab
Gaston Crommenlaan 8 bus 201, B-9050 Ledeberg-Ghent, Belgium
t: +32 9 331 49 59
e: tom.denies@ugent.be
URL: http://multimedialab.elis.ugent.be

2012/4/2 Curt Tilmes <Curt.Tilmes@nasa.gov>

> On 04/02/2012 04:33 AM, Tom De Nies wrote:
>
>> I agree with Jim, that option 2 would be the safer option here.
>>
>> Since we are discussing best practices, I would assume that the best
>> practice would be to account for these "unexpected" events. If a
>> document is able to change, even when it is not expected to, one should
>> always provide the possibility to retain a correct provenance account.
>>
>> As you said, option 2 retains the correctness of the original account
>> provided with :doc, and increments it with the version-specific
>> provenance.
>> I think it is indeed a good idea to include this in the primer.
>
> We've been working on a related use case concerning equivalence
> through reproducibility.
>
> From some input data L0, using activity A1, I derive a new dataset L1v1,
> then I do some work with L1v1, analyzing it, using it as model input,
> whatever.
>
> entity(L0)      # The input level 0 data
> entity(L1v1)    # Version 1 of the level 1 data
> activity(A1)
> used(A1, L0)
> wasGeneratedBy(L1v1, A1)
>
> Then we discover a better way to create L1, so we make a new dataset
> L1v2 with a new activity A2. L1v1 was really big, so we delete it.
>
> entity(L1v2)    # Version 2 of the level 1 data
> activity(A2)
> used(A2, L0)
> wasGeneratedBy(L1v2, A2)
>
> Some people like L1v2, but others question some of the published work
> and models that used L1v1, so they reproduce it.
>
> They try to follow all the inputs and remake it identically to the
> way they did before (not a trivial task), so we end up with L1v1r1.
>
> entity(L1v1r1)  # Reproduction 1 of version 1 of the level 1 data
> activity(A3)
> used(A3, L0)
> wasGeneratedBy(L1v1r1, A3)
>
> While L1v2 is different from L1v1 by design (version 2 is a better way
> of making it), L1v1r1 is intended to be equivalent to L1v1 (difficult
> to prove in the general case, but if we have represented and conveyed
> sufficient information about A1, A3 should be our best reproduction of
> the generation process).
>
> While they are (should be) equivalent in content (assuming we got the
> reproduction right), they are certainly distinct entities.
>
> Now someone writes a paper describing work based on L1v1, and someone
> else writes a paper describing work based on L1v1r1.
>
> I want to examine assertions about the two papers to determine if they
> are writing about the 'same' dataset.
>
> In one sense, they are not. L1v1 is not L1v1r1. They were made at
> different times by different people, and we might have screwed up
> trying to reproduce A1 with A3 so they might actually be very
> different. (Like a French translation of an English book might not be
> equivalent if the translator screwed up.)
>
> In another sense, L1v1r1 is intended to be equivalent to L1v1 (if we
> are claiming a process is reproducible, it should be possible to
> reproduce it.)
>
> Is L1v1r1 alternateOf L1v1?
>
> Curt
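
The "conceptual view" argument in the reply can be sketched in a few lines of Python. This is only an illustration of the reasoning in this thread, not the normative PROV semantics: the tuple encoding and the helper name alternates_via_shared_specialization are assumptions made for the example. The rule it applies is the one stated in the reply, that two entities may be called alternates when both are specializations of a common entity, while shared derivation alone is not enough.

```python
from itertools import combinations

# Assertions from the thread, encoded as (subject, object) tuples.
# This encoding is hypothetical and chosen only for the example.
derived_from = {("L1v1", "L0"), ("L1v1r1", "L0")}
specialization_of = {
    ("L1v1", "inputDataForAlgorithmA"),
    ("L1v1r1", "inputDataForAlgorithmA"),
}


def alternates_via_shared_specialization(specializations):
    """Return unordered pairs of entities that specialize a common entity.

    Derivation statements are deliberately not consulted: per the
    counterexample in the thread, two datasets derived from the same
    source need not be alternates of each other.
    """
    by_general = {}
    for specific, general in specializations:
        by_general.setdefault(general, set()).add(specific)

    pairs = set()
    for specifics in by_general.values():
        for a, b in combinations(sorted(specifics), 2):
            pairs.add(frozenset((a, b)))
    return pairs


# Literal view: only derivations are asserted, so nothing is inferred.
literal_view = alternates_via_shared_specialization(set())

# Conceptual view: both datasets specialize inputDataForAlgorithmA.
conceptual_view = alternates_via_shared_specialization(specialization_of)
```

Under these assumptions, the literal view yields no alternate pairs, while the conceptual view yields the single pair {L1v1, L1v1r1}, which matches the "it depends on the angle" conclusion in the reply.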
Received on Monday, 2 April 2012 14:00:59 UTC