Re: Best practice for specialization from Graham Klyne on 2012-04-03 (public-prov-wg@w3.org from April 2012)

From: Graham Klyne <graham.klyne@zoo.ox.ac.uk>
Date: Tue, 03 Apr 2012 10:22:40 +0100
To: Curt Tilmes <Curt.Tilmes@nasa.gov>
CC: public-prov-wg@w3.org
Message-ID: <4F7AC160.8010200@zoo.ox.ac.uk>
> Is L1v1r1 alternateOf L1v1?

Short answer: No.

Rationale: "equivalent" is not necessarily "the same as".  I claim that
alternateOf draws upon a strong notion of sameness (identity), but L1v1r1 and
L1v1 are claimed to be equivalent, not the same.  So we can't claim alternateOf
on this basis.

Counter-example: suppose the data contains sensitive private information which 
is leaked.  We want to know how this happened.  (I think this is a reasonable 
provenance use-case.)  Then it may be important to know if it was leaked from 
L1v1r1 or L1v1.  For this purpose, they are very different entities.
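
To make the distinction concrete, here is a rough Python sketch.  It is not
PROV syntax and not any PROV library's API: the `Entity` record, the
`equivalent`/`same` helpers and the `accessed_by` log are all hypothetical.
It assumes an entity is just an identifier plus the bytes it holds, and shows
that two entities can be equivalent in content while remaining distinct, which
is exactly what the leak question depends on.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical record: an identifier plus the bytes it holds."""
    ident: str
    content: bytes

    def digest(self) -> str:
        # SHA-256 of the content, used only to compare contents.
        return hashlib.sha256(self.content).hexdigest()


def equivalent(a: Entity, b: Entity) -> bool:
    # Equivalence: same content, regardless of which entity holds it.
    return a.digest() == b.digest()


def same(a: Entity, b: Entity) -> bool:
    # Sameness: the very same entity (here, the same identifier).
    return a.ident == b.ident


# A perfect reproduction: identical bytes, different entity.
l1v1 = Entity("L1v1", b"level 1 data, version 1")
l1v1r1 = Entity("L1v1r1", b"level 1 data, version 1")

# Hypothetical access log for the leak investigation, keyed by identity.
accessed_by = {"L1v1": {"alice"}, "L1v1r1": {"bob"}}

print(equivalent(l1v1, l1v1r1))  # True
print(same(l1v1, l1v1r1))        # False
print(accessed_by[l1v1.ident] == accessed_by[l1v1r1.ident])  # False
```

Even though equivalent() returns True, the two identifiers lead to different
entries in the (hypothetical) access log, so collapsing them into one entity
would lose the information needed to trace the leak.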

#g
--


On 02/04/2012 13:57, Curt Tilmes wrote:
> On 04/02/2012 04:33 AM, Tom De Nies wrote:
>> I agree with Jim, that option 2 would be the safer option here.
>>
>> Since we are discussing best practices, I would assume that the best
>> practice would be to account for these "unexpected" events. If a
>> document is able to change, even when it is not expected to, one should
>> always provide the possibility to retain a correct provenance account.
>>
>> As you said, option 2 retains the correctness of the original account
>> provided with :doc, and supplements it with the version-specific provenance.
>> I think it is indeed a good idea to include this in the primer.
>
> We've been working on a related use case concerning equivalence
> through reproducibility.
>
> From some input data L0, using activity A1, I derive a new dataset L1v1,
> then I do some work with L1v1, analyzing it, using it as model input,
> whatever.
>
> entity(L0) # The input level 0 data
> entity(L1v1) # Version 1 of the level 1 data
> activity(A1)
> used(A1, L0)
> wasGeneratedBy(L1v1, A1)
>
> Then we discover a better way to create L1, so we make a new dataset
> L1v2 with a new activity A2. L1v1 was really big, so we delete it.
>
> entity(L1v2) # Version 2 of the level 1 data
> activity(A2)
> used(A2, L0)
> wasGeneratedBy(L1v2, A2)
>
> Some people like L1v2, but others question some of the published work
> and models that used L1v1, so they reproduce it.
>
> They try to follow all the inputs and remake it identically to the
> way they did before (not a trivial task), so we end up with L1v1r1:
>
> entity(L1v1r1) # Reproduction 1 of version 1 of the level 1 data
> activity(A3)
> used(A3, L0)
> wasGeneratedBy(L1v1r1, A3)
>
>
> While L1v2 is different from L1v1 by design (version 2 is a better way
> of making it), L1v1r1 is intended to be equivalent to L1v1 (difficult
> to prove in the general case, but if we have represented and conveyed
> sufficient information about A1, A3 should be our best reproduction of
> the generation process).
>
>
> While they are (should be) equivalent in content (assuming we got the
> reproduction right), they are certainly distinct entities.
>
>
> Now someone writes a paper describing work based on L1v1, and someone
> else writes a paper describing work based on L1v1r1.
>
>
> I want to examine assertions about the two papers to determine if they
> are writing about the 'same' dataset.
>
> In one sense, they are not. L1v1 is not L1v1r1. They were made at
> different times by different people, and we might have screwed up
> trying to reproduce A1 with A3 so they might actually be very
> different. (Just as a French translation of an English book might not be
> equivalent if the translator screwed up.)
>
> In another sense, L1v1r1 is intended to be equivalent to L1v1 (if we
> are claiming a process is reproducible, it should be possible to
> reproduce it.)
>
>
> Is L1v1r1 alternateOf L1v1?
>
>
> Curt
>
Received on Tuesday, 3 April 2012 13:23:00 UTC