Re: Help with Data Preservation BP from Phil Archer on 2016-04-28 (public-dwbp-wg@w3.org from April 2016)

From: Phil Archer <phila@w3.org>
Date: Thu, 28 Apr 2016 15:10:45 +0100
To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Annette Greiner <amgreiner@lbl.gov>, "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
Cc: Christophe.Gueret@bbc.co.uk
Message-ID: <572219E5.3080206@w3.org>

OK, I've looked at this as promised. I worked with Christophe on some of 
the original text so I am familiar with it. Even so, I have pretty much 
rewritten it and these changes definitely need WG approval/consideration 
- they're way more than editorial.

1. Section intro rewritten to say that we're concerned here with what's 
left after a dataset has been removed or archived.

2. BP on preserving identifiers moved to be first in the section. It 
distinguishes between datasets that have been deleted entirely (410) and 
those that have been archived (303).

3. Slightly amended the BP on assessing coverage.

4. Deleted the one on trusted serialisation since a) it is contentious 
and b) I think it's questionable whether it is in scope.
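
To make the distinction in point 2 concrete, here is a rough sketch of 
how a server might answer requests for dataset URIs. It is not taken 
from the revised BP text: the paths, the archive URI, the DATASETS 
registry and the resolve() function are all made up for illustration, 
and a real deployment would more likely do this in the web server or 
framework configuration.

```python
from http import HTTPStatus

# Hypothetical registry of dataset URIs and what has happened to each one.
DATASETS = {
    "/dataset/bus-stops-2014": {"state": "deleted"},
    "/dataset/bus-stops-2015": {
        "state": "archived",
        "archive_uri": "https://archive.example.org/bus-stops-2015",
    },
    "/dataset/bus-stops-2016": {"state": "live"},
}


def resolve(path):
    """Return (status, headers) for a request to a dataset URI."""
    record = DATASETS.get(path)
    if record is None:
        # The URI was never assigned to a dataset.
        return HTTPStatus.NOT_FOUND, {}
    if record["state"] == "deleted":
        # Deleted entirely: the identifier is still recognised, but nothing is served.
        return HTTPStatus.GONE, {}
    if record["state"] == "archived":
        # Archived: point the client at wherever the preserved copy now lives.
        return HTTPStatus.SEE_OTHER, {"Location": record["archive_uri"]}
    return HTTPStatus.OK, {}
```

The point of keeping the deleted entry in the registry at all is that 
410 says "this used to exist and is gone on purpose", which a plain 404 
does not.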

See revised text at 
http://philarcher1.github.io/dwbp/bp.html#dataPreservation

And the Diff at

http://services.w3.org/htmldiff?doc1=http%3A%2F%2Fw3c.github.io%2Fdwbp%2Fbp.html&doc2=http%3A%2F%2Fphilarcher1.github.io%2Fdwbp%2Fbp.html#dataPreservation

Pull request issued.

HTH

Phil.


On 26/04/2016 21:34, Bernadette Farias Lóscio wrote:
> Hi all,
>
> We have had a lot of discussion about the Data Preservation Best Practices,
> but we still have some open comments that we should try to resolve before
> the next publication. For this, we need some help :)
>
> I am copying Christophe Gueret (he was in charge of the Data Preservation
> section) on this email and I hope he can help us to resolve these comments
> ;)
>
>
> --> Best Practice 28: Assess dataset coverage
>
> Bernadette's comment:
>
> The test of this BP is not a real test.
> I think it should be something like this:
> "Check if all resources used in the dataset are either already preserved
> somewhere or provided along with the  dataset."
>
>
> --> Best Practice 29: Use a trusted serialization format for preserved data
> dumps
>
> Annette's comment:
>
> "If we keep this, it should at least offer JSON as an acceptable example.
> JSON is the current overwhelming standard for APIs. This talks about
> "sending data dumps for long-term preservation" and "data depositors".
> Where are the data being sent? Is it on the Web? The bad example would pass
> the How to Test."
>
> Bernadette's comment:
>
> I am not sure if we need this BP. If we are talking about preservation of
> Data on the Web, then probably the data is already in a standard
> machine-readable format (BP13). In this case, why (or when) do we need this
> BP?
>
> --> Best Practice 30: Update the status of identifiers
>
> Annette's comment:
>
> "It's not quite clear what we are suggesting get linked to what. The Why
> talks about linking preserved datasets with the original URI. Are we saying
> the original URI should continue to point to the preserved dataset? If
> that's the case, then what does preservation mean? There is also discussion
> of saving snapshots as versions, which seems to me is covered better under
> versioning.
>
> We say "A link is maintained between the URI of a resource, the most
> up-to-date description available for it, and preserved descriptions." One
> link can only join two resources. Should people preserve old descriptions?
> Maybe descriptions of older versions are what was meant?
>
> A 410 status only makes sense if there's nothing served at the URI, which
> isn't the case if the advice here is followed. 303 seems like a good
> option."
>
> kind regards,
> Bernadette
>
>
>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Thursday, 28 April 2016 14:10:51 UTC