- From: Christophe Guéret <christophe.gueret@dans.knaw.nl>
- Date: Thu, 5 Feb 2015 08:18:13 +0100
- To: "contact@carlosiglesias.es" <contact@carlosiglesias.es>
- CC: Christophe Gueret <christophe.gueret@dans.knaw.nl>, Public DWBP WG <public-dwbp-wg@w3.org>
- Message-ID: <CABP9CAHuBwkBHpb90hFBGgOz4SsvUZV2xbos=7jxR613CTvmiw@mail.gmail.com>
Dear Carlos,

Thanks again for your comments. I've now changed the topic of the thread so
that the tracker can pick this up and append it to the related issue.

DATA PRESERVATION

>>> I feel quite uncomfortable with this section in general. I have some
>>> problems trying to understand the underlying principles for these BPs,
>>> but overall it looks to be about data archiving generally speaking
>>> instead of about data persistence, which is indeed the best practice
>>> IMO and also coherent with other BPs in the document (such as
>>> versioning). In fact data archiving looks more like a bad practice to
>>> me than a best one.

>> Thanks for having looked at these BPs. I think data persistence and
>> data preservation are two different issues, and I can't agree that data
>> archiving is a bad practice. The bad practice is that people who want
>> to take data off-line for some reason, say the end of a project's
>> funding, just leave the server running until it dies or trash the data.
>> In these cases sending the data to an archive is a good, and better,
>> practice. It is even increasingly backed by funding agencies, which ask
>> funded projects to come up with a data management plan that includes a
>> section about what will happen to the data at the end of the project.
>> Web data should be no exception to this (IMHO).

> I'm sorry to keep disagreeing here, but (1) if that's the scenario we
> would like to cover, I think it is not properly described in the
> document as it currently stands, and (2) this still looks like a sort
> of least bad option, not a best practice, no?

I understand and agree with (1), and could go along with the wording you use
for (2). We have to reach a consensus and produce a coherent document. If
this section on preservation does not fit into the rest of the story and/or
is not to the liking of the majority of the editors and contributors, then
we should drop it.

>> So these BPs are here to help people decide on how to best ship their
>> data to an archive when taking it off the Web. That's not to say these
>> BPs are the good ones, nor that this list is exhaustive, but I would
>> very much like us to keep a section about data preservation in the
>> document and have a discussion about its content.

> Good, now I have a better understanding of the purpose, but I still
> think that's not a best practice. The best practice for data on the web
> should be just not letting data die and not moving it around, IMO
> (especially if it will be offline or in a package where all links and
> references will be broken from then on).

That's true: because all the data on the Web is interconnected, and because
the meaning of everything depends on the meaning of everything else, nobody
should delete or update anything, to avoid side effects. The problem is
that this will happen anyway, and this is why something is needed. For Web
documents we have Web archives, which have proven useful at times. Some are
proposing that we should have a similar system for Web data, having digital
preservation institutes go out on the Web and store everything. I don't
know whether, if we see a need for access to historical descriptions of
entities, we should offer some BPs related to it. This would then have to
cover two aspects:

* How can consumers get access to past descriptions of entities which are
no longer associated with the identifier?
* How can publishers prepare their data in order to make it easier for
consumers to achieve this first goal?

Versioning is one solution to this. Offering dumps is another. Keeping
resources alive and linking them to historical dumps is yet another one.
See for instance what DBpedia is doing with Memento: URIs are cool, and
preserved descriptions are accessible via the Memento gateway. They also
have dumps in recognised serialisation formats on their site. These are
good practices. They could also have gone for versioning, but chose not to
put the DBpedia version number in the resource names, and that's fine too.
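To make the consumer side concrete, here is a minimal sketch of how a
client could ask a Memento TimeGate for a past description of a resource
(Python, using the requests library). The TimeGate URL is a made-up
placeholder for whatever gateway the archive exposes; only the
Accept-Datetime and Memento-Datetime headers come from the Memento
protocol itself:

    import requests

    # Hypothetical TimeGate; substitute the gateway of the archive you query.
    TIMEGATE = "http://example.org/timegate/"
    # The original resource URI whose past description we want.
    RESOURCE = "http://dbpedia.org/resource/Amsterdam"

    # Ask for the description as it was on a given date; the TimeGate
    # redirects to the memento closest to that datetime.
    r = requests.get(
        TIMEGATE + RESOURCE,
        headers={"Accept-Datetime": "Thu, 05 Feb 2015 00:00:00 GMT"},
        allow_redirects=True,
    )

    print(r.headers.get("Memento-Datetime"))  # datetime of the snapshot served
    print(r.url)                              # URI of the archived description

The nice property of this setup is that the original identifier stays in
use: the consumer keeps dereferencing the same URI and only adds a header
to travel back in time.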
Now, all the parts about monitoring for file format obsolescence,
preventing bit rot, keeping copies of dumps on different storage systems,
monitoring quality at ingest time, making sure the storage of the data is
OAIS compliant, etc.: all of that is the job of a trusted digital
repository, not that of the data owner nor of the data consumer. So, as
Tomas argued several times, these aspects are surely *out* of scope.

I hope that will help us discuss this further. I can also propose that we
spend a significant amount of time on this specific issue at one of the
upcoming meetings, to take a final decision about it.

Cheers,
Christophe

--
Researcher
+31(0)6 14576494
christophe.gueret@dans.knaw.nl

*Data Archiving and Networked Services (DANS)*

DANS promotes sustained access to digital research data. See
www.dans.knaw.nl for more information. DANS is an institute of KNAW and
NWO.

Please note, as of 1 January we have a new address: DANS | Anna van
Saksenlaan 51 | 2593 HW Den Haag | Postbus 93067 | 2509 AB Den Haag |
+31 70 349 44 50 | info@dans.knaw.nl | www.dans.knaw.nl

*Let's build a World Wide Semantic Web!* http://worldwidesemanticweb.org/

*e-Humanities Group (KNAW)* <http://www.ehumanities.nl/>
Received on Thursday, 5 February 2015 07:19:04 UTC