RE: Table of Contents DWBP from Manuel.CARRASCO-BENITEZ@ec.europa.eu on 2014-11-05 (public-dwbp-wg@w3.org from November 2014)

From: <Manuel.CARRASCO-BENITEZ@ec.europa.eu>
Date: Wed, 5 Nov 2014 15:51:21 +0000
To: <christophe.gueret@dans.knaw.nl>
CC: <bfl@cin.ufpe.br>, <aisaac@few.vu.nl>, <public-dwbp-wg@w3.org>
Message-ID: <39DB516E46C0E842A2CFFF1BBB7412F15F86AD72@S-DC-ESTF03-B.net1.cec.eu.int>
Christophe,

As you rightly commented, we have already discussed this, but it keeps coming back :-)

In a nutshell:

- Data preservation : the group is about data, so this subject is unavoidable
- Scope             : one has to define which aspects of data preservation are in scope and which are out of scope
- Online/offline    : one should consider both online data (HTTP) and offline data (files); this relates to static/dynamic data and to data preservation
- COMURI            : compact, mnemonic URIs that support whatever the WG decides
- 301               : just an example ("... appropriate status codes ..."); redirection is out of scope for COMURI
- Format            : be conservative (cf. https://www.ietf.org/rfc/rfc1.txt) :-)

Regards
Tomas

From: Christophe Guéret [mailto:christophe.gueret@dans.knaw.nl] 
Sent: Wednesday, November 05, 2014 3:43 PM
To: CARRASCO BENITEZ Manuel (DGT)
Cc: bfl@cin.ufpe.br; Christophe Gueret; aisaac@few.vu.nl; public-dwbp-wg@w3.org
Subject: Re: Table of Contents DWBP

Hi Tomas,
Yes, we already discussed that. This IETF document rightly describes the challenges archives face, but this is, I agree, out of our scope. For instance, the fact that preferred serialisation formats evolve over time, and that what is preserved now as JSON-LD could be requested in another format 20 years from now, is none of our business (IMHO). Still, we could issue some recommendations on how best to ship data to an archive so that we make their job easier. We could also have some recommendations about what an archive should do to make the data easier to find and re-use. Regarding the latter, indexing all the subjects in a given dataset could be a good thing to do, as future data consumers are likely to look for the preserved description of a particular URI.
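As a rough illustration of the subject-indexing idea, the sketch below maps each subject URI to the identifiers of the archived dumps that describe it. It assumes dumps serialised as N-Triples and uses a deliberately naive line parser (not a conforming N-Triples parser); the function names and the dump identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def subject_of(line: str) -> Optional[str]:
    """Return the subject IRI of one N-Triples line, or None.

    Deliberately naive: only subjects written as <IRI> are recognised;
    blank-node subjects, comments and empty lines yield None.
    This is not a conforming N-Triples parser.
    """
    line = line.strip()
    if not line.startswith("<"):
        return None
    end = line.find(">")
    if end == -1:
        return None
    return line[1:end]


def build_subject_index(
    dumps: Iterable[Tuple[str, Iterable[str]]]
) -> Dict[str, Set[str]]:
    """Map each subject IRI to the identifiers of the dumps describing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for dump_id, lines in dumps:
        for line in lines:
            subject = subject_of(line)
            if subject is not None:
                index[subject].add(dump_id)
    return dict(index)


if __name__ == "__main__":
    # Hypothetical dump identifiers and example.org data, for illustration only.
    dumps = [
        ("urn:example:dump-1", [
            '<http://example.org/a> <http://example.org/p> "first version" .',
            '<http://example.org/b> <http://example.org/p> "other resource" .',
        ]),
        ("urn:example:dump-2", [
            '<http://example.org/a> <http://example.org/p> "second version" .',
            '_:x <http://example.org/p> "blank-node subject, not indexed" .',
        ]),
    ]
    index = build_subject_index(dumps)
    print(sorted(index["http://example.org/a"]))  # ['urn:example:dump-1', 'urn:example:dump-2']
```

A consumer looking for the preserved description of http://example.org/a would then learn that both dumps contain statements about it; a real archive would more likely use a proper RDF parser and a persistent store rather than an in-memory dictionary.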
The URI patterns in section 7.5 are OK, but this section could be revised to allow for status codes other than 301, and to include examples using the persistent identifier (preferably a URI) assigned to a given data dump at ingest time.
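To illustrate what allowing codes other than 301 could look like, here is a minimal sketch of a resolver that maps a persistent identifier to the current location of a data dump, with the redirect status code chosen per entry. The `Resolution` record, the `resolve` function and all identifiers and locations are hypothetical; the status-code names are the standard HTTP redirection codes.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Redirect status codes defined by HTTP semantics (RFC 9110, section 15.4).
REDIRECT_CODES = {
    301: "Moved Permanently",
    302: "Found",
    303: "See Other",
    307: "Temporary Redirect",
    308: "Permanent Redirect",
}


@dataclass(frozen=True)
class Resolution:
    """Hypothetical registry entry: target of a persistent identifier."""
    location: str
    status: int  # chosen per entry instead of being fixed to 301

    def __post_init__(self) -> None:
        if self.status not in REDIRECT_CODES:
            raise ValueError(f"not a redirect status code: {self.status}")


def resolve(registry: Dict[str, Resolution], identifier: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request on a persistent identifier.

    Unknown identifiers get 404 Not Found; known ones get the entry's
    redirect code plus a Location header.
    """
    entry = registry.get(identifier)
    if entry is None:
        return 404, {}
    return entry.status, {"Location": entry.location}


if __name__ == "__main__":
    # Hypothetical identifiers: a PID assigned to one dump at ingest time,
    # and a "latest" alias that moves whenever a new dump is archived.
    registry = {
        "http://example.org/id/dump/1": Resolution(
            "https://archive.example.org/objects/dump-1.nt.gz", 302),
        "http://example.org/id/dataset/latest": Resolution(
            "http://example.org/id/dump/1", 307),
    }
    print(resolve(registry, "http://example.org/id/dataset/latest"))
    # (307, {'Location': 'http://example.org/id/dump/1'})
```

The codes in the example are only illustrative. One consideration when choosing them: a 301 or 308 tells clients they may update stored links to the new location, which may not be what an archive wants when clients should keep citing the persistent identifier itself.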
Regards,
Christophe



BTW there is a typo in "Redirectio services", section 7.4 of http://dragoman.org/comuri.html#nature-of-the-resources


On 4 November 2014 15:29, Manuel.CARRASCO-BENITEZ@ec.europa.eu <Manuel.CARRASCO-BENITEZ@ec.europa.eu> wrote:
Dear all,

As I previously commented, data preservation is a complex issue, and an IETF WG worked on it for a few years:
  Long-Term Archive Service Requirements - http://www.ietf.org/rfc/rfc4810.txt


It is discussed in COMURI, though data preservation itself is out of scope: perhaps I am too aware of the complexities :-)

A few extracts from http://dragoman.org/comuri.html#nature-of-the-resources:


     7. Nature of the resources

     7.1 Ultrapersistent URI
     URI creation has to take into account all identification scenarios: original site, archival sites, and offline data;

     7.5 Data archival
     The following data archiving techniques are considered:
         Online archival sites
         Offline archival
         Pack

Regards
Tomas

From: Bernadette Farias Lóscio [mailto:bfl@cin.ufpe.br]
Sent: Friday, October 24, 2014 4:32 PM
To: Christophe Guéret
Cc: Antoine Isaac; public-dwbp-wg@w3.org
Subject: Re: Table of Contents DWBP

Hello Christophe,

Thanks for your feedback!

Just to check that I understood your point: are you proposing to add Data Archival as a new phase of the life cycle?

I think it is also important to discuss the difference between Data Preservation and Data Archival. Could you please let me know your understanding of these two concepts?

Thank you!
Bernadette

2014-10-24 11:13 GMT-03:00 Christophe Guéret <christophe.gueret@dans.knaw.nl>:
Dear Caroline, all,
Looking at the documents, I'd like to suggest adding a section "Data archival" to the "Best Practices Themes (challenges)" of [1]. Then the two points
• Preservation
• Data versioning
currently found in data publication could be moved there.

The idea here is that publication and archival are two different phases of the workflow.
Versioning can be added to the latter, on the grounds that one would want to preserve previous versions and serve only the latest.

Christophe

[1] https://www.w3.org/2013/dwbp/wiki/Proposed_structure


On 24 October 2014 16:04, Antoine Isaac <aisaac@few.vu.nl> wrote:
Dear Caroline, all,

As requested in today's call, I had a brief look at the "Draft of the content structure of the Best Practices Themes" and "Description of each theme on the Table of Contents" at [1].
I see that the section on controlled vocabularies, which Mark and I are working on and which was mentioned in the previous content list at [2], is not there. Is it intentionally left out?

If not, I think it could go in the "data organization" section of the proposed structure at [1].

Kind regards

Antoine

[1] https://www.w3.org/2013/dwbp/wiki/Proposed_structure

[2] https://www.w3.org/2013/dwbp/wiki/Main_Page#Best_Practices



On 10/21/14 10:13 PM, Caroline Burle wrote:
> Ghislain,
>
> thank you for your comments, the suggestion to add different questions/issues is very welcome! In fact, Bernadette, Newton and I had a 2h call yesterday and put some of the questions you suggested into the Proposed Structure [1].
>
> Phil Archer also suggested adding “Feedback” as an item, so that the Data on the Web Lifecycle would actually be a cycle. This is also on the wiki.
>
> Furthermore, we added to the TPAC Goals [2]:
> Description of each theme on the Table of Contents
> Draft of the content structure of the Best Practices Themes
>
> Kind regards,
> Caroline
>
> [1] https://www.w3.org/2013/dwbp/wiki/Proposed_structure#Mapping_of_Themes.5B1.5D
> [2] https://www.w3.org/2013/dwbp/wiki/TPAC_2014
>
>
> On 17/10/14 10:00, Ghislain Atemezing wrote:
>> Hi Caroline,
>> Thanks for this starting document for BP document structure.
>>
>>> Bernadette and I edited the TPAC Deliverable Goals.
>>>
>>> We have also edited the Proposed Structure of the BP document [2]. We have only started discussing the Table of Contents, but it would be great if you could take a look and comment.
>>
>> I would suggest adding, for each item, the questions/issues that we might address, to be sure that we capture all the requirements.
>> Below is a first attempt at what I mean.
>>
>> ###################################
>>     1- Data Publisher
>>         Metadata: What is the minimum metadata needed to describe a dataset?
>>             Licenses: How to identify licenses suitable for a dataset?
>>             Data quality: How can publishers monitor the quality of their datasets?
>>             Provenance: What type of provenance information should be attached at the metadata level? Discuss the granularity of PROV data: metadata level or fine-grained level.
>>         Interoperability
>>             What makes a good interoperable dataset?
>>         Data access: How to decide between publishing a dump and offering API options (SPARQL, etc.)? What requirements should be taken into account (reliability, time to query the dataset, etc.)?
>>             Data formats: Advice on the formats in which to publish a dataset (at least 1/2-star compliance?)
>>             Data granularity: How to publish a catalog versus other types of data?
>>         Sensitive data (privacy): How to identify them? Are they worth publishing? What are the security mechanisms? Licenses?
>>         Data identification: How could a publisher identify related datasets for interconnection or reuse?
>>         Persistence (data identification?): What rules should be taken into account when releasing a dataset:
>>             URI?
>>             Status of the dataset?
>>             Disclaimer?
>>             etc.
>>         Data versioning: How to describe and track the versions of a dataset?
>>
>>
>>     2- Data Consumer
>>         Data usage: Models to annotate different applications of datasets (e.g., data visualizations, data summarization, data republishing)
>>
>> ######################
>>
>> WDYT?
>>
>> Cheers,
>>
>> Ghislain
>



--
Researcher
+31(0)6 14576494
christophe.gueret@dans.knaw.nl

Data Archiving and Networked Services (DANS)
DANS promotes sustained access to digital research data. See www.dans.knaw.nl for more information. DANS is an institute of KNAW and NWO.

Please note: as of 1 January we have a new address:
DANS | Anna van Saksenlaan 51 | 2593 HW Den Haag | Postbus 93067 | 2509 AB Den Haag | +31 70 349 44 50 | info@dans.knaw.nl | www.dans.knaw.nl

Let's build a World Wide Semantic Web!
http://worldwidesemanticweb.org/


e-Humanities Group (KNAW)




--
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------



Received on Wednesday, 5 November 2014 15:51:51 UTC