Re: General feedback on the document

Hello Christophe,

Thanks a lot for your comments on the FPWD of the DWBP document! After
gathering feedback from the community, we made some changes, and we're
planning to publish a second draft [1].

Below you can find my responses to your feedback on the
FPWD.


> # Overall points
> The document concerns more data publishers than it concerns consumers.
> This also seems to be reflected by the composition of editors/contributors,
> there should be more data consumers jumping in and adding BPs that matter
> to them.
> "Data must be available in machine readable" -> only should; must is way
> too strong. Some data consumers may want to have access to data that is not
> machine readable (e.g. a scanned old document) and not be restricted only
> to its machine-translated counterpart (e.g. the OCRed old document).
>

During the discussions about the audience, the group agreed that publishers
will be our primary audience. In this case, best practices should be
employed by data publishers rather than by data consumers. However, both
publishers and consumers will benefit from them. Therefore, I suggest keeping
publishers as the primary audience for our BPs.

Concerning "Data must be available in machine readable", "must" was changed
to "should".


> # Data vocabularies
> Issue 9 : we should stick to using "vocabularies"
> Issue 10 : we should aim at being generic
> BP 19: there is a problem in advocating for simplicity, as this can prevent
> publishers from having rich vocabularies. It could instead be suggested that
> publishers may provide vocabularies as rich as needed but strive to base
> them on "simpler" ones (e.g. core ontologies / upper ontologies / ... ) to
> ensure there is always a minimum level of understanding. See, e.g.
> http://arxiv.org/abs/1304.5743 for a discussion about this.
>

There is an ongoing discussion about the Data Vocabularies section [2]. I
propose to postpone this discussion until the next draft of the DWBP
document.


>
> # Preservation
> There are existing guidelines about the process of preservation itself.
> Those could be cited to guide people on how to do preservation. There are
> also many repositories that preserve data at different levels
> (institution, national, ...).
> There should be something there! In terms of BPs, the following points
> should be addressed:
> * As a data publisher, do you want to, or have to, preserve your data?
> * If yes, what to preserve?
> * Who to give it to? Only to one archive or several? One could be
> mandated to do preservation whatever its quality as an archive is. There are
> existing certifications (DSA, etc.) that can be used to help publishers make
> informed choices about who to trust.
> * Think about the level of access for the preserved copy (public, private,
> ...)
> * The type of data matters for preservation. Publishers need to be aware of
> that. It is also important to think about preserving with context, and thus
> not only to push the dataset alone but also to preserve the resources that
> are needed to make sense of it (documentation, schemas, ...)
>

A data preservation section was included in the document [3].

>
> # Feedback
> This section should also relate to preservation. One way to do it is to
> list stakeholders around preservation (see RDA for an impression).
> BP: there should be identifiers to give feedback on a specific part of the
> data
> BP: Use feedback as data enrichment, e.g. crowd annotation
>

I propose to defer this discussion to the Dataset Usage Vocabulary document
[4].

>
> # Metadata
> Need to say where the taxonomy comes from. The document speaks about 3
> types instead of the 5 commonly observed. The two missing ones are
> preservation metadata (how, where, ...) and technical metadata (EXIF,...)
> BP: Use standard terms but then make extensions public when they are needed
>

Could you please send me the reference for this taxonomy?

> # Data quality
> Does this apply to data or metadata?
> There are many granularity aspects in data that need to be taken into
> account.
> How do you define quality?
> Completeness of the data is not related to quality. There should be an
> element of comparison to check the completeness against something (e.g.
> "data is complete according to EDM").
> There should be something about Quality vs. Usability, partly because
> fitting data into quality standards can lead to losing important data
> (mainly everything that does not fit).
>

I suggest deferring this discussion (the meaning of quality, granularity and
completeness) to the Quality Vocabulary [5].

Please let me know if this is ok for you!

Kind regards,
Bernadette

[1] http://w3c.github.io/dwbp/bp.html
[2] http://www.w3.org/2013/dwbp/track/issues/166
[3] http://w3c.github.io/dwbp/bp.html#dataPreservation
[4] http://w3c.github.io/dwbp/vocab-du.html
[5] http://w3c.github.io/dwbp/vocab-dqg.html

> Cheers,
> Christophe
>
> --
> Researcher
> +31(0)6 14576494
> christophe.gueret@dans.knaw.nl
>
> *Data Archiving and Networked Services (DANS)*
>
> DANS promotes sustained access to digital research data. See
> www.dans.knaw.nl for more information. DANS is an institute of KNAW and
> NWO.
>
>
> Please note: as of 1 January, we have a new address:
>
> DANS | Anna van Saksenlaan 51 | 2593 HW Den Haag | Postbus 93067 | 2509 AB
> Den Haag | +31 70 349 44 50 | info@dans.knaw.nl |
> www.dans.knaw.nl
>
>
> *Let's build a World Wide Semantic Web!*
> http://worldwidesemanticweb.org/
>
> *e-Humanities Group (KNAW)*
> http://www.ehumanities.nl/
>



-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------

Received on Thursday, 11 June 2015 14:11:42 UTC