- From: Christophe Guéret <christophe.gueret@dans.knaw.nl>
- Date: Fri, 6 Mar 2015 14:22:40 +0100
- To: <public-dwbp-comments@w3.org>
- CC: <jtennis@uw.edu>, <smiragli@uwm.edu>, Aida Slavic <aida.slavic@udcc.org>, Almila Akdag Salah <alelma@gmail.com>, Albert Meroño Peñuela <albert.meronyo@gmail.com>, <toby.burrows@uwa.edu.au>, <valentine.charles@europeana.eu>, Henk van den Berg <henk.van.den.berg@dans.knaw.nl>, <kzervanou@yahoo.co.uk>, Rob Koopman <Rob.Koopman@oclc.org>, Windhouwer Menzo <menzo@windhouwer.nl>, Shenghui Wang <shenghui.wang@gmail.com>, Andrea Scharnhorst <andrea.scharnhorst@dans.knaw.nl>, <cristina.bucur@student.vu.nl>
- Message-ID: <CABP9CAHZ9e3OgdQQT-X5EZR_6z6=ZFA1w7X7UMksQRAzb3LUoQ@mail.gmail.com>
Dear DWBP group,

Yesterday and the day before I was sitting next to KOS experts for this workshop: http://knowescape.org/evolution-and-variation-of-classification-systems-knowescape-workshop-march-4-5-2015-amsterdam/

We used my speaking slot to have a look at http://www.w3.org/TR/2015/WD-dwbp-20150224/ and provide some comments. These comments are, hopefully faithfully, reproduced in what follows. Everyone who attended the event is also CCed on this mail and may jump in to correct things when needed, or to comment further.

# Overall points

The document concerns data publishers more than it concerns consumers. This also seems to be reflected in the composition of the editors/contributors; more data consumers should jump in and add BPs that matter to them.

"Data must be available in machine readable" -> this should only be a "should"; "must" is way too strong. Some data consumers may want access to data that is not machine readable (e.g. a scanned old document) rather than being restricted to its machine-translated counterpart (e.g. an OCRed old document).

# Data vocabularies

Issue 9: we should stick to using "vocabularies"

Issue 10: we should aim at being generic

BP 19: there is a problem in advocating for simplicity, as this can prevent having rich vocabularies. It could instead be suggested that publishers may provide vocabularies as rich as needed but strive to base them on "simpler" ones (e.g. core ontologies / upper ontologies / ...) to ensure there is always a minimum level of understanding. See, e.g., http://arxiv.org/abs/1304.5743 for a discussion about this.

# Preservation

There are existing guidelines about the process of preservation itself. Those could be cited to guide people on how to do preservation. There are also a lot of repositories that exist to preserve data at different levels (institution, national, ...). There should be something there!
In terms of BPs, the following points should be addressed:

* As a data publisher, do you want to, or have to, preserve your data?
* If yes, what to preserve?
* Who to give it to? Only to one archive or several? One could be mandated to do preservation, whatever its quality as an archive is. There are existing certifications (DSA, etc.) that can be used to help publishers make informed choices about whom to trust.
* Think about the level of access for the preserved copy (public, private, ...)
* The type of data matters for preservation. Publishers need to be aware of that. It is also important to think about preserving with context, and thus to push not only the dataset alone but also the resources that are needed to make sense of it (documentation, schemas, ...)

# Feedback

This section should also relate to preservation. One way to do it is to list the stakeholders around preservation (see RDA for an impression).

BP: there should be identifiers to give feedback on a specific part of the data

BP: use feedback as data enrichment, e.g. crowd annotation

# Metadata

Need to say where the taxonomy comes from. The document speaks about 3 types instead of the 5 commonly observed. The two missing ones are preservation metadata (how, where, ...) and technical metadata (EXIF, ...).

BP: use standard terms, but then make extensions public when they are needed

# Data quality

Does this apply to data or metadata? There are a lot of granularity aspects in data that need to be taken into account.

How do you define quality? Completeness of the data is not related to quality. There should be an element of comparison to check the completeness against something (e.g. "data is complete according to EDM").

There should be something about Quality VS Usability, partly because fitting data into quality standards can lead to losing important data (mainly everything that does not fit).

Cheers,
Christophe

--
Researcher
+31(0)6 14576494
christophe.gueret@dans.knaw.nl

*Data Archiving and Networked Services (DANS)*
DANS promotes sustained access to digital research data. See www.dans.knaw.nl for more information. DANS is an institute of KNAW and NWO.

Please note, as of 1 January we have a new address:
DANS | Anna van Saksenlaan 51 | 2593 HW Den Haag | Postbus 93067 | 2509 AB Den Haag | +31 70 349 44 50 | info@dans.knaw.nl | www.dans.knaw.nl

*Let's build a World Wide Semantic Web!*
http://worldwidesemanticweb.org/

*e-Humanities Group (KNAW)*
<http://www.ehumanities.nl/>
Received on Friday, 6 March 2015 13:23:34 UTC