- From: Annette Greiner <amgreiner@lbl.gov>
- Date: Fri, 12 Jun 2015 09:04:14 -0700
- To: Christophe Guéret <christophe.gueret@dans.knaw.nl>
- Cc: Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Data on the Web Best Practices Working Group <public-dwbp-wg@w3.org>
I think the OCR issue is a data quality issue.

On Jun 11, 2015, at 11:05 PM, Christophe Guéret <christophe.gueret@dans.knaw.nl> wrote:

> Hi Annette, Bernadette,
>
> I proposed to change because IMO using "should" would be strong enough, but I understand your point! In this case, I propose to keep "must" instead of "should" and then we postpone this discussion for later when we discuss the proposal of maturity levels for BP.
>
> @Christophe, are you ok with this?
> If we have only "should" BPs then that will make it for a rather weak set of recommendations, so let's keep "must" at least for this core point ;-)
>
> But I still understand the point people (don't remember who it was) wanted to make. Is a scanned image machine readable ? We already had that discussion on the readable aspect some time ago...
> Keep a "must" on machine readable we have to ensure that, eg, archives that expose scanned images can still do it in a way that complies with what the document recommends. It wouldn't be good if we imply that all the scans must be OCRed in order to become machine readable.
>
> Christophe
>
> Thanks!
> Bernadette
>
> 2015-06-11 15:37 GMT-03:00 Annette Greiner <amgreiner@lbl.gov>:
> I disagree with this change. I imagine it will get re-assessed as we consider moving to a BP document that provides levels of compliance, but I must say that I find making the data machine readable an extremely low bar for calling something compliant with any sort of best practice for web publication. I would not happily vote to publish a spec that has this only as a should. I suspect that there is some confusion here about whether our document affects the ability of users to publish data online. We should be clear that we are not going to alter the ability of individuals to publish data in any particular form. If they want to publish data quickly and without meeting all the requirements for compliance with the BP document, they can still do that; they just can’t claim that they have published in accordance with our criteria.
> -Annette
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
> 510-495-2935
>
> On Jun 11, 2015, at 7:08 AM, Bernadette Farias Lóscio <bfl@cin.ufpe.br> wrote:
>
>> Hello Christophe,
>>
>> Thanks a lot for your comments on the FPWD of the DWBP document! After gathering some feedback from the community some changes were made and we're planning to publish a 2nd draft [1].
>>
>> In the following, you can find some comments about your feedback on the FPWD.
>>
>> # Overall points
>> The document concerns more data publishers than it concerns consumers. This also seems to be reflected by the composition of editors/contributors, there should be more data consumers jumping in and adding BPs that matter to them.
>> "Data must be available in machine readable" -> only should, must is way too strong. Some data consumers may want to have access to data that is not machine readable (e.g. scanned old document) and not being only restricted to their machine-translated counterparts (e.g. OCRed old document)
>>
>> During the discussions about the audience, the group agreed that publishers will be our primary audience. In this case, best practices should be employed by data publishers instead of data consumers. However, both publishers and consumers will benefit from this. Then, I suggest to keep publishers as the main primary audience for our BP.
>>
>> Concerning the "Data must be available in machine readable", It was changed for "should".
>>
>
> --
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
> ----------------------------------------------------------------------------
>
> --
> Onderzoeker
> DANS, Anna van Saksenlaan 51, 2593 HW Den Haag
> +31(0)6 14576494
> christophe.gueret@dans.knaw.nl
>
> Data Archiving and Networked Services (DANS/KNAW)
>
> e-Humanities Group (KNAW)
>
> World Wide Semantic Web community
> http://worldwidesemanticweb.org/
Received on Friday, 12 June 2015 16:05:28 UTC