Re: [Moderator Action] FW: Questions on quality/daQ for Data on the Web Best Practices: BP-1 & BP-2 from Phil Archer on 2015-06-19 (public-dwbp-comments@w3.org from June 2015)

From: Phil Archer <phila@w3.org>
Date: Fri, 19 Jun 2015 10:16:21 +0100
To: doug.rosenoff@thomsonreuters.com, bfl@cin.ufpe.br
CC: public-dwbp-comments@w3.org, Antoine Isaac <aisaac@few.vu.nl>
Message-ID: <5583DDE5.2010800@w3.org>

Re-sending to the public list.

This is useful feedback on DaQ from Thomson Reuters.

(Doug, sorry to be a pain: the list you sent it to is publicly readable 
but only writable by WG members. This list 
(public-dwbp-comments@w3.org) is open to all.)

On 19/06/2015 09:53, doug.rosenoff@thomsonreuters.com wrote:
> All:
>
> My apologies for stepping in late to the discussion but I was hoping that the working group could consider some notions prior to promulgating the data activity best practices [3], and especially the work on quality.
>
> 1) Not everyone who will be using daQ/DQV [1] is a consumer or producer. Some other important roles are aggregator and curator. The current discussions don't seem to have considered them much; their needs are different from, and frequently more complex than, producer or consumer use cases. Perhaps these use cases might be brought forward more prominently and modeled a bit, if only to verify the completeness of the best practices document and data quality efforts.
>
> 2) I'm a bit unclear on whether the daQ/DQV [2] will be applied only at the dataset level or if it will allow for application at the dataset partition/entity/attribute/triple levels as well. This becomes an interesting problem when aggregating/curating data from multiple sources: overall data set quality may be the same or similar, but the quality of some specific dimension may be off in one data set or the other, leading to a considerably different fitness for use in various contexts.
>
> 3) Another aggregator/curator set of problems has to do with best practices for multiple-rendition datasets (same data, multiple output formats). For example, a news story may need to be translated into multiple languages but will retain the full set of semantic content; similarly a particular document, perhaps a court opinion, might need to be presented in the form of a web page and a PDF, again with completely identical semantics. Does the working group view that as a set of implementation issues or something that should have a best practice associated with it?
>
> 4) Another sample aggregator/curator use case that might be of concern to the working group is that of linked document sets (e.g. a binder full of data wherein each chapter is linked to the next but is published independently at different times). Such documents arise frequently for aggregators and curators dealing with unstructured / semi-structured data. They present interesting and complex challenges for many aspects of data activity, including versioning, renditions, and many others.
>
> 5) Finally, without meaning to set off too much of a discussion, the advice given for URI indirection in the data activity discussion (bp 10 and 11) runs somewhat counter to the "good URIs make things human readable" notions put forth elsewhere. While I understand the arguments both ways, it seems like there is a need for W3C to really be a bit more normative on what the expectation is for URIs in the semantic space. Is there agreement on which approach to take?
>
> I hope these issues are of interest and concern to the working group.
>
> Many thanks in advance for your consideration.
>
> D.
>
>
> [1]  http://w3c.github.io/dwbp/vocab-dqg.html
>
> [2]  https://www.w3.org/2013/dwbp/wiki/Data_Quality_Vocabulary_%28DQV%29
>
> [3]  https://w3c.github.io/dwbp/bp.html
>
> ---
> D. T. Rosenoff   :    doug.rosenoff@thomsonreuters.com   :   Thomson Reuters Information Architecture Strategy Group   :     @doug_r
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Friday, 19 June 2015 09:16:31 UTC