audience for the BP doc from Annette Greiner on 2014-12-15 (public-dwbp-wg@w3.org from December 2014)

From: Annette Greiner <amgreiner@lbl.gov>
Date: Mon, 15 Dec 2014 13:08:32 -0800
To: DWBP Public List <public-dwbp-wg@w3.org>
Message-Id: <E3218A15-062E-4F17-8492-E3108EFC6EB5@lbl.gov>

Hi folks,
To pick up the discussion about our audience, I want to set down what I see as our audience for the current BP document. By audience I mean the people we expect to actually sit down and read it, not the people whose interests we need to consider in creating it (those are what I call stakeholders). It’s possible that we all agree but are just thinking of the terms differently.

To my mind, our audience includes anyone involved in making data available to consumers on the web; that is, anyone publishing data. It includes anyone who collects or collates the data, organizes the data, creates web pages or apps to share the data, re-publishes it in such a way that others can re-use it, or makes decisions relevant to how people do those tasks. They could be developers, lawyers, CIOs, researchers, archivists, designers, almost any job title. What matters, though, is not their job title but what actions they take with respect to the data. The action of consuming it is not what we have been discussing; it isn’t represented in any of the current best practices or in our scoping criteria, and it isn’t called for in the charter’s requirement to create a BP document. Thus far, we are not targeting our BPs to people who are *only* consuming the data and not re-publishing it.

I’ve already talked about the charter and the existing BPs in a previous email, so I’ll just address the scoping criteria here. The first one, being unique to publishing on the web, is obviously about publishing rather than consuming. The second one, encouraging reuse, is also about publishing, just in such a way that someone else can make use of the data. The charter mentions re-use in its mission in list item 2, which calls on us to "provide _guidance_to_publishers_ that will improve consistency in the way data is managed, thus promoting the re-use of data". If a consumer wants to publish something that makes the data truly re-usable, they must include the data itself, which means that they are publishing the data. The third criterion, testability, simply deals with the mechanics of making sure that one is successful in achieving the best practices.

It might help to consider an example: your organization publishes data about traffic in Rio. It's made available through an API. A data scientist in Lisbon is interested in the data and makes a visualization based on it that she posts on her blog. The data scientist does not make the data available in any form other than the visualization itself. She has not really enriched your data, because the original data still has no connection to the visualization. She cannot take action on any of the best practices we have identified thus far unless she re-publishes the data herself, as data.

Your organization could link to the visualization, thereby enriching the data, but the data scientist in Lisbon cannot force it to do that. Our best practice around data enrichment calls on publishers to consider making that link or creating the visualization themselves. If we were writing that same best practice for a consumer audience, it would have to say something like "you should enrich other people's data". So, we would end up telling data enrichers that they should enrich data, which strikes me as tautological. One could go into detail about how to make good visualizations (use good labels, don’t rely on color alone, provide a zero point in your scales, etc.), but that seems to me out of scope. (I teach an entire semester course on visualization, so I could come up with lots of best practices about it, but I don't think we want to go there in the BP document we’ve been working on.)

Now suppose the consumer in Lisbon would like to provide feedback. If we, as the publisher, have not provided a mechanism for her to do so, she cannot provide it. Our best practice is about making it possible to provide feedback and then acting on the feedback to improve the published data. A consumer has a role here, but again, there is little point in telling a consumer who wants to give feedback that they should give feedback. I certainly wouldn’t expect a data consumer to wade through a long list of publisher-oriented best practices to be told that they should give feedback whenever they are so inclined.

I would support the idea of putting together a separate list of best practices for data consumers if we can think of a way to scope it that works.

-Annette

--
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
510-495-2935

Received on Monday, 15 December 2014 21:09:07 UTC