Re: audience for the BP doc

Eric, Annette, all,

To me, it would make sense if we concentrated on the audience of data
providers, at least for now. I think this is already a big order.

If we also want to cover best practices for the re-users of data
(developers, aggregators, mix-and-matchers, brokers, whatever you want to
call them), we’ll be spreading a scarce resource (ourselves) even thinner,
and run the risk of producing two sets of best practices, both of
insufficient quality.

Let’s focus on the data providers first and then, when we have a good set
of best practices and still have time left, turn our attention to the
consumer side of the picture.

Makx.


2014-12-16 6:29 GMT+01:00 Eric Stephan <ericphb@gmail.com>:
>
> Thanks Annette for sharing your thoughts on this topic in the meeting last
> week and in this email.  In your text the term consumers really jumped out
> at me.  If consumers only has a read-only connotation then I'd rather avoid
> this term altogether.  In fact, "consumers" was never mentioned
> originally as part of the working group mission; instead the term
> "developer" was used.
>
> Developers, to me, are technologists building applications and devices that
> reuse published data, including creating new data that can be published,
> processing and modifying published data, or strictly reading data in the
> life span of a running application. Users rely on the tools created by
> publishers and developers to edit published data and provide feedback.
> Publishers, to me, just focus on hosting and administering their data on the
> web in an orderly way.  Since the original intent of BP was to "facilitate
> better communication between developers and publishers", maybe there
> should be best practices that target publishers and developers, divided into
> two documents.
>
> The closest analogy is off-the-shelf data storage systems, for which two
> types of documentation are written, one for each audience:
> 1) Data administrators who manage the data system
> 2) End users (developers) who write applications that interact with the
> data system
>
> Thanks,
>
> Eric S
>
>
> On Mon, Dec 15, 2014 at 1:08 PM, Annette Greiner <amgreiner@lbl.gov>
> wrote:
>>
>> Hi folks,
>> To pick up the discussion about our audience, I want to set down what I
>> see as our audience for the current BP document. By audience I mean the
>> people we expect to actually sit down and read it, not the people whose
>> interests we need to consider in creating it (those are what I call
>> stakeholders). It’s possible that we all agree but are just thinking of the
>> terms differently.
>>
>> To my mind, our audience includes anyone involved in making data
>> available to consumers on the web. That is publishing data. It includes
>> anyone who collects or collates the data, organizes the data, creates web
>> pages or apps to share the data, re-publishes it in such a way that others
>> can re-use it, or makes decisions relevant to how people do those tasks.
>> They could be developers, lawyers, CIOs, researchers, archivists,
>> designers, almost any job title. What matters, though, is not their job
>> title but what actions they take with respect to the data. The action of
>> consuming it is not what we have been discussing, it isn’t represented in
>> any of the current best practices or in our scoping criteria, and it isn’t
>> called for in the charter’s requirement to create a BP document. Thus far,
>> we are not targeting our BPs to people who are *only* consuming the data
>> and not republishing it.
>>
>> I’ve already talked about the charter and the existing BPs in a previous
>> email, so I’ll just address the scoping criteria here. The first one, being
>> unique to publishing on the web, is obviously about publishing rather than
>> consuming. The second one, encouraging reuse, is also about publishing,
>> just in such a way that someone else can make use of the data. The charter
>> mentions re-use in its mission in list item 2, which calls on us to
>> "provide _guidance_to_publishers_ that will improve consistency in the way
>> data is managed, thus promoting the re-use of data". If a consumer wants to
>> publish something that makes the data truly re-usable, they must include
>> the data itself, which means that they are publishing the data. The third
>> criterion, testability, simply deals with the mechanics of making sure that
>> one is successful in achieving the best practices.
>>
>> It might help to consider an example: your organization publishes data
>> about traffic in Rio. It's made available through an API. A data scientist
>> in Lisbon is interested in the data and makes a visualization based on it
>> that she posts on her blog. The data scientist does not make the data
>> available in any form other than the visualization itself. She has not
>> really enriched your data, because the original data still has no
>> connection to the visualization. She cannot take action on any of the best
>> practices we have identified thus far unless she re-publishes it herself,
>> as data.
>>
>> Your organization could link to the visualization, thereby enriching the
>> data, but the data scientist in Lisbon cannot force it to do that. Our best
>> practice around data enrichment calls on publishers to consider making that
>> link or creating the visualization themselves. If we were writing that same
>> best practice for a consumer audience, it would have to say something like
>> "you should enrich other people's data". So, we would end up telling data
>> enrichers that they should enrich data, which strikes me as tautological.
>> One could go into detail about how to make good visualizations (use good
>> labels, don’t rely on color alone, provide a zero point in your scales,
>> etc.), but that seems to me out of scope. (I teach an entire semester
>> course on visualization, so I could come up with lots of best practices
>> about it, but I don't think we want to go there in the BP document we’ve
>> been working on.)
>>
>> Now suppose the consumer in Lisbon would like to provide feedback. If we,
>> as the publisher, have not provided a mechanism for them to do so, they
>> cannot provide it. Our best practice is about making it possible to provide
>> feedback and then acting on the feedback to improve the published data. A
>> consumer has a role here, but again, there is little point to telling a
>> consumer who wants to give feedback that they should give feedback. I
>> certainly wouldn’t expect a data consumer to wade through a long list of
>> publisher-oriented best practices to be told that they should give feedback
>> whenever they are so inclined.
>>
>> I would support the idea of putting together a separate list of best
>> practices for data consumers if we can think of a way to scope it that
>> works.
>>
>> -Annette
>>
>>
>> --
>> Annette Greiner
>> NERSC Data and Analytics Services
>> Lawrence Berkeley National Laboratory
>> 510-495-2935
>>
>>
>>

-- 
--------------------------------------------------------------------------------
Makx Dekkers
mail@makxdekkers.com
--------------------------------------------------------------------------------

Received on Tuesday, 16 December 2014 11:37:12 UTC