
Re: updates to BP doc

From: Annette Greiner <amgreiner@lbl.gov>
Date: Mon, 14 Mar 2016 15:46:15 -0700
To: Caroline Burle <cburle@nic.br>, Phil Archer <phila@w3.org>, DWBP Public List <public-dwbp-wg@w3.org>
Message-ID: <56E73F37.8010608@lbl.gov>
Phil's suggestion sounds wise, and I think we should add it at the end 
of the Data Enrichment introduction. I would suggest that his line be 
slightly amended to explain what is out of scope. I don't think we want 
to say data enrichment itself is out of scope, because we are offering 
BPs about it.
"Data enrichment is a complex topic in its own right, and details of how 
to perform it are beyond the scope of this document."

Re changing the subtitle for 33, that re-introduces a problem that I 
realized while writing the BPs. We were mentioning enrichment as a way 
of adding metadata to a dataset. That is not actually what data 
enrichment accomplishes. Metadata can be extracted from a document 
corpus, but it is metadata for the documents that make it up, which 
becomes the actual data in a dataset about the documents. Metadata for 
the dataset as a whole is not something data enrichment is generally 
used for. I looked at multiple references about data enrichment, 
including the InWeb one, and none of them talks about generating 
metadata for the dataset itself. The InWeb document doesn't even mention 
the word metadata. Besides, metadata is data anyway.

General note on subtitles: we should make them consistently either 
passive or active voice. I prefer active voice, as we are telling people 
what we think they ought to *do*. That is, I prefer that we say "Do X" 
rather than "X should be done". It's simpler, shorter, and less prone to 
confusion with formal uses of "should", "must", etc.

For BP34, I think it's helpful to point out the intended outcome of 
users being able to understand the meaning of the dataset immediately, 
besides not having to create tools. Otherwise, one could argue that 
users could download the dataset and simply read it, which I think we 
want to discourage. But maybe there is some reason you want to remove it 
that we can address in a different way?

On 3/14/16 2:52 PM, Caroline Burle wrote:
> Hello!
> Thank you Annette, we indeed agree with Phil's comments.
> We may put this suggestion in the Data Enrichment Section Introduction 
> "this is a topic in its own right and is beyond the scope of the 
> current work" if everyone agrees.
> Regarding Best Practice 33: Enrich data by generating new data, we 
> suggest changing the subtitle to something like "Datasets should be 
> enriched whenever possible, generating richer metadata and data when 
> doing so will enhance its value."
> In Best Practice 34: Provide Complementary Presentations, we suggest 
> removing the first phrase of the Intended Outcome and leaving only 
> "Data consumers should not have to create their own tools to 
> understand the meaning of the data."
> Would that be okay?
> Thank you! Kind regards,
> Bernadette, Caroline and Newton
> On 13/03/16 12:46, Phil Archer wrote:
>> Thanks Annette,
>> I've just read through those enrichment BPs in your version of the 
>> doc and I like them all. The examples certainly help make it 
>> understandable. Your point about labelling inferred values is 
>> interesting and points to the complexity of the subject. Perhaps a 
>> line such as "this is a topic in its own right and is beyond the 
>> scope of the current work" might be appropriate. That would then 
>> provide a potential hook to the expected member submission on the topic.
>> I really like the last one about responsible use - a minor amendment 
>> would include a pointer to the DUV ;-)
>> Phil.
>> On 13/03/2016 09:13, Annette Greiner wrote:
>>> Hi folks,
>>> I've added the BPs about subsetting and enrichment and issued a pull
>>> request for those.
>>> I've also gotten to thinking that we ought to have something about the
>>> duties of reusers of data, as this is also a type of data publishing. So
>>> I wrote this up as a possible drop-in BP. It's in a separate commit for
>>> ease of ignoring or merging.
>>> Happy trails!
>>> -Annette
>>>       7.15 Data Re-Use
>>> Re-using data is another way of publishing data. Data re-users have some
>>> responsibilities that are unique to publishing on the web. This section
>>> provides advice to be followed by people who re-use data.
>>> *Reuse Data Respectfully*
>>> When reusing data, be considerate of the original publisher. Cite the
>>> original dataset; give feedback when you find problems; follow licensing
>>> constraints.
>>> *Why*
>>> Publishers who make data available on the web deserve acknowledgment for
>>> enabling others to work with it. Citation also maintains provenance and
>>> helps still others to work with the data. Providing feedback repays the
>>> publishers for their efforts and allows them to improve the dataset for
>>> future users. Following licensing constraints shows respect for the
>>> original publisher’s work and prevents legal entanglements.
>>> *Intended Outcome*
>>> Original publishers should be acknowledged by citation. They should be
>>> made aware of any known problems with the data. Datasets should not be
>>> used in violation of licensing agreements.
>>> *Possible Approach to Implementation*
>>> Provide a textual citation of the source in a readily visible area of
>>> the site in which it is used. Read the original license and be sure that
>>> you provide any additional acknowledgments required. Follow the
>>> publisher’s directions for submitting feedback about a dataset if you
>>> find problems with it.
>>> *Example*: When publishing a visualization based on bus data from our
>>> fictional transit agency, the re-user could include the text “Data
>>> Source: MyCity Transport Agency” just beneath the graph and link the
>>> citation text back to the original source.
>>> *How to Test*
>>> Verify that the original publisher is cited in a readily discoverable
>>> place. Make sure that the licensing for the original permits the re-use
>>> to which it is being applied.
>>> *Evidence*
>>> Relevant requirements: R-TrackDataUsage
>>> <http://www.w3.org/TR/dwbp-ucr/#R-TrackDataUsage>, R-UsageFeedback
>>> <http://www.w3.org/TR/dwbp-ucr/#R-UsageFeedback>, R-ProvAvailable
>>> <http://www.w3.org/TR/dwbp-ucr/#R-ProvAvailable>
>>> *Benefits*
>>> Reuse, Trust, Discoverability

Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
Received on Monday, 14 March 2016 22:46:49 UTC
