Re: Review of BP on data re-use

From: Annette Greiner <amgreiner@lbl.gov>
Date: Tue, 22 Mar 2016 15:06:53 -0700
To: public-dwbp-wg@w3.org
Message-ID: <56F1C1FD.3090205@lbl.gov>
Responding to some of the comments on the data reuse BP.

Re the ideas from Europeana, the appropriateness of making sure that the 
license travels with the data depends on the license itself. Some 
specify that derivative works follow the same license or a compatible 
license. Getting that right is part of following the license 
requirements. I think Antoine's thought about keeping the data up to 
date is a good one, though that's covered in the original BP about 
data-up-to-date, since we say to update data when the source is updated. 
A small reminder in the reuse BP would seem fitting. The Europeana page 
also mentions the case where the reuser changes something about the 
data, saying that one should mention what was changed. I think that 
would be another idea worth mentioning.

Re whether it's in scope, this goes back to the original discussion 
about who our audience is. I would never have argued that our audience 
was specifically publishers if I didn't also believe that re-users, or 
re-publishers, are part of that group. Our charter charges us with 
"facilitating better communication between developers and publishers." 
We've recognized that developers are publishers, too, but we haven't 
addressed the original issue, which is really poor communication between 
original publishers and re-publishers. We haven't addressed anything 
that applies particularly to the challenge of re-publishing. In this BP, 
we finally do that. I feel that, if we were to leave it out, the list of 
BPs could leave publishers who are not also re-publishers feeling that 
they are the only ones tasked with improving their behavior. 
Communication is a two-way street, and I think addressing re-publishers 
is something we need to do to maintain balance. Having thought about 
this, I would not be comfortable publishing a BP list without these 
ideas in it. I am far, far less concerned about issues of scope than 
issues of balance and fairness, but I think this BP is firmly in scope 
and necessary.

Re the idea that we should split it into 2 BPs, the two-way split 
doesn't strike me as logical, because citing and providing feedback are 
two completely different tasks. I could possibly imagine splitting it 
into three BPs, as there are three components to reusing respectfully 
(at least currently). However, I'm not sure what we gain by splitting 
them up and putting them into other sections. That makes it difficult 
for users to find advice on what to do when they are reusing someone 
else's data. We could possibly split them into three and have all three 
in a new section, but I'm not sure there are really three unique 
BPs' worth of things to say about them. Either way, this is really a new 
challenge, namely how to reuse with consideration for the original 
publisher, and it is worthy of a new section.

It's an interesting idea to put the ideas about reuse into existing BPs, 
but I think that would force us to try and stuff two different ideas 
into each BP. We would have to find a partner BP for each one and 
rewrite. Supposing we felt it was worth that effort, we would end up 
with BPs that are trying awkwardly to encompass two different ideas. 
Keeping them separate helps understanding and keeps the BPs from 
becoming overloaded. It is one task to provide a channel for 
communication; it is quite another to use it, and it's still another to 
cite a source. Similarly, it is one task to provide a license; it is 
quite another to follow it.

On 3/22/16 9:25 AM, Laufer wrote:
> Hi All,
> I do not agree with a new section and a new BP about data reuse.
> I think that the aspects of reuse that are mentioned in the new BP are 
> covered by the BPs in our list: license, provenance and feedback.
> If someone wants to use, or reuse, data she has to think about these 
> aspects and has to do what our BPs recommend.
> If the group thinks that these aspects should be highlighted, I think 
> that we can include this information in the original BPs.
> If we talk about BPs for reuse we will need to look at all the other 
> aspects of publications, as for example, how versioning will be 
> treated, how sensitive data will be treated, how the use of new 
> vocabularies will be compatible with the vocabularies used in the data 
> reused, and so on.
> I do not like the idea that reuse is not use. I think that in some 
> sense we are thinking that the only one who uses data is the final 
> user. But I think that the final user does not use data. She asks a 
> question that someone who uses data will try to answer.
> All of our BPs include the benefit of Reuse. We do not even talk about 
> the benefit of Use.
> For me, our BPs cover the publishing of data that will be used. Or 
> reused, as you wish. I do not think we have to split in different BPs.
> Cheers, Laufer
> ---
> .  .  .  .. .  .
> .        .   . ..
> .     ..       .
> On 22/03/2016 12:50, Bernadette Farias Lóscio wrote:
>> Hi all,
>> Considering that tomorrow we need to vote to include or not the BP 
>> about Data Re-use [1] on the BP document, I'd like to make some 
>> considerations.
>> I agree with Antoine that "a lot of the aspects of this BP are 
>> non-technical, so I'm not 100% sure it's in scope." However, I also 
>> like the idea of the BP and I'd like to make a proposal.
>> In my opinion, the Data Reuse BP should be split into two different 
>> BPs: one for data licenses and another one for Citation and Feedback. 
>> We already have a section about data licenses, so I think it would be 
>> better to create a new BP considering the aspects mentioned by 
>> Antoine and Annette. If reusing is also a way of publishing data, 
>> then I think it won't be a problem.
>> The second BP will focus on providing citation and feedback. I also 
>> believe that there are other aspects that should be considered. 
>> Annette's proposal mentions that publishers "should be made aware of 
>> any known problems with the data". However, feedback can be used to 
>> provide other information about the dataset and not just to provide 
>> feedback about the problems. It is also really important to mention the 
>> Dataset Usage Vocabulary and to provide examples based on our own 
>> vocabulary.
>> In this case, we can also change the title of the section Feedback to 
>> be something like Feedback and Citation.
>> In summary, my proposal is:
>> - Split the Data Reuse BP into two BPs:
>> BP: Follow licensing constraints (to be included in the Data Licenses 
>> Section)
>> BP: Cite the original dataset and give feedback (this could also be 
>> split into two other BPs: i) BP Cite the original dataset and ii) 
>> Give feedback)
>> - Rename Feedback Section to Feedback and Citation.
>> Doing this, we also avoid the creation of a new section. Again, if 
>> reusing is a way of publishing, then I don't think that we should have 
>> a new section for this subject.
>> kind regards,
>> Berna
>> [1] http://agreiner.github.io/dwbp/bp.html#Re-use
>> 2016-03-16 9:19 GMT-03:00 Antoine Isaac <aisaac@few.vu.nl>:
>>     Hi everyone,
>>     I've just received the email with the editors asking for this:
>>         2. To review the Best Practice: Reuse vocabularies [3] ,
>>         which will be voted next Wednesday.
>>     This is excellent timing, I've just read it while catching up
>>     with the minutes of yesterday's session ;-)
>>     My feedback will be quick though (not much time to write a clean
>>     text!):
>>     1. a lot of the aspects of this BP are non-technical, so I'm not
>>     100% sure it's in scope. But there are some technical aspects
>>     involved, and see point #2.
>>     2. I do like the BP a lot. This makes a lot of sense
>>     3. my strong recommendation about licensing would be that
>>     re-users should make sure that any license or terms of use
>>     'travels' with the data. If reusers do something with the data,
>>     they make sure it's compatible with the license and terms of use.
>>     This includes (re-)publishing of data, or of derived data when
>>     applicable. Especially re-users of derived or re-published data
>>     must be aware of the original license and terms of use
>>     4. my organization (Europeana) has made terms of use that could
>>     be used as example. Our data is CC0, so there's no license
>>     whatsoever. But because attribution and provenance matter in our
>>     sector (culture) we wanted to encourage people to be 'respectful'.
>>     It's at http://www.europeana.eu/portal/rights/metadata.html
>>     I think it exemplifies quite a lot the aspects of Annette's BP
>>     proposal.
>>     5. the Europeana TOU include one technical aspect that could be
>>     strengthened in the BP, imho. Re-users should make sure they keep
>>     their data (or application) synchronized with the most
>>     up-to-date status of the original source. If someone builds and
>>     keeps something on the basis of old data, and lets their own
>>     re-users think the original data source is responsible for
>>     problems of outdated data, this is not fair to the original data
>>     publisher.
>>     Cheers,
>>     Antoine
>>     [3] http://agreiner.github.io/dwbp/bp.html#Re-use
>> -- 
>> Bernadette Farias Lóscio
>> Centro de Informática
>> Universidade Federal de Pernambuco - UFPE, Brazil
>> ----------------------------------------------------------------------------

Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
Received on Tuesday, 22 March 2016 22:07:25 UTC