Re: Review of BP on data re-use

From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Date: Wed, 23 Mar 2016 10:35:52 -0300
Message-ID: <CANx1PzxHc47sx0N3__CG-F1cMnpMLYCani1-uYqrV713qg2zAw@mail.gmail.com>
To: Deirdre Lee <deirdre@derilinx.com>
Cc: "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
Hi all,

Thanks a lot for the good discussion!

If we consider that reuse is a way of publishing data, i.e., data is being
consumed to be published again, then our BPs cover this case. However, we
should have additional BPs, as Annette proposed: cite the original dataset,
provide feedback and respect the data license. These three BPs apply when
the publisher is consuming existing data, but they are still about
publishing data.

We already have sections to cover these three aspects:

- Data Provenance is about providing information about the origin or
history of the dataset. In this case, if the dataset originates from
another dataset then it is important to cite the original dataset (using
the DUV);

- Feedback is about how to gather feedback and how to provide information
about feedback. When someone publishes a dataset that is based on an
existing dataset, feedback should be provided;

- Data License is about providing data license information. In addition,
when the published data derives from existing data, the original license
should be respected.
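
To make the three aspects above concrete, here is a minimal sketch of what
a republisher's metadata record might carry. It is only an illustration
under stated assumptions: the example.org URIs, the function name and the
"feedback_channel" key are hypothetical, and the property names
prov:wasDerivedFrom (PROV-O) and dct:license (Dublin Core Terms) stand in
for whatever terms the DUV examples end up using.

```python
import json


def describe_republished(derived_uri, original_uri, license_uri, feedback_uri):
    """Build a simple metadata record for a dataset derived from another one.

    The keys borrow property names from PROV-O and Dublin Core Terms;
    "feedback_channel" is a hypothetical key used only for illustration.
    """
    return {
        "id": derived_uri,
        # Provenance: cite the original dataset.
        "prov:wasDerivedFrom": original_uri,
        # License: carry over a license compatible with the original one.
        "dct:license": license_uri,
        # Feedback: where the republisher reports back to the original publisher.
        "feedback_channel": feedback_uri,
    }


# Hypothetical URIs, for illustration only.
record = describe_republished(
    derived_uri="http://example.org/dataset/derived",
    original_uri="http://example.org/dataset/original",
    license_uri="http://creativecommons.org/licenses/by/4.0/",
    feedback_uri="http://example.org/dataset/original/feedback",
)

print(json.dumps(record, indent=2))
```

The point of the sketch is only that citation, license and feedback can all
be recorded alongside the republished dataset, whichever section each BP
ends up in.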

Cheers,
Berna



2016-03-23 8:15 GMT-03:00 Deirdre Lee <deirdre@derilinx.com>:

> Hi,
>
> I'm afraid I won't be on the call today, but in terms of BP on re-use:
>
> Overall, I don't think there should be a new section for reuse.
>
> The only way this BP makes sense to me is if it is presented in the sense
> of published data being part of a wider data lifecycle:
> publish-use-republish-reuse etc.
> This may fit into data enrichment section:
> 'Data enrichment refers to a set of processes that can be used to enhance,
> refine or otherwise improve raw or previously processed data.'
>
> My suggestion: If we keep the BP, it should sit in the data enrichment
> section, and its content should be updated to emphasise the data lifecycle.
>
> I'm sure it'll be an interesting call today!
> Cheers,
> Deirdre
>
>
>
> On 22/03/2016 22:06, Annette Greiner wrote:
>
> Responding to some of the comments on the data reuse BP.
>
> Re the ideas from Europeana, the appropriateness of making sure that the
> license travels with the data depends on the license itself. Some specify
> that derivative works follow the same license or a compatible license.
> Getting that right is part of following the license requirements. I think
> Antoine's thought about keeping the data up to date is a good one, though
> that's covered in the original BP about data-up-to-date, since we say to
> update data when the source is updated. A small reminder in the reuse BP
> would seem fitting. The Europeana page also mentions the case where the
> reuser changes something about the data, saying that one should mention
> what was changed. I think that would be another idea worth mentioning.
>
> Re whether it's in scope, this goes back to the original discussion about
> who our audience is. I would never have argued that our audience was
> specifically publishers if I didn't also believe that re-users, or
> re-publishers, are part of that group. Our charter charges us with
> "facilitating better communication between developers and publishers."
> We've recognized that developers are publishers, too, but we haven't
> addressed the original issue, which is really poor communication between
> original publishers and re-publishers. We haven't addressed anything that
> applies particularly to the challenge of re-publishing. In this BP, we
> finally do that. I feel that, if we were to leave it out, the list of BPs
> could leave publishers who are not also re-publishers feeling that they are
> the only ones tasked with improving their behavior. Communication is a
> two-way street, and I think addressing re-publishers is something we need
> to do to maintain balance. Having thought about this, I would not be
> comfortable publishing a BP list without these ideas in it. I am far, far
> less concerned about issues of scope than issues of balance and fairness,
> but I think this BP is firmly in scope and necessary.
>
> Re the idea that we should split it into 2 BPs. The two-way split doesn't
> strike me as logical, because citing and providing feedback are two
> completely different tasks. I could possibly imagine splitting it into
> three BPs, as there are three components to reusing respectfully (at least
> currently). However, I'm not sure what we gain by splitting them up and
> putting them into other sections. That makes it difficult for users to find
> advice on what to do when they are reusing someone else's data. We could
> possibly split them into three and have all three in a new section, but I'm
> not sure there are really three unique BPs-worth of things to say about
> them. Either way, this is really a new challenge: how to reuse with
> consideration for the original publisher, and worthy of a new section.
>
> It's an interesting idea to put the ideas about reuse into existing BPs,
> but I think that would force us to try and stuff two different ideas into
> each BP. We would have to find a partner BP for each one and rewrite.
> Supposing we felt it was worth that effort, we would end up with BPs that
> are trying awkwardly to encompass two different ideas. Keeping them
> separate helps understanding and keeps the BPs from becoming overloaded. It
> is one task to provide a channel for communication; it is quite another to
> use it, and it's still another to cite a source. Similarly, it is one task
> to provide a license; it is quite another to follow it.
> -Annette
>
> On 3/22/16 9:25 AM, Laufer wrote:
>
>
> Hi All,
>
> I do not agree with a new section and a new BP about data reuse.
>
> I think that the aspects of reuse that are mentioned in the new BP are
> covered by the BPs in our list: license, provenance and feedback.
>
> If someone wants to use, or reuse, data she has to think about these
> aspects and has to do what our BPs recommend.
>
> If the group thinks that these aspects should be highlighted, I think that
> we can include this information in the original BPs.
>
> If we talk about BPs for reuse we will need to look at all the other
> aspects of publication, for example, how versioning will be treated,
> how sensitive data will be treated, how the use of new vocabularies will be
> compatible with the vocabularies used in the reused data, and so on.
>
> I do not like the idea that reuse is not use. I think that in some sense
> we are assuming that the only one who uses data is the final user. But I
> think that the final user does not use data. She asks a question that
> someone who uses data will try to answer.
>
> All of our BPs include the benefit of Reuse. We do not even talk about the
> benefit of Use.
>
> For me, our BPs cover the publishing of data that will be used. Or reused,
> as you wish. I do not think we have to split this into different BPs.
>
> Cheers, Laufer
>
>
> On 22/03/2016 12:50, Bernadette Farias Lóscio wrote:
>
> Hi all,
>
> Considering that tomorrow we need to vote on whether to include the BP about
> Data Re-use [1] in the BP document, I'd like to offer some considerations.
>
> I agree with Antoine that "a lot of the aspects of this BP are
> non-technical, so I'm not 100% sure it's in scope." However, I also like
> the idea of the BP and I'd like to make a proposal.
>
> In my opinion, the Data Reuse BP should be split into two different BPs:
> one for data licenses and another one for Citation and Feedback. We already
> have a section about data licenses, so I think it would be better to create
> a new BP in that section, considering the aspects mentioned by Antoine and
> Annette. If reusing is also a way of publishing data, then I think it won't
> be a problem.
>
> The second BP will focus on providing citation and feedback. I also
> believe that there are other aspects that should be considered. Annette's
> proposal mentions that publishers "should be made aware of any known
> problems with the data". However, feedback can be used to provide other
> information about the dataset, and not just to report problems. It is
> also really important to mention the Dataset Usage Vocabulary and to
> provide examples based on our own vocabulary.
>
> In this case, we can also change the title of the section Feedback to be
> something like Feedback and Citation.
>
> In summary, my proposal is:
>
> - Split the Data Reuse BP into two BPs:
> BP: Follow licensing constraints, to be included in the Data Licenses
> Section
> BP: Cite the original dataset and give feedback (this could also be
> split into two other BPs: i) Cite the original dataset and ii) Give
> feedback)
>
> - Rename Feedback Section to Feedback and Citation.
>
> Doing this, we also avoid the creation of a new section. Again, if reusing
> is a way of publishing, then I don't think that we should have a new section
> for this subject.
>
> kind regards,
> Berna
>
> [1] http://agreiner.github.io/dwbp/bp.html#Re-use
>
> 2016-03-16 9:19 GMT-03:00 Antoine Isaac <aisaac@few.vu.nl>:
>
>> Hi everyone,
>>
>> I've just received the email with the editors asking for this:
>>
>>
>>> 2. To review the Best Practice: Reuse vocabularies [3], which will be
>>> voted next Wednesday.
>>
>>
>>
>> This is excellent timing, I've just read it while catching up with the
>> minutes of yesterday's session ;-)
>>
>> My feedback will be quick though (not much time to write a clean text!):
>>
>> 1. a lot of the aspects of this BP are non-technical, so I'm not 100%
>> sure it's in scope. But there are some technical aspects involved, and see
>> point #2.
>>
>> 2. I do like the BP a lot. This makes a lot of sense.
>>
>> 3. my strong recommendation about licensing would be that re-users should
>> make sure that any license or terms of use 'travels' with the data. If
>> re-users do something with the data, they should make sure it's compatible
>> with the license and terms of use. This includes (re-)publishing of data, or
>> of derived data when applicable. Especially re-users of derived or
>> re-published data must be aware of the original license and terms of use.
>>
>> 4. my organization (Europeana) has made terms of use that could be used
>> as example. Our data is CC0, so there's no license whatsoever. But because
>> attribution and provenance matter in our sector (culture) we wanted to
>> encourage people to be 'respectful'.
>> It's at http://www.europeana.eu/portal/rights/metadata.html
>> I think it exemplifies quite a lot the aspects of Annette's BP proposal.
>>
>> 5. the Europeana TOU include one technical aspect that could be
>> strengthened in the BP, imho. Re-users should make sure they keep their
>> data (or application) synchronized with the most up-to-date status of
>> the original source. If someone builds and keeps something on the basis of
>> old data, and lets their own re-users think the original data source is
>> responsible for problems of outdated data, this is not fair to the
>> original data publisher.
>>
>> Cheers,
>>
>> Antoine
>>
>> [3] http://agreiner.github.io/dwbp/bp.html#Re-use
>>
>>
>
>
> --
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
>
> ----------------------------------------------------------------------------
>
>
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
>
>
>
> --
> ------------------------------------
> Deirdre Lee, CEO & Founder
> Derilinx - Linked & Open Data Solutions
>
> Web:      www.derilinx.com
> Email:    deirdre@derilinx.com
> Address:  11/12 Baggot Court, Dublin 2, D02 F891
> Tel:      +353 (0)1 254 4316
> Mob:      +353 (0)87 417 2318
> Linkedin: ie.linkedin.com/in/leedeirdre/
> Twitter:  @deirdrelee
>
>


-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------
Received on Wednesday, 23 March 2016 13:36:46 UTC