Re: My additions today (was Re: Sensitive data text for enrichment section) from Bernadette Farias Lóscio on 2016-05-10 (public-dwbp-wg@w3.org from May 2016)

From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Date: Tue, 10 May 2016 15:38:15 -0300
To: Annette Greiner <amgreiner@lbl.gov>
Cc: Phil Archer <phila@w3.org>, Eric Stephan <ericphb@gmail.com>, Public DWBP WG <public-dwbp-wg@w3.org>, Eric Kauz <eric.kauz@gs1.org>, Caroline Burle <cburle@nic.br>, Newton Calegari <newton@nic.br>
Message-ID: <CANx1PzzDD8weFmzv9fHeyTQc5jdMwV2A3oKXyOMgd_3FvczCAQ@mail.gmail.com>
Hi Phil, Annette and Eric,

Thanks a lot for the updates, feedback and suggestions!

Today I'm travelling, but tomorrow I'll be back in Recife and can
finish the remaining comments.

Cheers,
Berna



2016-05-10 15:04 GMT-03:00 Annette Greiner <amgreiner@lbl.gov>:

> +1 from me too, BTW.
>
> -Annette
>
> On 5/10/16 8:35 AM, Phil Archer wrote:
>
> Thanks Eric, I've made those changes.
>
> @Editors, these are included in my current pull request.
>
> Phil
>
> On 09/05/2016 14:22, Eric Stephan wrote:
>
> Phil and Annette,
>
> I agree with the placement of the sensitive data in the introduction of
> the document.
>
> ~~~
>
> Could you change "Not all data" to "Not all data (and metadata)"?  If this
> were changed there would be no need for adding anything to the metadata
> section.
>
> ~~~
>
> Could you change the statement:
>
> "It is for data publishers, not a technical standards working group, to
> determine policy on which data should be shared and under what
> circumstances."
>
> to:
>
> "It is for data publishers to determine policy on which data should be
> shared and under what circumstances."
>
> I understand the sentiment "not a technical standards working group" in
> the context of the DWBP, but the tone of the statement sounds a bit too
> universal. Some domain-specific (health care) technical standards for
> data sharing are constrained by policy and protocol.
>
> Thanks and great work,
>
> Eric S.
>
> On Mon, May 9, 2016 at 3:38 AM, Phil Archer <phila@w3.org> wrote:
>
> Dear all,
>
> I have implemented a couple of minor changes to the BP doc that came up as
> a result of our discussions on Friday.
>
> 1. Thanks Eric K for the suggested text that allows us to include direct
> refs to the GS1 work which, IMHO, is well worth including. See versions of
> your text in
> http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
>
> (For tracker, action-279)
>
> 2. Thanks Eric S for words on privacy - I think you'll agree that what
> Annette has suggested covers exactly what you were talking about?
>
> (For tracker: action-278)
>
> 3. Annette, thanks for reviewing and improving the suggestions about
> handling sensitive data - I agree entirely with your suggestions and, more
> importantly, I am confident that they reflect the wider discussions we had
> on Friday's call. Therefore I have put your improved text in my latest
> version of the doc. See
> http://philarcher1.github.io/dwbp/bp.html#intro
> and
> http://philarcher1.github.io/dwbp/bp.html#enrichment
>
> 4. Having done that, I have deleted the Sensitive Data section and moved
> the Data Unavailability to within the Access section, just before the sub
> section on APIs. I really wasn't sure where to put it but that seemed as
> good as anywhere?
>
> 5. I've made that sub group on BPs on APIs into a section so it becomes
> 8.10.1, complete with ID and all the rest of it.
>
> 6. I've updated the date of the doc to today's date.
>
> 7. Editors - I have issued a Pull Request for your consideration.
>
> HTH
>
> Phil.
>
>
>
> On 07/05/2016 21:05, Annette Greiner wrote:
>
> Hi Phil,
> Thanks for letting me weigh in. I understand the connection you’re making
> here, and I think it’s a good thing to mention in the enrichment section.
> What I think is crucial but is not yet reflected in here is the issue of
> privacy breach arising from putting together disparate data that presents
> less risk separately. The second paragraph here is a good but more general
> discussion of security and privacy issues that strikes me as not belonging
> in this particular section. I would suggest instead addressing the more
> general issues in the introduction to our document. Most of the third
> paragraph would also be better in the document introduction, but the last
> sentence is relevant here. As I see it, the real issue with data enrichment
> is combining datasets that each hold so little information about any
> individual that they cannot be identified but that together offer enough
> information that they can be. I would suggest that here we just say,
>
> Data enrichment refers to a set of processes that can be used to enhance,
> refine or otherwise improve raw or previously processed data. This idea and
> other similar concepts contribute to making data a valuable asset for
> almost any modern business or enterprise. It is a diverse topic in itself,
> details of which are beyond the scope of the current document. However, it
> is worth noting that some of these techniques should be approached with
> caution, as ethical concerns may arise. In scientific research, care must
> be taken to avoid enrichment that distorts results or statistical outcomes.
> For data about individuals, privacy issues may arise when combining
> datasets. That is, enriching one dataset with another, when neither
> contains sufficient information about any individual to identify them, may
> yield a combined dataset that compromises privacy. Furthermore, these
> techniques can be carried out at scale, which in turn highlights the need
> for caution.
>
>
> Then, in the document introduction, I would suggest adding the following,
> after the paragraph that begins “In this context…”.
>
> Not all data should be shared openly, however. Security, commercial
> sensitivity and, above all, individuals' privacy need to be taken into
> account. It is for data publishers, not a technical standards working
> group, to determine policy on which data should be shared and under what
> circumstances. Data sharing policies are likely to assess the exposure risk
> and determine the appropriate security measures to be taken to protect
> sensitive data, such as secure authentication and authorization.
>
> Depending on circumstances, sensitive information about individuals
> might include full name, home address, email address, national
> identification number, IP address, vehicle registration plate number,
> driver's license number, face, fingerprints, or handwriting, credit card
> numbers, digital identity, date of birth, birthplace, genetic information,
> telephone number, login name, screen name, nickname, health records etc.
> Although it is likely to be safe to share some of that information openly,
> and even more within a controlled environment, publishers should bear in
> mind that combining data from multiple sources may allow inadvertent
> identification of individuals.
>
>
> (I took out mention of https, as it will soon be everywhere, which would
> make our doc out of date.)
>
> Also, I noticed a grammatical error in the implementation section of BP
> 31. (Subject-verb agreement is off.) It should read "Techniques for data
> enrichment are complex and go well beyond the scope of this document,
> which can only highlight the possibilities."
> -Annette
>
> On May 6, 2016, at 7:50 AM, Phil Archer <phila@w3.org> wrote:
>
>
> Berna,
>
> As promised, I've copied the text from the sensitive data section and
> merged some of it with the data enrichment intro to end up with this as a
> suggestion.
>
> @Annette - we resolved to do this and move the BP about data
> unavailability to the data access section. Do you agree with this?
>
> ===Begins==
>
> Data enrichment refers to a set of processes that can be used to
> enhance, refine or otherwise improve raw or previously processed data.
> This idea and other similar concepts contribute to making data a valuable asset
> for almost any modern business or enterprise. It is a diverse topic in
> itself, details of which are beyond the scope of the current document.
> However, it is worth noting that techniques exist to carry out such
> enrichment at scale which in turn highlights the need for caution.
>
> Not all data should be shared openly. Security, commercial sensitivity
> and, above all, individuals' privacy need to be taken into account. It is
> for data publishers, not a technical standards working group, to determine
> policy on which data should be shared and under what circumstances. Data
> sharing policies are likely to assess the exposure risk and determine the
> appropriate security measures to be taken to protect sensitive data, such
> as secure authentication and use of HTTPS.
>
> Depending on circumstance, sensitive information about individuals might
> include: full name, home address, email address, national identification
> number, IP address, vehicle registration plate number, driver's license
> number, face, fingerprints, or handwriting, credit card numbers, digital
> identity, date of birth, birthplace, genetic information, telephone
> number, login name, screen name, nickname, health records etc. Although it is
> likely to be safe to share some of that information openly, and even more
> within a controlled environment, publishers should bear in mind that data
> enrichment techniques may allow some elements to be discovered and linked
> from elsewhere.
>
> Notwithstanding that caution, data enrichment offers exciting
> possibilities for both data publishers and consumers.
>
>
>
> == ends==
>
> --
>
>
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
>
>
>
>
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
>
>
>


-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------
Received on Tuesday, 10 May 2016 18:46:41 UTC