Re: My additions today (was Re: Sensitive data text for enrichment section) from Eric Stephan on 2016-05-09 (public-dwbp-wg@w3.org from May 2016)

From: Eric Stephan <ericphb@gmail.com>
Date: Mon, 9 May 2016 06:22:21 -0700
To: Phil Archer <phila@w3.org>
Cc: Annette Greiner <amgreiner@lbl.gov>, Public DWBP WG <public-dwbp-wg@w3.org>, Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Eric Kauz <eric.kauz@gs1.org>, Caroline Burle <cburle@nic.br>, Newton Calegari <newton@nic.br>
Message-ID: <CAMFz4jj5p8FhWrjyN46iQPRJqFEPTZAdfgcTaZEuh3XQUmzsqQ@mail.gmail.com>

Phil and Annette,

I agree with the placement of the sensitive data text in the introduction
of the document.

~~~

Could you change "Not all data" to "Not all data (and metadata)"?  If this
were changed there would be no need for adding anything to the metadata
section.

~~~

Could you change the statement:

"It is for data publishers, not a technical standards working group, to
determine policy on which data should be shared and under what
circumstances."

to:

"It is for data publishers to determine policy on which data should be
shared and under what circumstances."

I understand the sentiment behind "not a technical standards working
group" in the context of the DWBP, but the tone of the statement sounds a
bit too universal.  Some domain-specific (e.g. health care) technical
standards for data sharing are constrained by policy and protocol.

Thanks and great work,

Eric S.

On Mon, May 9, 2016 at 3:38 AM, Phil Archer <phila@w3.org> wrote:

> Dear all,
>
> I have implemented a couple of minor changes to the BP doc that came up as
> a result of our discussions on Friday.
>
> 1. Thanks Eric K for the suggested text that allows us to include direct
> refs to the GS1 work which, IMHO, is well worth including. See versions of
> your text in
> http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
>
> (For tracker, action-279)
>
> 2. Thanks Eric S for words on privacy - I think you'll agree that what
> Annette has suggested covers exactly what you were talking about?
>
> (For tracker: action-278)
>
> 3. Annette, thanks for reviewing and improving the suggestions about
> handling sensitive data - I agree entirely with your suggestions and, more
> importantly, I am confident that they reflect the wider discussions we had
> on Friday's call. Therefore I have put your improved text in my latest
> version of the doc. See
> http://philarcher1.github.io/dwbp/bp.html#intro
> and
> http://philarcher1.github.io/dwbp/bp.html#enrichment
>
> 4. Having done that, I have deleted the Sensitive Data section and moved
> the Data Unavailability BP into the Access section, just before the
> subsection on APIs. I really wasn't sure where to put it but that seemed
> as good as anywhere?
>
> 5. I've made that subgroup of BPs on APIs into a section, so it becomes
> 8.10.1, complete with ID and all the rest of it.
>
> 6. I've updated the date of the doc to today's date.
>
> 7. Editors - I have issued a Pull Request for your consideration.
>
> HTH
>
> Phil.
>
>
>
> On 07/05/2016 21:05, Annette Greiner wrote:
>
>> Hi Phil,
>> Thanks for letting me weigh in. I understand the connection you’re making
>> here, and I think it’s a good thing to mention in the enrichment section.
>> What I think is crucial but is not yet reflected in here is the issue of
>> privacy breach arising from putting together disparate data that presents
>> less risk separately. The second paragraph here is a good but more general
>> discussion of security and privacy issues that strikes me as not belonging
>> in this particular section. I would suggest instead addressing the more
>> general issues in the introduction to our document. Most of the third
>> paragraph would also be better in the document introduction, but the last
>> sentence is relevant here. As I see it, the real issue with data enrichment
>> is combining datasets that each hold so little information about any
>> individual that they cannot be identified but that together offer enough
>> information that they can be. I would suggest that here we just say,
>>
>> Data enrichment refers to a set of processes that can be used to enhance,
>>> refine or otherwise improve raw or previously processed data. This idea and
>>> other similar concepts contribute to making data a valuable asset for
>>> almost any modern business or enterprise. It is a diverse topic in itself,
>>> details of which are beyond the scope of the current document. However, it
>>> is worth noting that some of these techniques should be approached with
>>> caution, as ethical concerns may arise. In scientific research, care must
>>> be taken to avoid enrichment that distorts results or statistical outcomes.
>>> For data about individuals, privacy issues may arise when combining
>>> datasets. That is, enriching one dataset with another, when neither
>>> contains sufficient information about any individual to identify them, may
>>> yield a combined dataset that compromises privacy. Furthermore, these
>>> techniques can be carried out at scale, which in turn highlights the need
>>> for caution.
>>>
>>
>> Then, in the document introduction, I would suggest adding the following,
>> after the paragraph that begins “In this context…”.
>>
>> Not all data should be shared openly, however. Security, commercial
>>> sensitivity and, above all, individuals' privacy need to be taken into
>>> account. It is for data publishers, not a technical standards working
>>> group, to determine policy on which data should be shared and under what
>>> circumstances. Data sharing policies are likely to assess the exposure risk
>>> and determine the appropriate security measures to be taken to protect
>>> sensitive data, such as secure authentication and authorization.
>>>
>>> Depending on circumstances, sensitive information about individuals
>>> might include full name, home address, email address, national
>>> identification number, IP address, vehicle registration plate number,
>>> driver's license number, face, fingerprints, or handwriting, credit card
>>> numbers, digital identity, date of birth, birthplace, genetic information,
>>> telephone number, login name, screen name, nickname, health records etc.
>>> Although it is likely to be safe to share some of that information openly,
>>> and even more within a controlled environment, publishers should bear in
>>> mind that combining data from multiple sources may allow inadvertent
>>> identification of individuals.
>>>
>>
>> (I took out mention of https, as it will soon be everywhere, which would
>> make our doc out of date.)
>>
>> Also, I noticed a grammatical error in the implementation section of BP
>> 31. (Subject-verb agreement is off.) It should read "Techniques for data
>> enrichment are complex and go well beyond the scope of this document, which
>> can only highlight the possibilities."
>> -Annette
>>
>> On May 6, 2016, at 7:50 AM, Phil Archer <phila@w3.org> wrote:
>>>
>>> Berna,
>>>
>>> As promised, I've copied the text from the sensitive data section and
>>> merged some of it with the data enrichment intro to end up with this as a
>>> suggestion.
>>>
>>> @Annette - we resolved to do this and move the BP about data
>>> unavailability to the data access section. Do you agree with this?
>>>
>>> === Begins ===
>>>
>>> Data enrichment refers to a set of processes that can be used to
>>> enhance, refine or otherwise improve raw or previously processed data. This
>>> idea and other similar concepts contribute to making data a valuable asset
>>> for almost any modern business or enterprise. It is a diverse topic in
>>> itself, details of which are beyond the scope of the current document.
>>> However, it is worth noting that techniques exist to carry out such
>>> enrichment at scale, which in turn highlights the need for caution.
>>>
>>> Not all data should be shared openly. Security, commercial sensitivity
>>> and, above all, individuals' privacy need to be taken into account. It is
>>> for data publishers, not a technical standards working group, to determine
>>> policy on which data should be shared and under what circumstances. Data
>>> sharing policies are likely to assess the exposure risk and determine the
>>> appropriate security measures to be taken to protect sensitive data, such
>>> as secure authentication and use of HTTPS.
>>>
>>> Depending on circumstance, sensitive information about individuals might
>>> include: full name, home address, email address, national identification
>>> number, IP address, vehicle registration plate number, driver's license
>>> number, face, fingerprints, or handwriting, credit card numbers, digital
>>> identity, date of birth, birthplace, genetic information, telephone number,
>>> login name, screen name, nickname, health records etc. Although it is
>>> likely to be safe to share some of that information openly, and even more
>>> within a controlled environment, publishers should bear in mind that data
>>> enrichment techniques may allow some elements to be discovered and linked
>>> from elsewhere.
>>>
>>> Notwithstanding that caution, data enrichment offers exciting
>>> possibilities for both data publishers and consumers.
>>>
>>>
>>>
>>> === Ends ===
>>>
>>> --
>>>
>>>
>>> Phil Archer
>>> W3C Data Activity Lead
>>> http://www.w3.org/2013/data/
>>>
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1
>>>
>>
>>
>>
Received on Monday, 9 May 2016 13:27:56 UTC