Re: My additions today (was Re: Sensitive data text for enrichment section) from Phil Archer on 2016-05-10 (public-dwbp-wg@w3.org from May 2016)

From: Phil Archer <phila@w3.org>
Date: Tue, 10 May 2016 16:35:46 +0100
To: Eric Stephan <ericphb@gmail.com>
Cc: Annette Greiner <amgreiner@lbl.gov>, Public DWBP WG <public-dwbp-wg@w3.org>, Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Eric Kauz <eric.kauz@gs1.org>, Caroline Burle <cburle@nic.br>, Newton Calegari <newton@nic.br>
Message-ID: <a54efd44-1a7c-e983-40a4-c338e4d04a75@w3.org>
Thanks Eric, I've made those changes.

@Editors, these are included in my current pull request.

Phil

On 09/05/2016 14:22, Eric Stephan wrote:
> Phil and Annette,
>
> I agree with the placement of the sensitive data text in the introduction of
> the document.
>
> ~~~
>
> Could you change "Not all data" to "Not all data (and metadata)"?  If this
> were changed there would be no need for adding anything to the metadata
> section.
>
> ~~~
>
> Could you change the statement:
>
> "It is for data publishers, not a technical standards working group, to
> determine policy on which data should be shared and under what
> circumstances. "
>
> To:
>
> "It is for data publishers to determine policy on which data should be
> shared and under what circumstances. "
>
> I understand the sentiment "not a technical standards working group" in the
> context of the DWBP, but the tone of the statement just sounds a bit too
> universal. Some domain-specific (health care) technical standards for data
> sharing are constrained by policy and protocol.
>
> Thanks and great work,
>
> Eric S.
>
> On Mon, May 9, 2016 at 3:38 AM, Phil Archer <phila@w3.org> wrote:
>
>> Dear all,
>>
>> I have implemented a couple of minor changes to the BP doc that came up as
>> a result of our discussions on Friday.
>>
>> 1. Thanks Eric K for the suggested text that allows us to include direct
>> refs to the GS1 work which, IMHO, is well worth including. See versions of
>> your text in
>> http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
>>
>> (For tracker, action-279)
>>
>> 2. Thanks Eric S for words on privacy - I think you'll agree that what
>> Annette has suggested covers exactly what you were talking about?
>>
>> (For tracker: action-278)
>>
>> 3. Annette, thanks for reviewing and improving the suggestions about
>> handling sensitive data - I agree entirely with your suggestions and, more
>> importantly, I am confident that they reflect the wider discussions we had
>> on Friday's call. Therefore I have put your improved text in my latest
>> version of the doc. See
>> http://philarcher1.github.io/dwbp/bp.html#intro
>> and
>> http://philarcher1.github.io/dwbp/bp.html#enrichment
>>
>> 4. Having done that, I have deleted the Sensitive Data section and moved
>> the Data Unavailability BP into the Access section, just before the
>> subsection on APIs. I really wasn't sure where to put it but that seemed as
>> good as anywhere?
>>
>> 5. I've made that subgroup of BPs on APIs into a section so it becomes
>> 8.10.1, complete with ID and all the rest of it.
>>
>> 6. I've updated the date of the doc to today's date.
>>
>> 7. Editors - I have issued a Pull Request for your consideration.
>>
>> HTH
>>
>> Phil.
>>
>>
>>
>> On 07/05/2016 21:05, Annette Greiner wrote:
>>
>>> Hi Phil,
>>> Thanks for letting me weigh in. I understand the connection you’re making
>>> here, and I think it’s a good thing to mention in the enrichment section.
>>> What I think is crucial but is not yet reflected in here is the issue of
>>> privacy breach arising from putting together disparate data that presents
>>> less risk separately. The second paragraph here is a good but more general
>>> discussion of security and privacy issues that strikes me as not belonging
>>> in this particular section. I would suggest instead addressing the more
>>> general issues in the introduction to our document. Most of the third
>>> paragraph would also be better in the document introduction, but the last
>>> sentence is relevant here. As I see it, the real issue with data enrichment
>>> is combining datasets that each hold so little information about any
>>> individual that they cannot be identified but that together offer enough
>>> information that they can be. I would suggest that here we just say,
>>>
>>> Data enrichment refers to a set of processes that can be used to enhance,
>>>> refine or otherwise improve raw or previously processed data. This idea and
>>>> other similar concepts contribute to making data a valuable asset for
>>>> almost any modern business or enterprise. It is a diverse topic in itself,
>>>> details of which are beyond the scope of the current document. However, it
>>>> is worth noting that some of these techniques should be approached with
>>>> caution, as ethical concerns may arise. In scientific research, care must
>>>> be taken to avoid enrichment that distorts results or statistical outcomes.
>>>> For data about individuals, privacy issues may arise when combining
>>>> datasets. That is, enriching one dataset with another, when neither
>>>> contains sufficient information about any individual to identify them, may
>>>> yield a combined dataset that compromises privacy. Furthermore, these
>>>> techniques can be carried out at scale, which in turn highlights the need
>>>> for caution.
>>>>
>>>
>>> Then, in the document introduction, I would suggest adding the following,
>>> after the paragraph that begins “In this context…”.
>>>
>>> Not all data should be shared openly, however. Security, commercial
>>>> sensitivity and, above all, individuals' privacy need to be taken into
>>>> account. It is for data publishers, not a technical standards working
>>>> group, to determine policy on which data should be shared and under what
>>>> circumstances. Data sharing policies are likely to assess the exposure risk
>>>> and determine the appropriate security measures to be taken to protect
>>>> sensitive data, such as secure authentication and authorization.
>>>>
>>>> Depending on circumstances, sensitive information about individuals
>>>> might include full name, home address, email address, national
>>>> identification number, IP address, vehicle registration plate number,
>>>> driver's license number, face, fingerprints, or handwriting, credit card
>>>> numbers, digital identity, date of birth, birthplace, genetic information,
>>>> telephone number, login name, screen name, nickname, health records etc.
>>>> Although it is likely to be safe to share some of that information openly,
>>>> and even more within a controlled environment, publishers should bear in
>>>> mind that combining data from multiple sources may allow inadvertent
>>>> identification of individuals.
>>>>
>>>
>>> (I took out mention of HTTPS, as it will soon be everywhere, which would
>>> make our doc out of date.)
>>>
>>> Also, I noticed a grammatical error in the implementation section of BP
>>> 31. (Subject-verb agreement is off.) It should read "Techniques for data
>>> enrichment are complex and go well beyond the scope of this document, which
>>> can only highlight the possibilities."
>>> -Annette
>>>
>>> On May 6, 2016, at 7:50 AM, Phil Archer <phila@w3.org> wrote:
>>>>
>>>> Berna,
>>>>
>>>> As promised, I've copied the text from the sensitive data section and
>>>> merged some of it with the data enrichment intro to end up with this as a
>>>> suggestion.
>>>>
>>>> @Annette - we resolved to do this and move the BP about data
>>>> unavailability to the data access section. Do you agree with this?
>>>>
>>>> ===Begins==
>>>>
>>>> Data enrichment refers to a set of processes that can be used to
>>>> enhance, refine or otherwise improve raw or previously processed data. This
>>>> idea and other similar concepts contribute to making data a valuable asset
>>>> for almost any modern business or enterprise. It is a diverse topic in
>>>> itself, details of which are beyond the scope of the current document.
>>>> However, it is worth noting that techniques exist to carry out such
>>>> enrichment at scale, which in turn highlights the need for caution.
>>>>
>>>> Not all data should be shared openly. Security, commercial sensitivity
>>>> and, above all, individuals' privacy need to be taken into account. It is
>>>> for data publishers, not a technical standards working group, to determine
>>>> policy on which data should be shared and under what circumstances. Data
>>>> sharing policies are likely to assess the exposure risk and determine the
>>>> appropriate security measures to be taken to protect sensitive data, such
>>>> as secure authentication and use of HTTPS.
>>>>
>>>> Depending on circumstance, sensitive information about individuals might
>>>> include: full name, home address, email address, national identification
>>>> number, IP address, vehicle registration plate number, driver's license
>>>> number, face, fingerprints, or handwriting, credit card numbers, digital
>>>> identity, date of birth, birthplace, genetic information, telephone number,
>>>> login name, screen name, nickname, health records etc. Although it is
>>>> likely to be safe to share some of that information openly, and even more
>>>> within a controlled environment, publishers should bear in mind that data
>>>> enrichment techniques may allow some elements to be discovered and linked
>>>> from elsewhere.
>>>>
>>>> Notwithstanding that caution, data enrichment offers exciting
>>>> possibilities for both data publishers and consumers.
>>>>
>>>>
>>>>
>>>> == ends==
>>>>
>>>> --
>>>>
>>>>
>>>> Phil Archer
>>>> W3C Data Activity Lead
>>>> http://www.w3.org/2013/data/
>>>>
>>>> http://philarcher.org
>>>> +44 (0)7887 767755
>>>> @philarcher1
>>>>
>>>
>>>
>>>
>

Received on Tuesday, 10 May 2016 15:35:59 UTC