My additions today (was Re: Sensitive data text for enrichment section)

Dear all,

I have implemented a couple of minor changes to the BP doc that came up 
as a result of our discussions on Friday.

1. Thanks Eric K for the suggested text that allows us to include direct 
refs to the GS1 work which, IMHO, is well worth including. See versions 
of your text in
http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets

(For tracker, action-279)

2. Thanks Eric S for words on privacy - I think you'll agree that what 
Annette has suggested covers exactly what you were talking about?

(For tracker: action-278)

3. Annette, thanks for reviewing and improving the suggestions about 
handling sensitive data - I agree entirely with your suggestions and, 
more importantly, I am confident that they reflect the wider discussions 
we had on Friday's call. Therefore I have put your improved text in my 
latest version of the doc. See
http://philarcher1.github.io/dwbp/bp.html#intro
and
http://philarcher1.github.io/dwbp/bp.html#enrichment

4. Having done that, I have deleted the Sensitive Data section and moved 
the Data Unavailability BP into the Access section, just before the 
subsection on APIs. I really wasn't sure where to put it, but that 
seemed as good a place as any.

5. I've made that subgroup of BPs on APIs into a section, so it becomes 
8.10.1, complete with ID and all the rest of it.

6. I've updated the date of the doc to today's date.

7. Editors - I have issued a Pull Request for your consideration.

HTH

Phil.



On 07/05/2016 21:05, Annette Greiner wrote:
> Hi Phil,
> Thanks for letting me weigh in. I understand the connection you’re making here, and I think it’s a good thing to mention in the enrichment section. What I think is crucial but is not yet reflected in here is the issue of privacy breach arising from putting together disparate data that presents less risk separately. The second paragraph here is a good but more general discussion of security and privacy issues that strikes me as not belonging in this particular section. I would suggest instead addressing the more general issues in the introduction to our document. Most of the third paragraph would also be better in the document introduction, but the last sentence is relevant here. As I see it, the real issue with data enrichment is combining datasets that each hold so little information about any individual that they cannot be identified but that together offer enough information that they can be. I would suggest that here we just say,
>
>> Data enrichment refers to a set of processes that can be used to enhance, refine or otherwise improve raw or previously processed data. This idea and other similar concepts contribute to making data a valuable asset for almost any modern business or enterprise. It is a diverse topic in itself, details of which are beyond the scope of the current document. However, it is worth noting that some of these techniques should be approached with caution, as ethical concerns may arise. In scientific research, care must be taken to avoid enrichment that distorts results or statistical outcomes. For data about individuals, privacy issues may arise when combining datasets. That is, enriching one dataset with another, when neither contains sufficient information about any individual to identify them, may yield a combined dataset that compromises privacy. Furthermore, these techniques can be carried out at scale, which in turn highlights the need for caution.
>
> Then, in the document introduction, I would suggest adding the following, after the paragraph that begins “In this context…”.
>
>> Not all data should be shared openly, however. Security, commercial sensitivity and, above all, individuals' privacy need to be taken into account. It is for data publishers, not a technical standards working group, to determine policy on which data should be shared and under what circumstances. Data sharing policies are likely to assess the exposure risk and determine the appropriate security measures to be taken to protect sensitive data, such as secure authentication and authorization.
>>
>> Depending on circumstances, sensitive information about individuals might include full name, home address, email address, national identification number, IP address, vehicle registration plate number, driver's license number, face, fingerprints, or handwriting, credit card numbers, digital identity, date of birth, birthplace, genetic information, telephone number, login name, screen name, nickname, health records etc. Although it is likely to be safe to share some of that information openly, and even more within a controlled environment, publishers should bear in mind that combining data from multiple sources may allow inadvertent identification of individuals.
>
> (I took out mention of https, as it will soon be everywhere, which would make our doc out of date.)
>
> Also, I noticed a grammatical error in the implementation section of BP 31. (Subject-verb agreement is off.) It should read "Techniques for data enrichment are complex and go well beyond the scope of this document, which can only highlight the possibilities."
> -Annette
>
>> On May 6, 2016, at 7:50 AM, Phil Archer <phila@w3.org> wrote:
>>
>> Berna,
>>
>> As promised, I've copied the text from the sensitive data section and merged some of it with the data enrichment intro to end up with this as a suggestion.
>>
>> @Annette - we resolved to do this and move the BP about data unavailability to the data access section. Do you agree with this?
>>
>> ===Begins==
>>
>> Data enrichment refers to a set of processes that can be used to enhance, refine or otherwise improve raw or previously processed data. This idea and other similar concepts contribute to making data a valuable asset for almost any modern business or enterprise. It is a diverse topic in itself, details of which are beyond the scope of the current document. However, it is worth noting that techniques exist to carry out such enrichment at scale which in turn highlights the need for caution.
>>
>> Not all data should be shared openly. Security, commercial sensitivity and, above all, individuals' privacy need to be taken into account. It is for data publishers, not a technical standards working group, to determine policy on which data should be shared and under what circumstances. Data sharing policies are likely to assess the exposure risk and determine the appropriate security measures to be taken to protect sensitive data, such as secure authentication and use of HTTPS.
>>
>> Depending on circumstance, sensitive information about individuals might include: full name, home address, email address, national identification number, IP address, vehicle registration plate number, driver's license number, face, fingerprints, or handwriting, credit card numbers, digital identity, date of birth, birthplace, genetic information, telephone number, login name, screen name, nickname, health records etc. Although it is likely to be safe to share some of that information openly, and even more within a controlled environment, publishers should bear in mind that data enrichment techniques may allow some elements to be discovered and linked from elsewhere.
>>
>> Notwithstanding that caution, data enrichment offers exciting possibilities for both data publishers and consumers.
>>
>>
>>
>> == ends==
>>
>> --
>>
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Monday, 9 May 2016 10:38:56 UTC