- From: Annette Greiner <amgreiner@lbl.gov>
- Date: Tue, 10 May 2016 11:04:52 -0700
- To: Phil Archer <phila@w3.org>, Eric Stephan <ericphb@gmail.com>
- Cc: Public DWBP WG <public-dwbp-wg@w3.org>, Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Eric Kauz <eric.kauz@gs1.org>, Caroline Burle <cburle@nic.br>, Newton Calegari <newton@nic.br>
- Message-ID: <5763071e-21f7-43c0-d4ef-d1513aa09844@lbl.gov>
+1 from me 2, BTW.
-Annette

On 5/10/16 8:35 AM, Phil Archer wrote:
> Thanks Eric, I've made those changes.
>
> @Editors, these are included in my current pull request.
>
> Phil
>
> On 09/05/2016 14:22, Eric Stephan wrote:
>> Phil and Annette,
>>
>> I agree with the placement of the sensitive data in the introduction
>> of the document.
>>
>> ~~~
>>
>> Could you change "Not all data" to "Not all data (and metadata)"? If
>> this were changed there would be no need for adding anything to the
>> metadata section.
>>
>> ~~~
>>
>> Could you remove the statement:
>>
>> "It is for data publishers, not a technical standards working group,
>> to determine policy on which data should be shared and under what
>> circumstances."
>>
>> To:
>>
>> "It is for data publishers to determine policy on which data should
>> be shared and under what circumstances."
>>
>> I understand the sentiment "not a technical standards working group"
>> in the context of the DWBP, the tone of the statement just sounds a
>> bit too universal. Some domain specific (health care) technical
>> standards for data sharing are constrained by policy and protocol.
>>
>> Thanks and great work,
>>
>> Eric S.
>>
>> On Mon, May 9, 2016 at 3:38 AM, Phil Archer <phila@w3.org> wrote:
>>
>>> Dear all,
>>>
>>> I have implemented a couple of minor changes to the BP doc that came
>>> up as a result of our discussions on Friday.
>>>
>>> 1. Thanks Eric K for the suggested text that allows us to include
>>> direct refs to the GS1 work which, IMHO, is well worth including.
>>> See versions of your text in
>>> http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
>>>
>>> (For tracker, action-279)
>>>
>>> 2. Thanks Eric S for words on privacy - I think you'll agree that
>>> what Annette has suggested covers exactly what you were talking
>>> about?
>>>
>>> (For tracker: action-278)
>>>
>>> 3. Annette, thanks for reviewing and improving the suggestions about
>>> handling sensitive data - I agree entirely with your suggestions
>>> and, more importantly, I am confident that they reflect the wider
>>> discussions we had on Friday's call. Therefore I have put your
>>> improved text in my latest version of the doc. See
>>> http://philarcher1.github.io/dwbp/bp.html#intro
>>> and
>>> http://philarcher1.github.io/dwbp/bp.html#enrichment
>>>
>>> 4. Having done that, I have deleted the Sensitive Data section and
>>> moved the Data Unavailability to within the Access section, just
>>> before the sub section on APIs. I really wasn't sure where to put it
>>> but that seemed as good as anywhere?
>>>
>>> 5. I've made that sub group on BPs on APIs into a section so it
>>> becomes 8.10.1, complete with ID and all the rest of it.
>>>
>>> 6. I've updated the date of the doc to today's date.
>>>
>>> 7. Editors - I have issued a Pull Request for your consideration.
>>>
>>> HTH
>>>
>>> Phil.
>>>
>>> On 07/05/2016 21:05, Annette Greiner wrote:
>>>
>>>> Hi Phil,
>>>> Thanks for letting me weigh in. I understand the connection you’re
>>>> making here, and I think it’s a good thing to mention in the
>>>> enrichment section. What I think is crucial but is not yet
>>>> reflected in here is the issue of privacy breach arising from
>>>> putting together disparate data that presents less risk
>>>> separately. The second paragraph here is a good but more general
>>>> discussion of security and privacy issues that strikes me as not
>>>> belonging in this particular section. I would suggest instead
>>>> addressing the more general issues in the introduction to our
>>>> document. Most of the third paragraph would also be better in the
>>>> document introduction, but the last sentence is relevant here. As
>>>> I see it, the real issue with data enrichment is combining
>>>> datasets that each hold so little information about any individual
>>>> that they cannot be identified but that together offer enough
>>>> information that they can be. I would suggest that here we just
>>>> say,
>>>>
>>>>> Data enrichment refers to a set of processes that can be used to
>>>>> enhance, refine or otherwise improve raw or previously processed
>>>>> data. This idea and other similar concepts contribute to making
>>>>> data a valuable asset for almost any modern business or
>>>>> enterprise. It is a diverse topic in itself, details of which are
>>>>> beyond the scope of the current document. However, it is worth
>>>>> noting that some of these techniques should be approached with
>>>>> caution, as ethical concerns may arise. In scientific research,
>>>>> care must be taken to avoid enrichment that distorts results or
>>>>> statistical outcomes. For data about individuals, privacy issues
>>>>> may arise when combining datasets. That is, enriching one dataset
>>>>> with another, when neither contains sufficient information about
>>>>> any individual to identify them, may yield a combined dataset
>>>>> that compromises privacy. Furthermore, these techniques can be
>>>>> carried out at scale, which in turn highlights the need for
>>>>> caution.
>>>>
>>>> Then, in the document introduction, I would suggest adding the
>>>> following, after the paragraph that begins “In this context…”.
>>>>
>>>>> Not all data should be shared openly, however. Security,
>>>>> commercial sensitivity and, above all, individuals' privacy need
>>>>> to be taken into account. It is for data publishers, not a
>>>>> technical standards working group, to determine policy on which
>>>>> data should be shared and under what circumstances. Data sharing
>>>>> policies are likely to assess the exposure risk and determine the
>>>>> appropriate security measures to be taken to protect sensitive
>>>>> data, such as secure authentication and authorization.
>>>>>
>>>>> Depending on circumstances, sensitive information about
>>>>> individuals might include full name, home address, email address,
>>>>> national identification number, IP address, vehicle registration
>>>>> plate number, driver's license number, face, fingerprints, or
>>>>> handwriting, credit card numbers, digital identity, date of
>>>>> birth, birthplace, genetic information, telephone number, login
>>>>> name, screen name, nickname, health records etc. Although it is
>>>>> likely to be safe to share some of that information openly, and
>>>>> even more within a controlled environment, publishers should bear
>>>>> in mind that combining data from multiple sources may allow
>>>>> inadvertent identification of individuals.
>>>>
>>>> (I took out mention of https, as it will soon be everywhere, which
>>>> would make our doc out of date.)
>>>>
>>>> Also, I noticed a grammatical error in the implementation section
>>>> of BP 31. (Subject-verb agreement is off.) It should read
>>>> "Techniques for data enrichment are complex and go well beyond the
>>>> scope of this document, which can only highlight the
>>>> possibilities."
>>>> -Annette
>>>>
>>>> On May 6, 2016, at 7:50 AM, Phil Archer <phila@w3.org> wrote:
>>>>>
>>>>> Berna,
>>>>>
>>>>> As promised, I've copied the text from the sensitive data section
>>>>> and merged some of it with the data enrichment intro to end up
>>>>> with this as a suggestion.
>>>>>
>>>>> @Annette - we resolved to do this and move the BP about data
>>>>> unavailability to the data access section. Do you agree with
>>>>> this?
>>>>>
>>>>> ===Begins==
>>>>>
>>>>> Data enrichment refers to a set of processes that can be used to
>>>>> enhance, refine or otherwise improve raw or previously processed
>>>>> data. This idea and other similar concepts contribute to making
>>>>> data a valuable asset for almost any modern business or
>>>>> enterprise. It is a diverse topic in itself, details of which are
>>>>> beyond the scope of the current document. However, it is worth
>>>>> noting that techniques exist to carry out such enrichment at
>>>>> scale which in turn highlights the need for caution.
>>>>>
>>>>> Not all data should be shared openly. Security, commercial
>>>>> sensitivity and, above all, individuals' privacy need to be taken
>>>>> into account. It is for data publishers, not a technical
>>>>> standards working group, to determine policy on which data should
>>>>> be shared and under what circumstances. Data sharing policies are
>>>>> likely to assess the exposure risk and determine the appropriate
>>>>> security measures to be taken to protect sensitive data, such as
>>>>> secure authentication and use of HTTPS.
>>>>>
>>>>> Depending on circumstance, sensitive information about
>>>>> individuals might include: full name, home address, email
>>>>> address, national identification number, IP address, vehicle
>>>>> registration plate number, driver's license number, face,
>>>>> fingerprints, or handwriting, credit card numbers, digital
>>>>> identity, date of birth, birthplace, genetic information,
>>>>> telephone number, login name, screen name, nickname, health
>>>>> records etc. Although it is likely to be safe to share some of
>>>>> that information openly, and even more within a controlled
>>>>> environment, publishers should bear in mind that data enrichment
>>>>> techniques may allow some elements to be discovered and linked
>>>>> from elsewhere.
>>>>>
>>>>> Notwithstanding that caution, data enrichment offers exciting
>>>>> possibilities for both data publishers and consumers.
>>>>>
>>>>> == ends==
>>>>>
>>>>> --
>>>>>
>>>>> Phil Archer
>>>>> W3C Data Activity Lead
>>>>> http://www.w3.org/2013/data/
>>>>>
>>>>> http://philarcher.org
>>>>> +44 (0)7887 767755
>>>>> @philarcher1
>>>
>>> --
>>>
>>> Phil Archer
>>> W3C Data Activity Lead
>>> http://www.w3.org/2013/data/
>>>
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1

--
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
Received on Tuesday, 10 May 2016 19:56:20 UTC