- From: Deirdre Lee <deirdre@derilinx.com>
- Date: Mon, 27 Jun 2016 19:00:42 +0100
- To: public-dwbp-wg@w3.org, Phil Archer <phila@w3.org>, eric.kauz@gs1.org
- Message-ID: <f7002efe-9eee-b2dc-ff1a-57b0ab36f0d3@derilinx.com>
Hi,

Suggest that the updates below mean we can close
https://www.w3.org/2013/dwbp/track/actions/279?

Cheers,
Deirdre

On 10/05/2016 19:04, Annette Greiner wrote:
> +1 from me 2, BTW.
> -Annette
>
> On 5/10/16 8:35 AM, Phil Archer wrote:
>> Thanks Eric, I've made those changes.
>>
>> @Editors, these are included in my current pull request.
>>
>> Phil
>>
>> On 09/05/2016 14:22, Eric Stephan wrote:
>>> Phil and Annette,
>>>
>>> I agree with the placement of the sensitive data in the
>>> introduction of the document.
>>>
>>> ~~~
>>>
>>> Could you change "Not all data" to "Not all data (and metadata)"?
>>> If this were changed there would be no need for adding anything
>>> to the metadata section.
>>>
>>> ~~~
>>>
>>> Could you change the statement:
>>>
>>> "It is for data publishers, not a technical standards working
>>> group, to determine policy on which data should be shared and
>>> under what circumstances."
>>>
>>> To:
>>>
>>> "It is for data publishers to determine policy on which data
>>> should be shared and under what circumstances."
>>>
>>> I understand the sentiment "not a technical standards working
>>> group" in the context of the DWBP; the tone of the statement just
>>> sounds a bit too universal. Some domain-specific (health care)
>>> technical standards for data sharing are constrained by policy
>>> and protocol.
>>>
>>> Thanks and great work,
>>>
>>> Eric S.
>>>
>>> On Mon, May 9, 2016 at 3:38 AM, Phil Archer <phila@w3.org> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I have implemented a couple of minor changes to the BP doc that
>>>> came up as a result of our discussions on Friday.
>>>>
>>>> 1. Thanks Eric K for the suggested text that allows us to
>>>> include direct refs to the GS1 work which, IMHO, is well worth
>>>> including. See versions of your text in
>>>> http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
>>>>
>>>> (For tracker, action-279)
>>>>
>>>> 2. Thanks Eric S for words on privacy - I think you'll agree
>>>> that what Annette has suggested covers exactly what you were
>>>> talking about?
>>>>
>>>> (For tracker: action-278)
>>>>
>>>> 3. Annette, thanks for reviewing and improving the suggestions
>>>> about handling sensitive data - I agree entirely with your
>>>> suggestions and, more importantly, I am confident that they
>>>> reflect the wider discussions we had on Friday's call. Therefore
>>>> I have put your improved text in my latest version of the doc.
>>>> See
>>>> http://philarcher1.github.io/dwbp/bp.html#intro
>>>> and
>>>> http://philarcher1.github.io/dwbp/bp.html#enrichment
>>>>
>>>> 4. Having done that, I have deleted the Sensitive Data section
>>>> and moved the Data Unavailability to within the Access section,
>>>> just before the sub section on APIs. I really wasn't sure where
>>>> to put it but that seemed as good as anywhere?
>>>>
>>>> 5. I've made that sub group on BPs on APIs into a section so it
>>>> becomes 8.10.1, complete with ID and all the rest of it.
>>>>
>>>> 6. I've updated the date of the doc to today's date.
>>>>
>>>> 7. Editors - I have issued a Pull Request for your
>>>> consideration.
>>>>
>>>> HTH
>>>>
>>>> Phil.
>>>>
>>>> On 07/05/2016 21:05, Annette Greiner wrote:
>>>>
>>>>> Hi Phil,
>>>>> Thanks for letting me weigh in. I understand the connection
>>>>> you’re making here, and I think it’s a good thing to mention in
>>>>> the enrichment section. What I think is crucial but is not yet
>>>>> reflected in here is the issue of privacy breach arising from
>>>>> putting together disparate data that presents less risk
>>>>> separately. The second paragraph here is a good but more
>>>>> general discussion of security and privacy issues that strikes
>>>>> me as not belonging in this particular section. I would suggest
>>>>> instead addressing the more general issues in the introduction
>>>>> to our document. Most of the third paragraph would also be
>>>>> better in the document introduction, but the last sentence is
>>>>> relevant here. As I see it, the real issue with data enrichment
>>>>> is combining datasets that each hold so little information
>>>>> about any individual that they cannot be identified but that
>>>>> together offer enough information that they can be. I would
>>>>> suggest that here we just say,
>>>>>
>>>>>> Data enrichment refers to a set of processes that can be used
>>>>>> to enhance, refine or otherwise improve raw or previously
>>>>>> processed data. This idea and other similar concepts
>>>>>> contribute to making data a valuable asset for almost any
>>>>>> modern business or enterprise. It is a diverse topic in
>>>>>> itself, details of which are beyond the scope of the current
>>>>>> document. However, it is worth noting that some of these
>>>>>> techniques should be approached with caution, as ethical
>>>>>> concerns may arise. In scientific research, care must be taken
>>>>>> to avoid enrichment that distorts results or statistical
>>>>>> outcomes. For data about individuals, privacy issues may arise
>>>>>> when combining datasets. That is, enriching one dataset with
>>>>>> another, when neither contains sufficient information about
>>>>>> any individual to identify them, may yield a combined dataset
>>>>>> that compromises privacy. Furthermore, these techniques can be
>>>>>> carried out at scale, which in turn highlights the need for
>>>>>> caution.
>>>>>
>>>>> Then, in the document introduction, I would suggest adding the
>>>>> following, after the paragraph that begins “In this context…”.
>>>>>
>>>>>> Not all data should be shared openly, however. Security,
>>>>>> commercial sensitivity and, above all, individuals' privacy
>>>>>> need to be taken into account. It is for data publishers, not
>>>>>> a technical standards working group, to determine policy on
>>>>>> which data should be shared and under what circumstances. Data
>>>>>> sharing policies are likely to assess the exposure risk and
>>>>>> determine the appropriate security measures to be taken to
>>>>>> protect sensitive data, such as secure authentication and
>>>>>> authorization.
>>>>>>
>>>>>> Depending on circumstances, sensitive information about
>>>>>> individuals might include full name, home address, email
>>>>>> address, national identification number, IP address, vehicle
>>>>>> registration plate number, driver's license number, face,
>>>>>> fingerprints, or handwriting, credit card numbers, digital
>>>>>> identity, date of birth, birthplace, genetic information,
>>>>>> telephone number, login name, screen name, nickname, health
>>>>>> records etc. Although it is likely to be safe to share some of
>>>>>> that information openly, and even more within a controlled
>>>>>> environment, publishers should bear in mind that combining
>>>>>> data from multiple sources may allow inadvertent
>>>>>> identification of individuals.
>>>>>
>>>>> (I took out mention of https, as it will soon be everywhere,
>>>>> which would make our doc out of date.)
>>>>>
>>>>> Also, I noticed a grammatical error in the implementation
>>>>> section of BP 31. (Subject-verb agreement is off.) It should
>>>>> read "Techniques for data enrichment are complex and go well
>>>>> beyond the scope of this document, which can only highlight the
>>>>> possibilities."
>>>>> -Annette
>>>>>
>>>>> On May 6, 2016, at 7:50 AM, Phil Archer <phila@w3.org> wrote:
>>>>>>
>>>>>> Berna,
>>>>>>
>>>>>> As promised, I've copied the text from the sensitive data
>>>>>> section and merged some of it with the data enrichment intro
>>>>>> to end up with this as a suggestion.
>>>>>>
>>>>>> @Annette - we resolved to do this and move the BP about data
>>>>>> unavailability to the data access section. Do you agree with
>>>>>> this?
>>>>>>
>>>>>> ===Begins==
>>>>>>
>>>>>> Data enrichment refers to a set of processes that can be used
>>>>>> to enhance, refine or otherwise improve raw or previously
>>>>>> processed data. This idea and other similar concepts
>>>>>> contribute to making data a valuable asset for almost any
>>>>>> modern business or enterprise. It is a diverse topic in
>>>>>> itself, details of which are beyond the scope of the current
>>>>>> document. However, it is worth noting that techniques exist to
>>>>>> carry out such enrichment at scale which in turn highlights
>>>>>> the need for caution.
>>>>>>
>>>>>> Not all data should be shared openly. Security, commercial
>>>>>> sensitivity and, above all, individuals' privacy need to be
>>>>>> taken into account. It is for data publishers, not a technical
>>>>>> standards working group, to determine policy on which data
>>>>>> should be shared and under what circumstances. Data sharing
>>>>>> policies are likely to assess the exposure risk and determine
>>>>>> the appropriate security measures to be taken to protect
>>>>>> sensitive data, such as secure authentication and use of
>>>>>> HTTPS.
>>>>>>
>>>>>> Depending on circumstance, sensitive information about
>>>>>> individuals might include: full name, home address, email
>>>>>> address, national identification number, IP address, vehicle
>>>>>> registration plate number, driver's license number, face,
>>>>>> fingerprints, or handwriting, credit card numbers, digital
>>>>>> identity, date of birth, birthplace, genetic information,
>>>>>> telephone number, login name, screen name, nickname, health
>>>>>> records etc. Although it is likely to be safe to share some of
>>>>>> that information openly, and even more within a controlled
>>>>>> environment, publishers should bear in mind that data
>>>>>> enrichment techniques may allow some elements to be discovered
>>>>>> and linked from elsewhere.
>>>>>>
>>>>>> Notwithstanding that caution, data enrichment offers exciting
>>>>>> possibilities for both data publishers and consumers.
>>>>>>
>>>>>> == ends==
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Phil Archer
>>>>>> W3C Data Activity Lead
>>>>>> http://www.w3.org/2013/data/
>>>>>>
>>>>>> http://philarcher.org
>>>>>> +44 (0)7887 767755
>>>>>> @philarcher1
>>>>>
>>>> --
>>>>
>>>> Phil Archer
>>>> W3C Data Activity Lead
>>>> http://www.w3.org/2013/data/
>>>>
>>>> http://philarcher.org
>>>> +44 (0)7887 767755
>>>> @philarcher1
>>>
>>
>
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory

--
------------------------------------
Deirdre Lee, CEO & Founder
Derilinx - Linked & Open Data Solutions

Web: www.derilinx.com
Email: deirdre@derilinx.com
Address: 11/12 Baggot Court, Dublin 2, D02 F891
Tel: +353 (0)1 254 4316
Mob: +353 (0)87 417 2318
Linkedin: ie.linkedin.com/in/leedeirdre/
Twitter: @deirdrelee
Received on Monday, 27 June 2016 18:01:18 UTC