- From: Phil Archer <phila@w3.org>
- Date: Mon, 27 Jun 2016 19:14:09 +0100
- To: Deirdre Lee <deirdre@derilinx.com>, public-dwbp-wg@w3.org, eric.kauz@gs1.org
Yep. Done

On 27/06/2016 19:00, Deirdre Lee wrote:
> Hi,
>
> Suggest that the updates below mean we can close
> https://www.w3.org/2013/dwbp/track/actions/279?
>
> Cheers,
>
> Deirdre
>
>
> On 10/05/2016 19:04, Annette Greiner wrote:
>>
>> +1 from me 2, BTW.
>>
>> -Annette
>>
>>
>> On 5/10/16 8:35 AM, Phil Archer wrote:
>>> Thanks Eric, I've made those changes.
>>>
>>> @Editors, these are included in my current pull request.
>>>
>>> Phil
>>>
>>> On 09/05/2016 14:22, Eric Stephan wrote:
>>>> Phil and Annette,
>>>>
>>>> I agree with the placement of the sensitive data in the
>>>> introduction of the document.
>>>>
>>>> ~~~
>>>>
>>>> Could you change "Not all data" to "Not all data (and metadata)"?
>>>> If this were changed there would be no need for adding anything
>>>> to the metadata section.
>>>>
>>>> ~~~
>>>>
>>>> Could you remove the statement:
>>>>
>>>> "It is for data publishers, not a technical standards working
>>>> group, to determine policy on which data should be shared and
>>>> under what circumstances. "
>>>>
>>>> To:
>>>>
>>>> "It is for data publishers to determine policy on which data
>>>> should be shared and under what circumstances. "
>>>>
>>>> I understand the sentiment "not a technical standards working
>>>> group" in the context of the DWBP, the tone of the statement just
>>>> sounds a bit too universal. Some domain specific (health care)
>>>> technical standards for data sharing are constrained by policy
>>>> and protocol.
>>>>
>>>> Thanks and great work,
>>>>
>>>> Eric S.
>>>>
>>>> On Mon, May 9, 2016 at 3:38 AM, Phil Archer <phila@w3.org> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I have implemented a couple of minor changes to the BP doc that
>>>>> came up as a result of our discussions on Friday.
>>>>>
>>>>> 1. Thanks Eric K for the suggested text that allows us to
>>>>> include direct refs to the GS1 work which, IMHO, is well worth
>>>>> including. See versions of your text in
>>>>> http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
>>>>>
>>>>> (For tracker, action-279)
>>>>>
>>>>> 2. Thanks Eric S for words on privacy - I think you'll agree
>>>>> that what Annette has suggested covers exactly what you were
>>>>> talking about?
>>>>>
>>>>> (For tracker: action-278)
>>>>>
>>>>> 3. Annette, thanks for reviewing and improving the suggestions
>>>>> about handling sensitive data - I agree entirely with your
>>>>> suggestions and, more importantly, I am confident that they
>>>>> reflect the wider discussions we had on Friday's call. Therefore
>>>>> I have put your improved text in my latest version of the doc.
>>>>> See
>>>>> http://philarcher1.github.io/dwbp/bp.html#intro
>>>>> and
>>>>> http://philarcher1.github.io/dwbp/bp.html#enrichment
>>>>>
>>>>> 4. Having done that, I have deleted the Sensitive Data section
>>>>> and moved the Data Unavailability to within the Access section,
>>>>> just before the sub section on APIs. I really wasn't sure where
>>>>> to put it but that seemed as good as anywhere?
>>>>>
>>>>> 5. I've made that sub group on BPs on APIs into a section so it
>>>>> becomes 8.10.1, complete with ID and all the rest of it.
>>>>>
>>>>> 6. I've updated the date of the doc to today's date.
>>>>>
>>>>> 7. Editors - I have issued a Pull Request for your
>>>>> consideration.
>>>>>
>>>>> HTH
>>>>>
>>>>> Phil.
>>>>>
>>>>>
>>>>>
>>>>> On 07/05/2016 21:05, Annette Greiner wrote:
>>>>>
>>>>>> Hi Phil,
>>>>>> Thanks for letting me weigh in. I understand the connection
>>>>>> you’re making here, and I think it’s a good thing to mention in
>>>>>> the enrichment section. What I think is crucial but is not yet
>>>>>> reflected in here is the issue of privacy breach arising from
>>>>>> putting together disparate data that presents less risk
>>>>>> separately. The second paragraph here is a good but more
>>>>>> general discussion of security and privacy issues that strikes
>>>>>> me as not belonging in this particular section. I would suggest
>>>>>> instead addressing the more general issues in the introduction
>>>>>> to our document. Most of the third paragraph would also be
>>>>>> better in the document introduction, but the last sentence is
>>>>>> relevant here. As I see it, the real issue with data enrichment
>>>>>> is combining datasets that each hold so little information
>>>>>> about any individual that they cannot be identified but that
>>>>>> together offer enough information that they can be. I would
>>>>>> suggest that here we just say,
>>>>>>
>>>>>>> Data enrichment refers to a set of processes that can be used
>>>>>>> to enhance, refine or otherwise improve raw or previously
>>>>>>> processed data. This idea and other similar concepts
>>>>>>> contribute to making data a valuable asset for almost any
>>>>>>> modern business or enterprise. It is a diverse topic in
>>>>>>> itself, details of which are beyond the scope of the current
>>>>>>> document. However, it is worth noting that some of these
>>>>>>> techniques should be approached with caution, as ethical
>>>>>>> concerns may arise. In scientific research, care must be taken
>>>>>>> to avoid enrichment that distorts results or statistical
>>>>>>> outcomes. For data about individuals, privacy issues may arise
>>>>>>> when combining datasets. That is, enriching one dataset with
>>>>>>> another, when neither contains sufficient information about
>>>>>>> any individual to identify them, may yield a combined dataset
>>>>>>> that compromises privacy. Furthermore, these techniques can be
>>>>>>> carried out at scale, which in turn highlights the need for
>>>>>>> caution.
>>>>>>>
>>>>>>
>>>>>> Then, in the document introduction, I would suggest adding the
>>>>>> following, after the paragraph that begins “In this context…”.
>>>>>>
>>>>>>> Not all data should be shared openly, however. Security,
>>>>>>> commercial sensitivity and, above all, individuals' privacy
>>>>>>> need to be taken into account. It is for data publishers, not
>>>>>>> a technical standards working group, to determine policy on
>>>>>>> which data should be shared and under what circumstances. Data
>>>>>>> sharing policies are likely to assess the exposure risk and
>>>>>>> determine the appropriate security measures to be taken to
>>>>>>> protect sensitive data, such as secure authentication and
>>>>>>> authorization.
>>>>>>>
>>>>>>> Depending on circumstances, sensitive information about
>>>>>>> individuals might include full name, home address, email
>>>>>>> address, national identification number, IP address, vehicle
>>>>>>> registration plate number, driver's license number, face,
>>>>>>> fingerprints, or handwriting, credit card numbers, digital
>>>>>>> identity, date of birth, birthplace, genetic information,
>>>>>>> telephone number, login name, screen name, nickname, health
>>>>>>> records etc. Although it is likely to be safe to share some of
>>>>>>> that information openly, and even more within a controlled
>>>>>>> environment, publishers should bear in mind that combining
>>>>>>> data from multiple sources may allow inadvertent
>>>>>>> identification of individuals.
>>>>>>>
>>>>>>
>>>>>> (I took out mention of https, as it will soon be everywhere,
>>>>>> which would make our doc out of date.)
>>>>>>
>>>>>> Also, I noticed a grammatical error in the implementation
>>>>>> section of BP 31. (Subject-verb agreement is off.) It should
>>>>>> read "Techniques for data enrichment are complex and go well
>>>>>> beyond the scope of this document, which can only highlight the
>>>>>> possibilities."
>>>>>> -Annette
>>>>>>
>>>>>> On May 6, 2016, at 7:50 AM, Phil Archer <phila@w3.org> wrote:
>>>>>>>
>>>>>>> Berna,
>>>>>>>
>>>>>>> As promised, I've copied the text from the sensitive data
>>>>>>> section and merged some of it with the data enrichment intro
>>>>>>> to end up with this as a suggestion.
>>>>>>>
>>>>>>> @Annette - we resolved to do this and move the BP about data
>>>>>>> unavailability to the data access section. Do you agree with
>>>>>>> this?
>>>>>>>
>>>>>>> ===Begins==
>>>>>>>
>>>>>>> Data enrichment refers to a set of processes that can be used
>>>>>>> to enhance, refine or otherwise improve raw or previously
>>>>>>> processed data. This idea and other similar concepts
>>>>>>> contribute to making data a valuable asset for almost any
>>>>>>> modern business or enterprise. It is a diverse topic in
>>>>>>> itself, details of which are beyond the scope of the current
>>>>>>> document. However, it is worth noting that techniques exist to
>>>>>>> carry out such enrichment at scale which in turn highlights
>>>>>>> the need for caution.
>>>>>>>
>>>>>>> Not all data should be shared openly. Security, commercial
>>>>>>> sensitivity and, above all, individuals' privacy need to be
>>>>>>> taken into account. It is for data publishers, not a technical
>>>>>>> standards working group, to determine policy on which data
>>>>>>> should be shared and under what circumstances. Data sharing
>>>>>>> policies are likely to assess the exposure risk and determine
>>>>>>> the appropriate security measures to be taken to protect
>>>>>>> sensitive data, such as secure authentication and use of
>>>>>>> HTTPS.
>>>>>>>
>>>>>>> Depending on circumstance, sensitive information about
>>>>>>> individuals might include: full name, home address, email
>>>>>>> address, national identification number, IP address, vehicle
>>>>>>> registration plate number, driver's license number, face,
>>>>>>> fingerprints, or handwriting, credit card numbers, digital
>>>>>>> identity, date of birth, birthplace, genetic information,
>>>>>>> telephone number, login name, screen name, nickname, health
>>>>>>> records etc. Although it is likely to be safe to share some of
>>>>>>> that information openly, and even more within a controlled
>>>>>>> environment, publishers should bear in mind that data
>>>>>>> enrichment techniques may allow some elements to be discovered
>>>>>>> and linked from elsewhere.
>>>>>>>
>>>>>>> Notwithstanding that caution, data enrichment offers exciting
>>>>>>> possibilities for both data publishers and consumers.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> == ends==
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>> Phil Archer
>>>>>>> W3C Data Activity Lead
>>>>>>> http://www.w3.org/2013/data/
>>>>>>>
>>>>>>> http://philarcher.org
>>>>>>> +44 (0)7887 767755
>>>>>>> @philarcher1
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> Phil Archer
>>>>> W3C Data Activity Lead
>>>>> http://www.w3.org/2013/data/
>>>>>
>>>>> http://philarcher.org
>>>>> +44 (0)7887 767755
>>>>> @philarcher1
>>>>>
>>>>
>>>
>>
>> --
>> Annette Greiner
>> NERSC Data and Analytics Services
>> Lawrence Berkeley National Laboratory
>>
>

--


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Monday, 27 June 2016 18:13:30 UTC