- From: Phil Archer <phila@w3.org>
- Date: Tue, 10 May 2016 16:35:46 +0100
- To: Eric Stephan <ericphb@gmail.com>
- Cc: Annette Greiner <amgreiner@lbl.gov>, Public DWBP WG <public-dwbp-wg@w3.org>, Bernadette Farias Lóscio <bfl@cin.ufpe.br>, Eric Kauz <eric.kauz@gs1.org>, Caroline Burle <cburle@nic.br>, Newton Calegari <newton@nic.br>
Thanks Eric, I've made those changes.

@Editors, these are included in my current pull request.

Phil

On 09/05/2016 14:22, Eric Stephan wrote:
> Phil and Annette,
>
> I agree with the placement of the sensitive data in the introduction of the
> document.
>
> ~~~
>
> Could you change "Not all data" to "Not all data (and metadata)"? If this
> were changed there would be no need for adding anything to the metadata
> section.
>
> ~~~
>
> Could you change the statement:
>
> "It is for data publishers, not a technical standards working group, to
> determine policy on which data should be shared and under what
> circumstances. "
>
> To:
>
> "It is for data publishers to determine policy on which data should be
> shared and under what circumstances. "
>
> I understand the sentiment "not a technical standards working group" in the
> context of the DWBP, the tone of the statement just sounds a bit too
> universal. Some domain specific (health care) technical standards for data
> sharing are constrained by policy and protocol.
>
> Thanks and great work,
>
> Eric S.
>
> On Mon, May 9, 2016 at 3:38 AM, Phil Archer <phila@w3.org> wrote:
>
>> Dear all,
>>
>> I have implemented a couple of minor changes to the BP doc that came up as
>> a result of our discussions on Friday.
>>
>> 1. Thanks Eric K for the suggested text that allows us to include direct
>> refs to the GS1 work which, IMHO, is well worth including. See versions of
>> your text in
>> http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
>>
>> (For tracker, action-279)
>>
>> 2. Thanks Eric S for words on privacy - I think you'll agree that what
>> Annette has suggested covers exactly what you were talking about?
>>
>> (For tracker: action-278)
>>
>> 3. Annette, thanks for reviewing and improving the suggestions about
>> handling sensitive data - I agree entirely with your suggestions and, more
>> importantly, I am confident that they reflect the wider discussions we had
>> on Friday's call. Therefore I have put your improved text in my latest
>> version of the doc. See
>> http://philarcher1.github.io/dwbp/bp.html#intro
>> and
>> http://philarcher1.github.io/dwbp/bp.html#enrichment
>>
>> 4. Having done that, I have deleted the Sensitive Data section and moved
>> the Data Unavailability to within the Access section, just before the sub
>> section on APIs. I really wasn't sure where to put it but that seemed as
>> good as anywhere?
>>
>> 5. I've made that sub group on BPs on APIs into a section so it becomes
>> 8.10.1, complete with ID and all the rest of it.
>>
>> 6. I've updated the date of the doc to today's date.
>>
>> 7. Editors - I have issued a Pull Request for your consideration.
>>
>> HTH
>>
>> Phil.
>>
>>
>>
>> On 07/05/2016 21:05, Annette Greiner wrote:
>>
>>> Hi Phil,
>>> Thanks for letting me weigh in. I understand the connection you’re making
>>> here, and I think it’s a good thing to mention in the enrichment section.
>>> What I think is crucial but is not yet reflected in here is the issue of
>>> privacy breach arising from putting together disparate data that presents
>>> less risk separately. The second paragraph here is a good but more general
>>> discussion of security and privacy issues that strikes me as not belonging
>>> in this particular section. I would suggest instead addressing the more
>>> general issues in the introduction to our document. Most of the third
>>> paragraph would also be better in the document introduction, but the last
>>> sentence is relevant here. As I see it, the real issue with data enrichment
>>> is combining datasets that each hold so little information about any
>>> individual that they cannot be identified but that together offer enough
>>> information that they can be. I would suggest that here we just say,
>>>
>>> Data enrichment refers to a set of processes that can be used to enhance,
>>>> refine or otherwise improve raw or previously processed data. This idea and
>>>> other similar concepts contribute to making data a valuable asset for
>>>> almost any modern business or enterprise. It is a diverse topic in itself,
>>>> details of which are beyond the scope of the current document. However, it
>>>> is worth noting that some of these techniques should be approached with
>>>> caution, as ethical concerns may arise. In scientific research, care must
>>>> be taken to avoid enrichment that distorts results or statistical outcomes.
>>>> For data about individuals, privacy issues may arise when combining
>>>> datasets. That is, enriching one dataset with another, when neither
>>>> contains sufficient information about any individual to identify them, may
>>>> yield a combined dataset that compromises privacy. Furthermore, these
>>>> techniques can be carried out at scale, which in turn highlights the need
>>>> for caution.
>>>>
>>>
>>> Then, in the document introduction, I would suggest adding the following,
>>> after the paragraph that begins “In this context…”.
>>>
>>> Not all data should be shared openly, however. Security, commercial
>>>> sensitivity and, above all, individuals' privacy need to be taken into
>>>> account. It is for data publishers, not a technical standards working
>>>> group, to determine policy on which data should be shared and under what
>>>> circumstances. Data sharing policies are likely to assess the exposure risk
>>>> and determine the appropriate security measures to be taken to protect
>>>> sensitive data, such as secure authentication and authorization.
>>>>
>>>> Depending on circumstances, sensitive information about individuals
>>>> might include full name, home address, email address, national
>>>> identification number, IP address, vehicle registration plate number,
>>>> driver's license number, face, fingerprints, or handwriting, credit card
>>>> numbers, digital identity, date of birth, birthplace, genetic information,
>>>> telephone number, login name, screen name, nickname, health records etc.
>>>> Although it is likely to be safe to share some of that information openly,
>>>> and even more within a controlled environment, publishers should bear in
>>>> mind that combining data from multiple sources may allow inadvertent
>>>> identification of individuals.
>>>>
>>>
>>> (I took out mention of https, as it will soon be everywhere, which would
>>> make our doc out of date.)
>>>
>>> Also, I noticed a grammatical error in the implementation section of BP
>>> 31. (Subject-verb agreement is off.) It should read "Techniques for data
>>> enrichment are complex and go well beyond the scope of this document, which
>>> can only highlight the possibilities."
>>> -Annette
>>>
>>> On May 6, 2016, at 7:50 AM, Phil Archer <phila@w3.org> wrote:
>>>>
>>>> Berna,
>>>>
>>>> As promised, I've copied the text from the sensitive data section and
>>>> merged some of it with the data enrichment intro to end up with this as a
>>>> suggestion.
>>>>
>>>> @Annette - we resolved to do this and move the BP about data
>>>> unavailability to the data access section. Do you agree with this?
>>>>
>>>> ===Begins==
>>>>
>>>> Data enrichment refers to a set of processes that can be used to
>>>> enhance, refine or otherwise improve raw or previously processed data. This
>>>> idea and other similar concepts contribute to making data a valuable asset
>>>> for almost any modern business or enterprise. It is a diverse topic in
>>>> itself, details of which are beyond the scope of the current document.
>>>> However, it is worth noting that techniques exist to carry out such
>>>> enrichment at scale which in turn highlights the need for caution.
>>>>
>>>> Not all data should be shared openly. Security, commercial sensitivity
>>>> and, above all, individuals' privacy need to be taken into account. It is
>>>> for data publishers, not a technical standards working group, to determine
>>>> policy on which data should be shared and under what circumstances. Data
>>>> sharing policies are likely to assess the exposure risk and determine the
>>>> appropriate security measures to be taken to protect sensitive data, such
>>>> as secure authentication and use of HTTPS.
>>>>
>>>> Depending on circumstance, sensitive information about individuals might
>>>> include: full name, home address, email address, national identification
>>>> number, IP address, vehicle registration plate number, driver's license
>>>> number, face, fingerprints, or handwriting, credit card numbers, digital
>>>> identity, date of birth, birthplace, genetic information, telephone number,
>>>> login name, screen name, nickname, health records etc. Although it is
>>>> likely to be safe to share some of that information openly, and even more
>>>> within a controlled environment, publishers should bear in mind that data
>>>> enrichment techniques may allow some elements to be discovered and linked
>>>> from elsewhere.
>>>>
>>>> Notwithstanding that caution, data enrichment offers exciting
>>>> possibilities for both data publishers and consumers.
>>>>
>>>>
>>>>
>>>> == ends==
>>>>
>>>> --
>>>>
>>>>
>>>> Phil Archer
>>>> W3C Data Activity Lead
>>>> http://www.w3.org/2013/data/
>>>>
>>>> http://philarcher.org
>>>> +44 (0)7887 767755
>>>> @philarcher1
>>>>
>>>
>>>
>>>
>> --
>>
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>>
>

--

Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Tuesday, 10 May 2016 15:35:59 UTC