- From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
- Date: Tue, 10 May 2016 15:38:15 -0300
- To: Annette Greiner <amgreiner@lbl.gov>
- Cc: Phil Archer <phila@w3.org>, Eric Stephan <ericphb@gmail.com>, Public DWBP WG <public-dwbp-wg@w3.org>, Eric Kauz <eric.kauz@gs1.org>, Caroline Burle <cburle@nic.br>, Newton Calegari <newton@nic.br>
- Message-ID: <CANx1PzzDD8weFmzv9fHeyTQc5jdMwV2A3oKXyOMgd_3FvczCAQ@mail.gmail.com>
Hi Phil, Annette and Eric,

Thanks a lot for the updates, feedback and suggestions! Today I'm travelling, but tomorrow I'll be back in Recife and can finish the remaining comments.

Cheers,
Berna

2016-05-10 15:04 GMT-03:00 Annette Greiner <amgreiner@lbl.gov>:

> +1 from me 2, BTW.
>
> -Annette
>
> On 5/10/16 8:35 AM, Phil Archer wrote:
>
> Thanks Eric, I've made those changes.
>
> @Editors, these are included in my current pull request.
>
> Phil
>
> On 09/05/2016 14:22, Eric Stephan wrote:
>
> Phil and Annette,
>
> I agree with the placement of the sensitive data in the introduction of the document.
>
> ~~~
>
> Could you change "Not all data" to "Not all data (and metadata)"? If this were changed there would be no need for adding anything to the metadata section.
>
> ~~~
>
> Could you change the statement:
>
> "It is for data publishers, not a technical standards working group, to determine policy on which data should be shared and under what circumstances."
>
> To:
>
> "It is for data publishers to determine policy on which data should be shared and under what circumstances."
>
> I understand the sentiment "not a technical standards working group" in the context of the DWBP, but the tone of the statement just sounds a bit too universal. Some domain-specific (health care) technical standards for data sharing are constrained by policy and protocol.
>
> Thanks and great work,
>
> Eric S.
>
> On Mon, May 9, 2016 at 3:38 AM, Phil Archer <phila@w3.org> wrote:
>
> Dear all,
>
> I have implemented a couple of minor changes to the BP doc that came up as a result of our discussions on Friday.
>
> 1. Thanks Eric K for the suggested text that allows us to include direct refs to the GS1 work which, IMHO, is well worth including. See versions of your text in
> http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
>
> (For tracker, action-279)
>
> 2. Thanks Eric S for words on privacy - I think you'll agree that what Annette has suggested covers exactly what you were talking about?
>
> (For tracker: action-278)
>
> 3. Annette, thanks for reviewing and improving the suggestions about handling sensitive data - I agree entirely with your suggestions and, more importantly, I am confident that they reflect the wider discussions we had on Friday's call. Therefore I have put your improved text in my latest version of the doc. See
> http://philarcher1.github.io/dwbp/bp.html#intro
> and
> http://philarcher1.github.io/dwbp/bp.html#enrichment
>
> 4. Having done that, I have deleted the Sensitive Data section and moved the Data Unavailability to within the Access section, just before the subsection on APIs. I really wasn't sure where to put it but that seemed as good as anywhere?
>
> 5. I've made that sub group on BPs on APIs into a section so it becomes 8.10.1, complete with ID and all the rest of it.
>
> 6. I've updated the date of the doc to today's date.
>
> 7. Editors - I have issued a Pull Request for your consideration.
>
> HTH
>
> Phil.
>
> On 07/05/2016 21:05, Annette Greiner wrote:
>
> Hi Phil,
> Thanks for letting me weigh in. I understand the connection you’re making here, and I think it’s a good thing to mention in the enrichment section. What I think is crucial but is not yet reflected in here is the issue of privacy breach arising from putting together disparate data that presents less risk separately. The second paragraph here is a good but more general discussion of security and privacy issues that strikes me as not belonging in this particular section. I would suggest instead addressing the more general issues in the introduction to our document. Most of the third paragraph would also be better in the document introduction, but the last sentence is relevant here. As I see it, the real issue with data enrichment is combining datasets that each hold so little information about any individual that they cannot be identified but that together offer enough information that they can be. I would suggest that here we just say,
>
> Data enrichment refers to a set of processes that can be used to enhance, refine or otherwise improve raw or previously processed data. This idea and other similar concepts contribute to making data a valuable asset for almost any modern business or enterprise. It is a diverse topic in itself, details of which are beyond the scope of the current document. However, it is worth noting that some of these techniques should be approached with caution, as ethical concerns may arise. In scientific research, care must be taken to avoid enrichment that distorts results or statistical outcomes. For data about individuals, privacy issues may arise when combining datasets. That is, enriching one dataset with another, when neither contains sufficient information about any individual to identify them, may yield a combined dataset that compromises privacy. Furthermore, these techniques can be carried out at scale, which in turn highlights the need for caution.
>
> Then, in the document introduction, I would suggest adding the following, after the paragraph that begins “In this context…”.
>
> Not all data should be shared openly, however. Security, commercial sensitivity and, above all, individuals' privacy need to be taken into account. It is for data publishers, not a technical standards working group, to determine policy on which data should be shared and under what circumstances. Data sharing policies are likely to assess the exposure risk and determine the appropriate security measures to be taken to protect sensitive data, such as secure authentication and authorization.
>
> Depending on circumstances, sensitive information about individuals might include full name, home address, email address, national identification number, IP address, vehicle registration plate number, driver's license number, face, fingerprints, or handwriting, credit card numbers, digital identity, date of birth, birthplace, genetic information, telephone number, login name, screen name, nickname, health records etc. Although it is likely to be safe to share some of that information openly, and even more within a controlled environment, publishers should bear in mind that combining data from multiple sources may allow inadvertent identification of individuals.
>
> (I took out mention of https, as it will soon be everywhere, which would make our doc out of date.)
>
> Also, I noticed a grammatical error in the implementation section of BP 31. (Subject-verb agreement is off.) It should read "Techniques for data enrichment are complex and go well beyond the scope of this document, which can only highlight the possibilities."
> -Annette
>
> On May 6, 2016, at 7:50 AM, Phil Archer <phila@w3.org> wrote:
>
> Berna,
>
> As promised, I've copied the text from the sensitive data section and merged some of it with the data enrichment intro to end up with this as a suggestion.
>
> @Annette - we resolved to do this and move the BP about data unavailability to the data access section. Do you agree with this?
>
> ===Begins==
>
> Data enrichment refers to a set of processes that can be used to enhance, refine or otherwise improve raw or previously processed data. This idea and other similar concepts contribute to making data a valuable asset for almost any modern business or enterprise. It is a diverse topic in itself, details of which are beyond the scope of the current document. However, it is worth noting that techniques exist to carry out such enrichment at scale which in turn highlights the need for caution.
>
> Not all data should be shared openly. Security, commercial sensitivity and, above all, individuals' privacy need to be taken into account. It is for data publishers, not a technical standards working group, to determine policy on which data should be shared and under what circumstances. Data sharing policies are likely to assess the exposure risk and determine the appropriate security measures to be taken to protect sensitive data, such as secure authentication and use of HTTPS.
>
> Depending on circumstance, sensitive information about individuals might include: full name, home address, email address, national identification number, IP address, vehicle registration plate number, driver's license number, face, fingerprints, or handwriting, credit card numbers, digital identity, date of birth, birthplace, genetic information, telephone number, login name, screen name, nickname, health records etc. Although it is likely to be safe to share some of that information openly, and even more within a controlled environment, publishers should bear in mind that data enrichment techniques may allow some elements to be discovered and linked from elsewhere.
>
> Notwithstanding that caution, data enrichment offers exciting possibilities for both data publishers and consumers.
>
> == ends==
>
> --
>
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
>
> --
>
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
>
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory

--
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------
Received on Tuesday, 10 May 2016 18:46:41 UTC