Close action-279? from Deirdre Lee on 2016-06-27 (public-dwbp-wg@w3.org from June 2016)

From: Deirdre Lee <deirdre@derilinx.com>
Date: Mon, 27 Jun 2016 19:00:42 +0100
To: public-dwbp-wg@w3.org, Phil Archer <phila@w3.org>, eric.kauz@gs1.org
Message-ID: <f7002efe-9eee-b2dc-ff1a-57b0ab36f0d3@derilinx.com>

Hi,

Suggest that the updates below mean we can close 
https://www.w3.org/2013/dwbp/track/actions/279?

Cheers,

Deirdre


On 10/05/2016 19:04, Annette Greiner wrote:
>
> +1 from me 2, BTW.
>
> -Annette
>
>
> On 5/10/16 8:35 AM, Phil Archer wrote:
>> Thanks Eric, I've made those changes.
>>
>> @Editors, these are included in my current pull request.
>>
>> Phil
>>
>> On 09/05/2016 14:22, Eric Stephan wrote:
>>> Phil and Annette,
>>>
>>> I agree with the placement of the sensitive data in the introduction 
>>> of the
>>> document.
>>>
>>> ~~~
>>>
>>> Could you change "Not all data" to "Not all data (and metadata)"?  
>>> If this
>>> were changed there would be no need for adding anything to the metadata
>>> section.
>>>
>>> ~~~
>>>
>>> Could you change the statement:
>>>
>>> "It is for data publishers, not a technical standards working group, to
>>> determine policy on which data should be shared and under what
>>> circumstances. "
>>>
>>> To:
>>>
>>> "It is for data publishers to determine policy on which data should be
>>> shared and under what circumstances. "
>>>
>>> I understand the sentiment "not a technical standards working group" in the
>>> context of the DWBP, but the tone of the statement sounds a bit too universal.
>>> Some domain-specific (health care) technical standards for data sharing are
>>> constrained by policy and protocol.
>>>
>>> Thanks and great work,
>>>
>>> Eric S.
>>>
>>> On Mon, May 9, 2016 at 3:38 AM, Phil Archer <phila@w3.org> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I have implemented a couple of minor changes to the BP doc that 
>>>> came up as
>>>> a result of our discussions on Friday.
>>>>
>>>> 1. Thanks Eric K for the suggested text that allows us to include 
>>>> direct
>>>> refs to the GS1 work which, IMHO, is well worth including. See 
>>>> versions of
>>>> your text in
>>>> http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
>>>>
>>>> (For tracker, action-279)
>>>>
>>>> 2. Thanks Eric S for words on privacy - I think you'll agree that what
>>>> Annette has suggested covers exactly what you were talking about?
>>>>
>>>> (For tracker: action-278)
>>>>
>>>> 3. Annette, thanks for reviewing and improving the suggestions about
>>>> handling sensitive data - I agree entirely with your suggestions 
>>>> and, more
>>>> importantly, I am confident that they reflect the wider discussions 
>>>> we had
>>>> on Friday's call. Therefore I have put your improved text in my latest
>>>> version of the doc. See
>>>> http://philarcher1.github.io/dwbp/bp.html#intro
>>>> and
>>>> http://philarcher1.github.io/dwbp/bp.html#enrichment
>>>>
>>>> 4. Having done that, I have deleted the Sensitive Data section and 
>>>> moved
>>>> the Data Unavailability to within the Access section, just before 
>>>> the sub
>>>> section on APIs. I really wasn't sure where to put it but that 
>>>> seemed as
>>>> good as anywhere?
>>>>
>>>> 5. I've made that sub group on BPs on APIs into a section so it 
>>>> becomes
>>>> 8.10.1, complete with ID and all the rest of it.
>>>>
>>>> 6. I've updated the date of the doc to today's date.
>>>>
>>>> 7. Editors - I have issued a Pull Request for your consideration.
>>>>
>>>> HTH
>>>>
>>>> Phil.
>>>>
>>>>
>>>>
>>>> On 07/05/2016 21:05, Annette Greiner wrote:
>>>>
>>>>> Hi Phil,
>>>>> Thanks for letting me weigh in. I understand the connection you’re 
>>>>> making
>>>>> here, and I think it’s a good thing to mention in the enrichment 
>>>>> section.
>>>>> What I think is crucial but is not yet reflected in here is the 
>>>>> issue of
>>>>> privacy breach arising from putting together disparate data that 
>>>>> presents
>>>>> less risk separately. The second paragraph here is a good but more 
>>>>> general
>>>>> discussion of security and privacy issues that strikes me as not 
>>>>> belonging
>>>>> in this particular section. I would suggest instead addressing the 
>>>>> more
>>>>> general issues in the introduction to our document. Most of the third
>>>>> paragraph would also be better in the document introduction, but 
>>>>> the last
>>>>> sentence is relevant here. As I see it, the real issue with data 
>>>>> enrichment
>>>>> is combining datasets that each hold so little information about any
>>>>> individual that they cannot be identified but that together offer 
>>>>> enough
>>>>> information that they can be. I would suggest that here we just say,
>>>>>
>>>>> Data enrichment refers to a set of processes that can be used to 
>>>>> enhance,
>>>>>> refine or otherwise improve raw or previously processed data. 
>>>>>> This idea and
>>>>>> other similar concepts contribute to making data a valuable asset 
>>>>>> for
>>>>>> almost any modern business or enterprise. It is a diverse topic 
>>>>>> in itself,
>>>>>> details of which are beyond the scope of the current document. 
>>>>>> However, it
>>>>>> is worth noting that some of these techniques should be 
>>>>>> approached with
>>>>>> caution, as ethical concerns may arise. In scientific research, 
>>>>>> care must
>>>>>> be taken to avoid enrichment that distorts results or statistical 
>>>>>> outcomes.
>>>>>> For data about individuals, privacy issues may arise when combining
>>>>>> datasets. That is, enriching one dataset with another, when neither
>>>>>> contains sufficient information about any individual to identify 
>>>>>> them, may
>>>>>> yield a combined dataset that compromises privacy. Furthermore, 
>>>>>> these
>>>>>> techniques can be carried out at scale, which in turn highlights 
>>>>>> the need
>>>>>> for caution.
>>>>>>
>>>>>
>>>>> Then, in the document introduction, I would suggest adding the 
>>>>> following,
>>>>> after the paragraph that begins “In this context…”.
>>>>>
>>>>> Not all data should be shared openly, however. Security, commercial
>>>>>> sensitivity and, above all, individuals' privacy need to be taken 
>>>>>> into
>>>>>> account. It is for data publishers, not a technical standards 
>>>>>> working
>>>>>> group, to determine policy on which data should be shared and 
>>>>>> under what
>>>>>> circumstances. Data sharing policies are likely to assess the 
>>>>>> exposure risk
>>>>>> and determine the appropriate security measures to be taken to 
>>>>>> protect
>>>>>> sensitive data, such as secure authentication and authorization.
>>>>>>
>>>>>> Depending on circumstances, sensitive information about individuals
>>>>>> might include full name, home address, email address, national
>>>>>> identification number, IP address, vehicle registration plate 
>>>>>> number,
>>>>>> driver's license number, face, fingerprints, or handwriting, 
>>>>>> credit card
>>>>>> numbers, digital identity, date of birth, birthplace, genetic 
>>>>>> information,
>>>>>> telephone number, login name, screen name, nickname, health 
>>>>>> records etc.
>>>>>> Although it is likely to be safe to share some of that 
>>>>>> information openly,
>>>>>> and even more within a controlled environment, publishers should 
>>>>>> bear in
>>>>>> mind that combining data from multiple sources may allow inadvertent
>>>>>> identification of individuals.
>>>>>>
>>>>>
>>>>> (I took out mention of https, as it will soon be everywhere, which 
>>>>> would
>>>>> make our doc out of date.)
>>>>>
>>>>> Also, I noticed a grammatical error in the implementation section 
>>>>> of BP
>>>>> 31. (Subject-verb agreement is off.) It should read "Techniques 
>>>>> for data
>>>>> enrichment are complex and go well beyond the scope of this 
>>>>> document, which
>>>>> can only highlight the possibilities."
>>>>> -Annette
>>>>>
>>>>> On May 6, 2016, at 7:50 AM, Phil Archer <phila@w3.org> wrote:
>>>>>>
>>>>>> Berna,
>>>>>>
>>>>>> As promised, I've copied the text from the sensitive data section 
>>>>>> and
>>>>>> merged some of it with the data enrichment intro to end up with 
>>>>>> this as a
>>>>>> suggestion.
>>>>>>
>>>>>> @Annette - we resolved to do this and move the BP about data
>>>>>> unavailability to the data access section. Do you agree with this?
>>>>>>
>>>>>> ===Begins==
>>>>>>
>>>>>> Data enrichment refers to a set of processes that can be used to
>>>>>> enhance, refine or otherwise improve raw or previously processed 
>>>>>> data. This
>>>>>> idea and other similar concepts contribute to making data a 
>>>>>> valuable asset
>>>>>> for almost any modern business or enterprise. It is a diverse 
>>>>>> topic in
>>>>>> itself, details of which are beyond the scope of the current 
>>>>>> document.
>>>>>> However, it is worth noting that techniques exist to carry out such
>>>>>> enrichment at scale which in turn highlights the need for caution.
>>>>>>
>>>>>> Not all data should be shared openly. Security, commercial 
>>>>>> sensitivity
>>>>>> and, above all, individuals' privacy need to be taken into 
>>>>>> account. It is
>>>>>> for data publishers, not a technical standards working group, to 
>>>>>> determine
>>>>>> policy on which data should be shared and under what 
>>>>>> circumstances. Data
>>>>>> sharing policies are likely to assess the exposure risk and 
>>>>>> determine the
>>>>>> appropriate security measures to be taken to protect sensitive 
>>>>>> data, such
>>>>>> as secure authentication and use of HTTPS.
>>>>>>
>>>>>> Depending on circumstance, sensitive information about 
>>>>>> individuals might
>>>>>> include: full name, home address, email address, national 
>>>>>> identification
>>>>>> number, IP address, vehicle registration plate number, driver's 
>>>>>> license
>>>>>> number, face, fingerprints, or handwriting, credit card numbers, 
>>>>>> digital
>>>>>> identity, date of birth, birthplace, genetic information, 
>>>>>> telephone number,
>>>>>> login name, screen name, nickname, health records etc. Although 
>>>>>> it is
>>>>>> likely to be safe to share some of that information openly, and 
>>>>>> even more
>>>>>> within a controlled environment, publishers should bear in mind 
>>>>>> that data
>>>>>> enrichment techniques may allow some elements to be discovered 
>>>>>> and linked
>>>>>> from elsewhere.
>>>>>>
>>>>>> Notwithstanding that caution, data enrichment offers exciting
>>>>>> possibilities for both data publishers and consumers.
>>>>>>
>>>>>>
>>>>>>
>>>>>> == ends==
>>>>>>
>>>>>> -- 
>>>>>>
>>>>>>
>>>>>> Phil Archer
>>>>>> W3C Data Activity Lead
>>>>>> http://www.w3.org/2013/data/
>>>>>>
>>>>>> http://philarcher.org
>>>>>> +44 (0)7887 767755
>>>>>> @philarcher1
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
> -- 
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
>

-- 
------------------------------------
Deirdre Lee, CEO & Founder
Derilinx - Linked & Open Data Solutions
  
Web:      www.derilinx.com
Email:    deirdre@derilinx.com
Address:  11/12 Baggot Court, Dublin 2, D02 F891
Tel:      +353 (0)1 254 4316
Mob:      +353 (0)87 417 2318
Linkedin: ie.linkedin.com/in/leedeirdre/
Twitter:  @deirdrelee
Received on Monday, 27 June 2016 18:01:18 UTC