Re: Close action-279? from Phil Archer on 2016-06-27 (public-dwbp-wg@w3.org from June 2016)

From: Phil Archer <phila@w3.org>
Date: Mon, 27 Jun 2016 19:14:09 +0100
To: Deirdre Lee <deirdre@derilinx.com>, public-dwbp-wg@w3.org, eric.kauz@gs1.org
Message-ID: <3006afce-af55-6546-2f85-a1f0cd2dc5bf@w3.org>
Yep. Done

On 27/06/2016 19:00, Deirdre Lee wrote:
> Hi,
>
> Suggest that the updates below mean we can close
> https://www.w3.org/2013/dwbp/track/actions/279?
>
> Cheers,
>
> Deirdre
>
>
> On 10/05/2016 19:04, Annette Greiner wrote:
>>
>> +1 from me too, BTW.
>>
>> -Annette
>>
>>
>> On 5/10/16 8:35 AM, Phil Archer wrote:
>>> Thanks Eric, I've made those changes.
>>>
>>> @Editors, these are included in my current pull request.
>>>
>>> Phil
>>>
>>> On 09/05/2016 14:22, Eric Stephan wrote:
>>>> Phil and Annette,
>>>>
>>>> I agree with the placement of the sensitive data in the introduction of the
>>>> document.
>>>>
>>>> ~~~
>>>>
>>>> Could you change "Not all data" to "Not all data (and metadata)"? If this
>>>> were changed there would be no need for adding anything to the metadata
>>>> section.
>>>>
>>>> ~~~
>>>>
>>>> Could you change the statement:
>>>>
>>>> "It is for data publishers, not a technical standards working group, to
>>>> determine policy on which data should be shared and under what
>>>> circumstances. "
>>>>
>>>> To:
>>>>
>>>> "It is for data publishers to determine policy on which data should be
>>>> shared and under what circumstances. "
>>>>
>>>> I understand the sentiment "not a technical standards working group" in the
>>>> context of the DWBP, but the tone of the statement just sounds a bit too
>>>> universal. Some domain-specific (health care) technical standards for data
>>>> sharing are constrained by policy and protocol.
>>>>
>>>> Thanks and great work,
>>>>
>>>> Eric S.
>>>>
>>>> On Mon, May 9, 2016 at 3:38 AM, Phil Archer <phila@w3.org> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> I have implemented a couple of minor changes to the BP doc that came up as
>>>>> a result of our discussions on Friday.
>>>>>
>>>>> 1. Thanks Eric K for the suggested text that allows us to include direct
>>>>> refs to the GS1 work which, IMHO, is well worth including. See versions of
>>>>> your text in
>>>>> http://philarcher1.github.io/dwbp/bp.html#identifiersWithinDatasets
>>>>>
>>>>> (For tracker, action-279)
>>>>>
>>>>> 2. Thanks Eric S for words on privacy - I think you'll agree that what
>>>>> Annette has suggested covers exactly what you were talking about?
>>>>>
>>>>> (For tracker: action-278)
>>>>>
>>>>> 3. Annette, thanks for reviewing and improving the suggestions about
>>>>> handling sensitive data - I agree entirely with your suggestions and, more
>>>>> importantly, I am confident that they reflect the wider discussions we had
>>>>> on Friday's call. Therefore I have put your improved text in my latest
>>>>> version of the doc. See
>>>>> http://philarcher1.github.io/dwbp/bp.html#intro
>>>>> and
>>>>> http://philarcher1.github.io/dwbp/bp.html#enrichment
>>>>>
>>>>> 4. Having done that, I have deleted the Sensitive Data section and moved
>>>>> Data Unavailability into the Access section, just before the subsection
>>>>> on APIs. I really wasn't sure where to put it but that seemed as good as
>>>>> anywhere?
>>>>>
>>>>> 5. I've made that sub group on BPs on APIs into a section so it becomes
>>>>> 8.10.1, complete with ID and all the rest of it.
>>>>>
>>>>> 6. I've updated the date of the doc to today's date.
>>>>>
>>>>> 7. Editors - I have issued a Pull Request for your consideration.
>>>>>
>>>>> HTH
>>>>>
>>>>> Phil.
>>>>>
>>>>>
>>>>>
>>>>> On 07/05/2016 21:05, Annette Greiner wrote:
>>>>>
>>>>>> Hi Phil,
>>>>>> Thanks for letting me weigh in. I understand the connection you’re making
>>>>>> here, and I think it’s a good thing to mention in the enrichment section.
>>>>>> What I think is crucial but is not yet reflected in here is the issue of
>>>>>> privacy breach arising from putting together disparate data that presents
>>>>>> less risk separately. The second paragraph here is a good but more general
>>>>>> discussion of security and privacy issues that strikes me as not belonging
>>>>>> in this particular section. I would suggest instead addressing the more
>>>>>> general issues in the introduction to our document. Most of the third
>>>>>> paragraph would also be better in the document introduction, but the last
>>>>>> sentence is relevant here. As I see it, the real issue with data enrichment
>>>>>> is combining datasets that each hold so little information about any
>>>>>> individual that they cannot be identified but that together offer enough
>>>>>> information that they can be. I would suggest that here we just say,
>>>>>>
>>>>>>> Data enrichment refers to a set of processes that can be used to enhance,
>>>>>>> refine or otherwise improve raw or previously processed data. This idea and
>>>>>>> other similar concepts contribute to making data a valuable asset for
>>>>>>> almost any modern business or enterprise. It is a diverse topic in itself,
>>>>>>> details of which are beyond the scope of the current document. However, it
>>>>>>> is worth noting that some of these techniques should be approached with
>>>>>>> caution, as ethical concerns may arise. In scientific research, care must
>>>>>>> be taken to avoid enrichment that distorts results or statistical outcomes.
>>>>>>> For data about individuals, privacy issues may arise when combining
>>>>>>> datasets. That is, enriching one dataset with another, when neither
>>>>>>> contains sufficient information about any individual to identify them, may
>>>>>>> yield a combined dataset that compromises privacy. Furthermore, these
>>>>>>> techniques can be carried out at scale, which in turn highlights the need
>>>>>>> for caution.
>>>>>>>
>>>>>>
>>>>>> Then, in the document introduction, I would suggest adding the following,
>>>>>> after the paragraph that begins “In this context…”.
>>>>>>
>>>>>>> Not all data should be shared openly, however. Security, commercial
>>>>>>> sensitivity and, above all, individuals' privacy need to be taken into
>>>>>>> account. It is for data publishers, not a technical standards working
>>>>>>> group, to determine policy on which data should be shared and under what
>>>>>>> circumstances. Data sharing policies are likely to assess the exposure risk
>>>>>>> and determine the appropriate security measures to be taken to protect
>>>>>>> sensitive data, such as secure authentication and authorization.
>>>>>>>
>>>>>>> Depending on circumstances, sensitive information about individuals
>>>>>>> might include full name, home address, email address, national
>>>>>>> identification number, IP address, vehicle registration plate number,
>>>>>>> driver's license number, face, fingerprints, or handwriting, credit card
>>>>>>> numbers, digital identity, date of birth, birthplace, genetic information,
>>>>>>> telephone number, login name, screen name, nickname, health records etc.
>>>>>>> Although it is likely to be safe to share some of that information openly,
>>>>>>> and even more within a controlled environment, publishers should bear in
>>>>>>> mind that combining data from multiple sources may allow inadvertent
>>>>>>> identification of individuals.
>>>>>>>
>>>>>>
>>>>>> (I took out mention of https, as it will soon be everywhere, which would
>>>>>> make our doc out of date.)
>>>>>>
>>>>>> Also, I noticed a grammatical error in the implementation section of BP
>>>>>> 31. (Subject-verb agreement is off.) It should read "Techniques for data
>>>>>> enrichment are complex and go well beyond the scope of this document, which
>>>>>> can only highlight the possibilities."
>>>>>> -Annette
>>>>>>
>>>>>> On May 6, 2016, at 7:50 AM, Phil Archer <phila@w3.org> wrote:
>>>>>>>
>>>>>>> Berna,
>>>>>>>
>>>>>>> As promised, I've copied the text from the sensitive data section and
>>>>>>> merged some of it with the data enrichment intro to end up with this as a
>>>>>>> suggestion.
>>>>>>>
>>>>>>> @Annette - we resolved to do this and move the BP about data
>>>>>>> unavailability to the data access section. Do you agree with this?
>>>>>>>
>>>>>>> ===Begins==
>>>>>>>
>>>>>>> Data enrichment refers to a set of processes that can be used to
>>>>>>> enhance, refine or otherwise improve raw or previously processed data. This
>>>>>>> idea and other similar concepts contribute to making data a valuable asset
>>>>>>> for almost any modern business or enterprise. It is a diverse topic in
>>>>>>> itself, details of which are beyond the scope of the current document.
>>>>>>> However, it is worth noting that techniques exist to carry out such
>>>>>>> enrichment at scale which in turn highlights the need for caution.
>>>>>>>
>>>>>>> Not all data should be shared openly. Security, commercial sensitivity
>>>>>>> and, above all, individuals' privacy need to be taken into account. It is
>>>>>>> for data publishers, not a technical standards working group, to determine
>>>>>>> policy on which data should be shared and under what circumstances. Data
>>>>>>> sharing policies are likely to assess the exposure risk and determine the
>>>>>>> appropriate security measures to be taken to protect sensitive data, such
>>>>>>> as secure authentication and use of HTTPS.
>>>>>>>
>>>>>>> Depending on circumstance, sensitive information about individuals might
>>>>>>> include: full name, home address, email address, national identification
>>>>>>> number, IP address, vehicle registration plate number, driver's license
>>>>>>> number, face, fingerprints, or handwriting, credit card numbers, digital
>>>>>>> identity, date of birth, birthplace, genetic information, telephone number,
>>>>>>> login name, screen name, nickname, health records etc. Although it is
>>>>>>> likely to be safe to share some of that information openly, and even more
>>>>>>> within a controlled environment, publishers should bear in mind that data
>>>>>>> enrichment techniques may allow some elements to be discovered and linked
>>>>>>> from elsewhere.
>>>>>>>
>>>>>>> Notwithstanding that caution, data enrichment offers exciting
>>>>>>> possibilities for both data publishers and consumers.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> == ends==
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>> Phil Archer
>>>>>>> W3C Data Activity Lead
>>>>>>> http://www.w3.org/2013/data/
>>>>>>>
>>>>>>> http://philarcher.org
>>>>>>> +44 (0)7887 767755
>>>>>>> @philarcher1
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>> --
>> Annette Greiner
>> NERSC Data and Analytics Services
>> Lawrence Berkeley National Laboratory
>>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Monday, 27 June 2016 18:13:30 UTC