Re: ACTION-371: text defining de-identified data

From: Edward W. Felten <felten@CS.Princeton.EDU>
Date: Wed, 13 Mar 2013 13:21:41 -0400
Message-ID: <CANZBoGh_G4tc-Uc4dxZ4gKVVbng91G79eRjbhVu-PXrjde3yPw@mail.gmail.com>
To: Justin Brookman <justin@cdt.org>
Cc: "<public-tracking@w3.org>" <public-tracking@w3.org>
But we should be equally clear that "de-identify" means more than just
removing the most obvious identifiers from the data.


On Wed, Mar 13, 2013 at 1:07 PM, Justin Brookman <justin@cdt.org> wrote:

> Shane is right that we did choose to use "deidentified" instead of
> "unlinkable" at the Cambridge meeting.  So I agree we probably should not
> use "unlinkable" to define "deidentified" in the standard.  However, I
> don't see why we need to define "unlinkable" at all, as it has no
> operational meaning, and was rejected because it implied a technological
> impossibility of relinking, which is not a standard that can be reasonably
> achieved.
>
> Justin Brookman
> Director, Consumer Privacy
> Center for Democracy & Technology
> tel 202.407.8812
> justin@cdt.org
> http://www.cdt.org
> @JustinBrookman
> @CenDemTech
>
>
> On 3/13/2013 11:35 AM, Shane Wiley wrote:
>
>> Rob,
>>
>> So we're agreed unlinkability requires more processing than de-identified
>> - good.  I would recommend we define de-identified (nearly done) and
>> unlinkability separately to clearly demonstrate they are different points
>> within a continuum.  We can then focus on the discussion of retention of
>> data in its de-identified state prior to moving to the ultimate unlinkable
>> state.
>>
>> - Shane
>>
>> -----Original Message-----
>> From: Rob van Eijk [mailto:rob@blaeu.com]
>> Sent: Wednesday, March 13, 2013 8:28 AM
>> To: Shane Wiley
>> Cc: public-tracking@w3.org
>> Subject: RE: ACTION-371: text defining de-identified data
>>
>> Hi Shane,
>>
>> I hear you and understand your position. But unlinkable and de-identified
>> are not mutually exclusive. Unlinkable data is a subset of de-identified
>> data; it just goes through another step of scrubbing.
>> Adding it to the list is not hurting your position.
>>
>> The key towards the middle ground remains data retention, which has to be
>> proportionate to the purpose.
>>
>> Rob
>>
>> Shane Wiley schreef op 2013-03-13 16:13:
>>
>>> Rob,
>>>
>>> I thought we had agreed to not mix the "unlinkable" term with
>>> "de-identified" here.  In our discussions in Boston it appeared there
>>> was general agreement that unlinkability is a step beyond
>>> de-identified.  Once a record has been rendered de-identified, it can
>>> later further be made unlinkable (using your definition of unlinkable
>>> vs. the one I proposed).  This is a significant sticking point for
>>> those of us attempting to find middle-ground here so hopefully we can
>>> document the details in non-normative text but I'd ask that we remove
>>> mention of unlinkable in the definition of de-identified at this time
>>> (or else we've not really moved forward in this discussion in my
>>> opinion).
>>>
>>> - Shane
>>>
>>> -----Original Message-----
>>> From: Rob van Eijk [mailto:rob@blaeu.com]
>>> Sent: Wednesday, March 13, 2013 5:57 AM
>>> To: public-tracking@w3.org
>>> Subject: RE: ACTION-371: text defining de-identified data
>>>
>>> Dan, Kevin,
>>>
>>> I would really want the unlinkability in there as well. I propose to
>>> add the text "made unlinkable":
>>>
>>> Normative text: Data can be considered sufficiently de-identified to
>>> the extent that it has been deleted, made unlinkable, modified,
>>> aggregated, anonymized or otherwise manipulated in order to achieve a
>>> reasonable level of justified confidence that the data cannot
>>> reasonably be used to infer information about, or otherwise be linked
>>> to, a particular user, user agent, computer or device.
>>>
>>>
>>> In terms of privacy by design, de-identification through unlinkability
>>> is the strongest form of de-identification IMHO.
>>>
>>> Rob
>>>
>>> Kevin Kiley schreef op 2013-03-12 19:03:
>>>
>>>> Dan,
>>>>
>>>> In case I wasn't being clear in my last post, I (personally) believe
>>>> that User-agent should *NOT* be removed from the proposed text.
>>>>
>>>> I actually don't think it would do any harm to *ADD* the word
>>>> 'Computer' as well ( which is present in the current FTC definition )
>>>> so it reads like this…
>>>>
>>>> Normative text:
>>>>
>>>> Data can be considered sufficiently de-identified to the extent that
>>>> it has been deleted, modified, aggregated, anonymized or otherwise
>>>> manipulated in order to achieve a reasonable level of justified
>>>> confidence that the data cannot reasonably be used to infer
>>>> information about, or otherwise be linked to, a particular user, user
>>>> agent, computer or device.
>>>>
>>>> I think that covers it pretty well, and *NO* 'clarifying text' is
>>>> necessary.
>>>>
>>>> Just my 2 cents.
>>>>
>>>> Kevin Kiley
>>>>
>>>> Previous message(s)…
>>>>
>>>> Dan,
>>>>
>>>> Perhaps you can add text clarifying this perspective or, much like
>>>> the FTC, suffice with "device" which I believe more than covers what
>>>> you're looking for here.
>>>>
>>>> - Shane
>>>>
>>>> From: Dan Auerbach [mailto:dan@eff.org]
>>>> Sent: Tuesday, March 12, 2013 8:57 AM
>>>> To: public-tracking@w3.org
>>>> Subject: Re: ACTION-371: text defining de-identified data
>>>>
>>>> Shane and Kevin -- The phrase "user agent" in the text is intended to
>>>> refer to a particular user agent (not "Chrome 26" but rather "the
>>>> browser running on Dan's laptop"). I hoped that would be clear from
>>>> context, but if it's not we can clarify. I may not be able to
>>>> identify your device per se, but can identify that this is the same
>>>> browser as I saw before. I think this is the case with using cookies,
>>>> for example. It seems more accurate to me than lumping it all under
>>>> "device", and appropriate since the text of our document is elsewhere
>>>> focused on user agents, unlike the FTC text.
>>>>
>>>> Best,
>>>>
>>>> Dan
>>>>
>>>> On 03/12/2013 12:19 AM, Kevin Kiley wrote:
>>>>
>>>> Shane Wiley wrote...
>>>>
>>>>> I had removed "user agent" in the suggested edit as this could be
>>>>> something as generic as "Chrome 26".
>>>>
>>>> It can also be something VERY specific... and tell you a LOT about
>>>> the Computer/OS/Device being used.
>>>>
>>>> In the case of Mobile... it will pretty much tell you EXACTLY what
>>>> 'Device' is being used.
>>>>
>>>>> The FTC likewise does not use "user agent" in their definition.
>>>>
>>>> That's true... but BOTH definitions (W3C and FTC) currently mention
>>>> 'Device'... and the FTC reports go to great lengths about how
>>>> important it is to exclude any knowledge of 'the Device' from the
>>>> de-identified data ( especially in the case of 'Mobile Devices' ).
>>>>
>>>> Kevin Kiley
>>>>
>>>
>
>
>


-- 
Edward W. Felten
Professor of Computer Science and Public Affairs
Director, Center for Information Technology Policy
Princeton University
609-258-5906           http://www.cs.princeton.edu/~felten
Received on Wednesday, 13 March 2013 17:22:34 UTC
