
RE: ACTION-371: text defining de-identified data

From: Shane Wiley <wileys@yahoo-inc.com>
Date: Wed, 13 Mar 2013 17:28:45 +0000
To: "Edward W. Felten" <felten@CS.Princeton.EDU>, Justin Brookman <justin@cdt.org>
CC: "<public-tracking@w3.org>" <public-tracking@w3.org>
Message-ID: <DCCF036E573F0142BD90964789F720E3136634E5@GQ1-EX10-MB03.y.corp.yahoo.com>
Ed,

Agreed - reasonably attempting to clear unique identifiers or information that could lead to unique identification in URLs should also be included.

- Shane
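
A minimal sketch of what "reasonably attempting to clear unique identifiers" from a logged URL could look like. The parameter names in ID_PARAMS and the value heuristic in ID_VALUE are assumptions made for illustration; nothing in this thread specifies them, and a real deployment would need its own list.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names treated as identifiers (an assumption, not from the thread).
ID_PARAMS = {"uid", "userid", "sessionid", "sid", "email"}

# Heuristic (also an assumption): long hex or UUID-like values are probably unique identifiers.
ID_VALUE = re.compile(r"^[0-9a-fA-F-]{16,}$")


def scrub_url(url: str) -> str:
    """Drop query parameters that look like unique identifiers and remove the fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ID_PARAMS and not ID_VALUE.match(value)
    ]
    # The scheme, host and path are kept unchanged; only the query and fragment are touched.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(scrub_url("http://example.com/page?uid=12345&topic=cars"))
```

This only looks at the query string. Identifiers embedded in the path, or values that identify a user in combination with other fields, would pass through untouched, which is why removing the obvious identifiers is a starting point rather than the whole job.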

From: Edward W. Felten [mailto:felten@CS.Princeton.EDU]
Sent: Wednesday, March 13, 2013 10:22 AM
To: Justin Brookman
Cc: <public-tracking@w3.org>
Subject: Re: ACTION-371: text defining de-identified data

But we should be equally clear that "de-identify" means more than just removing the most obvious identifiers from the data.

On Wed, Mar 13, 2013 at 1:07 PM, Justin Brookman <justin@cdt.org> wrote:
Shane is right that we did choose to use "deidentified" instead of "unlinkable" at the Cambridge meeting.  So I agree we probably should not use "unlinkable" to define "deidentified" in the standard.  However, I don't see why we need to define "unlinkable" at all, as it has no operational meaning, and was rejected because it implied a technological impossibility of relinking, which is not a standard that can be reasonably achieved.

Justin Brookman
Director, Consumer Privacy
Center for Democracy & Technology
tel 202.407.8812
justin@cdt.org
http://www.cdt.org
@JustinBrookman
@CenDemTech


On 3/13/2013 11:35 AM, Shane Wiley wrote:
Rob,

So we're agreed unlinkability requires more processing than de-identified - good.  I would recommend we define de-identified (nearly done) and unlinkability separately to clearly demonstrate they are different points within a continuum.  We can then focus on the discussion of retention of data in its de-identified state prior to moving to the ultimate unlinkable state.

- Shane

-----Original Message-----
From: Rob van Eijk [mailto:rob@blaeu.com]
Sent: Wednesday, March 13, 2013 8:28 AM
To: Shane Wiley
Cc: public-tracking@w3.org
Subject: RE: ACTION-371: text defining de-identified data

Hi Shane,

I hear you and understand your position. But unlinkable and de-identified are not mutually exclusive. Unlinkable data is a subset of de-identified data (it just goes through another step of scrubbing).
Adding it to the list is not hurting your position.

The key towards the middle ground remains data retention, which has to be proportionate to the purpose.

Rob

Shane Wiley schreef op 2013-03-13 16:13:
Rob,

I thought we had agreed to not mix the "unlinkable" term with
"de-identified" here.  In our discussions in Boston it appeared there
was general agreement that unlinkability is a step beyond
de-identified.  Once a record has been rendered de-identified, it can
later be further made unlinkable (using your definition of unlinkable
vs. the one I proposed).  This is a significant sticking point for
those of us attempting to find middle ground here, so hopefully we can
document the details in non-normative text, but I'd ask that we remove
mention of unlinkable in the definition of de-identified at this time
(or else we've not really moved forward in this discussion, in my
opinion).

- Shane

-----Original Message-----
From: Rob van Eijk [mailto:rob@blaeu.com]
Sent: Wednesday, March 13, 2013 5:57 AM
To: public-tracking@w3.org
Subject: RE: ACTION-371: text defining de-identified data

Dan, Kevin,

I would really like unlinkability to be in there as well. I propose to
add the text "made unlinkable":

Normative text: Data can be considered sufficiently de-identified to
the extent that it has been deleted, made unlinkable, modified,
aggregated, anonymized or otherwise manipulated in order to achieve a
reasonable level of justified confidence that the data cannot
reasonably be used to infer information about, or otherwise be linked
to, a particular user, user agent, computer or device.


In terms of privacy by design, de-identification through unlinkability
is the strongest form of de-identification IMHO.

Rob

Kevin Kiley schreef op 2013-03-12 19:03:
Dan,

In case I wasn't being clear in my last post, I (personally) believe that
User-agent should *NOT* be removed from the proposed text.

I actually don't think it would do any harm to *ADD* the word 'Computer'
as well (which is present in the current FTC definition) so it reads
like this...

Normative text:

Data can be considered sufficiently de-identified to the extent that it
has been deleted, modified, aggregated, anonymized or otherwise
manipulated in order to achieve a reasonable level of justified
confidence that the data cannot reasonably be used to infer information
about, or otherwise be linked to, a particular user, user agent,
computer or device.

I think that covers it pretty well, and *NO* 'clarifying text' is
necessary.

Just my 2 cents.

Kevin Kiley

Previous message(s)...

Dan,

Perhaps you can add text clarifying this perspective or, much like
the FTC, suffice with "device" which I believe more than covers what
you're looking for here.

- Shane

From: Dan Auerbach [mailto:dan@eff.org]
Sent: Tuesday, March 12, 2013 8:57 AM
To: public-tracking@w3.org
Subject: Re: ACTION-371: text defining de-identified data

Shane and Kevin -- The phrase "user agent" in the text is intended to
refer to a particular user agent (not "Chrome 26" but rather "the
browser running on Dan's laptop"). I hoped that would be clear from
context, but if it's not we can clarify. I may not be able to
identify your device per se, but can identify that this is the same
browser as I saw before. I think this is the case with using cookies,
for example. It seems more accurate to me than lumping it all under
"device", and appropriate since the text of our document is elsewhere
focused on user agents, unlike the FTC text.

Best,

Dan

On 03/12/2013 12:19 AM, Kevin Kiley wrote:
Shane Wiley wrote...

> I had removed "user agent" in the suggested edit as this could be
> something as generic as "Chrome 26".

It can also be something VERY specific... and tell you a LOT about
the Computer/OS/Device being used.

In the case of Mobile... it will pretty much tell you EXACTLY what
'Device' is being used.

> The FTC likewise does not use "user agent" in their definition.

That's true... but BOTH definitions (W3C and FTC) currently mention
'Device'... and the FTC reports go to great lengths about how important
it is to exclude any knowledge of 'the Device' from the de-identified
data (especially in the case of 'Mobile Devices').

Kevin Kiley
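
The gap between a generic "Chrome 26" and a very specific User-Agent header can be sketched as below. This is an illustration under stated assumptions, not text proposed in the thread: the function name coarsen_user_agent, the choice to recognise only Firefox and Chrome, and the sample header are all hypothetical.

```python
import re

# Assumption: only Firefox and Chrome tokens are recognised; everything else maps to "Other".
BROWSER = re.compile(r"(Firefox|Chrome)/(\d+)")


def coarsen_user_agent(ua: str) -> str:
    """Reduce a full User-Agent header to a browser family and major version."""
    match = BROWSER.search(ua)
    if match is None:
        return "Other"
    return f"{match.group(1)} {match.group(2)}"


# Sample mobile header (illustrative): it names the OS version, device model and build.
mobile_ua = (
    "Mozilla/5.0 (Linux; Android 4.2.2; Nexus 4 Build/JDQ39) "
    "AppleWebKit/537.31 (KHTML, like Gecko) "
    "Chrome/26.0.1410.58 Mobile Safari/537.31"
)

print(coarsen_user_agent(mobile_ua))
```

The raw header carries the OS version, device model and build, while the coarsened value keeps only the browser family and major version. Whether the coarsened value is generic enough still depends on what other fields are stored alongside it.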





--
Edward W. Felten
Professor of Computer Science and Public Affairs
Director, Center for Information Technology Policy
Princeton University
609-258-5906           http://www.cs.princeton.edu/~felten
Received on Wednesday, 13 March 2013 17:29:51 UTC
