Re: ACTION-371: text defining de-identified data from Dan Auerbach on 2013-03-13 (public-tracking@w3.org from March 2013)

From: Dan Auerbach <dan@eff.org>
Date: Wed, 13 Mar 2013 11:01:59 -0700
To: public-tracking@w3.org
Message-ID: <5140BF17.5070403@eff.org>

I also agree that we should just stick with de-identified, just as a
point of nomenclature. For one, unlike what you propose below, Rob, the
FTC text actually defines unlinkability in terms of de-identification,
so I think it would be very confusing if we did the opposite here.

That said, we did NOT agree at the face-to-face that unlinkability was a
"step beyond de-identified"; we are not at all weakening the standard
with our word choice. For unlinkability and de-identification both, we
do NOT propose a holy grail of provably perfect anonymization that can't
be achieved in practice (or even in theory, really!). However, for both
we require a significantly higher standard than, for example, keeping a
pseudonymous data set of browsing history. The first non-normative
example is intended to make this clear, but I can flesh it out if it's not.

Dan

On 03/13/2013 10:28 AM, Shane Wiley wrote:
>
> Ed,
>
>
> Agreed -- reasonably attempting to clear unique identifiers or
> information that could lead to unique identification in URLs should
> also be included.
>
>  
>
> - Shane
>
>  
>
> *From:* Edward W. Felten [mailto:felten@CS.Princeton.EDU]
> *Sent:* Wednesday, March 13, 2013 10:22 AM
> *To:* Justin Brookman
> *Cc:* <public-tracking@w3.org>
> *Subject:* Re: ACTION-371: text defining de-identified data
>
>  
>
> But we should be equally clear that "de-identify" means more than just
> removing the most obvious identifiers from the data.
>
>  
>
> On Wed, Mar 13, 2013 at 1:07 PM, Justin Brookman <justin@cdt.org>
> wrote:
>
> Shane is right that we did choose to use "de-identified" instead of
> "unlinkable" at the Cambridge meeting. So I agree we probably should
> not use "unlinkable" to define "de-identified" in the standard.
> However, I don't see why we need to define "unlinkable" at all, as it
> has no operational meaning, and was rejected because it implied a
> technological impossibility of relinking, which is not a standard that
> can be reasonably achieved.
>
> Justin Brookman
> Director, Consumer Privacy
> Center for Democracy & Technology
> tel 202.407.8812
> justin@cdt.org
> http://www.cdt.org
> @JustinBrookman
> @CenDemTech
>
>
>
> On 3/13/2013 11:35 AM, Shane Wiley wrote:
>
> Rob,
>
> So we're agreed unlinkability requires more processing than
> de-identified - good.  I would recommend we define de-identified
> (nearly done) and unlinkability separately to clearly demonstrate they
> are different points within a continuum.  We can then focus on the
> discussion of retention of data in its de-identified state prior to
> moving to the ultimate unlinkable state.
>
> - Shane
>
> -----Original Message-----
> From: Rob van Eijk [mailto:rob@blaeu.com]
> Sent: Wednesday, March 13, 2013 8:28 AM
> To: Shane Wiley
> Cc: public-tracking@w3.org
> Subject: RE: ACTION-371: text defining de-identified data
>
> Hi Shane,
>
> I hear you and understand your position. But unlinkable and
> de-identified are not mutually exclusive. Unlinkable data is a subset
> of de-identified data (they just go through another step of scrubbing).
> Adding it to the list is not hurting your position.
>
> The key towards the middle ground remains data retention, which has to
> be proportionate to the purpose.
>
> Rob
>
> Shane Wiley wrote on 2013-03-13 16:13:
>
> Rob,
>
> I thought we had agreed to not mix the "unlinkable" term with
> "de-identified" here.  In our discussions in Boston it appeared there
> was general agreement that unlinkability is a step beyond
> de-identified.  Once a record has been rendered de-identified, it can
> later further be made unlinkable (using your definition of unlinkable
> vs. the one I proposed).  This is a significant sticking point for
> those of us attempting to find middle ground here, so hopefully we can
> document the details in non-normative text but I'd ask that we remove
> mention of unlinkable in the definition of de-identified at this time
> (or else we've not really moved forward in this discussion in my
> opinion).
>
> - Shane
>
> -----Original Message-----
> From: Rob van Eijk [mailto:rob@blaeu.com]
> Sent: Wednesday, March 13, 2013 5:57 AM
> To: public-tracking@w3.org
> Subject: RE: ACTION-371: text defining de-identified data
>
> Dan, Kevin,
>
> I would really want unlinkability in there as well. I propose to
> add the text "made unlinkable":
>
> Normative text: Data can be considered sufficiently de-identified to
> the extent that it has been deleted, made unlinkable, modified,
> aggregated, anonymized or otherwise manipulated in order to achieve a
> reasonable level of justified confidence that the data cannot
> reasonably be used to infer information about, or otherwise be linked
> to, a particular user, user agent, computer or device.
>
>
> In terms of privacy by design, de-identification through unlinkability
> is the strongest form of de-identification IMHO.
>
> Rob
>
> Kevin Kiley wrote on 2013-03-12 19:03:
>
> Dan,
>
> In case I wasn't being clear in my last post, I (personally) believe
> that User-agent should *NOT* be removed from the proposed text.
>
> I actually don't think it would do any harm to *ADD* the word
> 'Computer' as well (which is present in the current FTC definition) so
> it reads like this...
>
> Normative text:
>
> Data can be considered sufficiently de-identified to the extent that
> it has been deleted, modified, aggregated, anonymized or otherwise
> manipulated in order to achieve a reasonable level of justified
> confidence that the data cannot reasonably be used to infer
> information about, or otherwise be linked to, a particular user, user
> agent, computer or device.
>
> I think that covers it pretty well, and *NO* 'clarifying text' is
> necessary.
>
> Just my 2 cents.
>
> Kevin Kiley
>
> Previous message(s)...
>
> Dan,
>
> Perhaps you can add text clarifying this perspective or, much like
> the FTC, suffice with "device" which I believe more than covers what
> you're looking for here.
>
> - Shane
>
> From: Dan Auerbach [mailto:dan@eff.org]
> Sent: Tuesday, March 12, 2013 8:57 AM
> To: public-tracking@w3.org
> Subject: Re: ACTION-371: text defining de-identified data
>
> Shane and Kevin -- The phrase "user agent" in the text is intended to
> refer to a particular user agent (not "Chrome 26" but rather "the
> browser running on Dan's laptop"). I hoped that would be clear from
> context, but if it's not we can clarify. I may not be able to
> identify your device per se, but can identify that this is the same
> browser as I saw before. I think this is the case with using cookies,
> for example. It seems more accurate to me than lumping it all under
> "device", and appropriate since the text of our document is elsewhere
> focused on user agents, unlike the FTC text.
>
> Best,
>
> Dan
>
> On 03/12/2013 12:19 AM, Kevin Kiley wrote:
>
>     Shane Wiley wrote...
>     I had removed "user agent" in the suggested edit as this could be
>     something as generic as "Chrome 26".
>
> It can also be something VERY specific... and tell you a LOT about
> the Computer/OS/Device being used.
>
> In the case of Mobile... it will pretty much tell you EXACTLY what
> 'Device' is being used.
>
>     The FTC likewise does not use "user agent" in their definition.
>
> That's true... but BOTH definitions (W3C and FTC) currently mention
> 'Device'... and the FTC reports go to great lengths about how
> important it is to exclude any knowledge of 'the Device' from the
> de-identified data (especially in the case of 'Mobile Devices').
>
> Kevin Kiley
>
>
>
>
>
>  
>
> -- 
> Edward W. Felten
> Professor of Computer Science and Public Affairs
> Director, Center for Information Technology Policy
> Princeton University                
> 609-258-5906           http://www.cs.princeton.edu/~felten
>


-- 
Dan Auerbach
Staff Technologist
Electronic Frontier Foundation
dan@eff.org
415 436 9333 x134
Received on Wednesday, 13 March 2013 18:02:35 UTC