Re: Deidentification (ISSUE-188) from Roy T. Fielding on 2014-07-23 (public-tracking@w3.org from July 2014)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Wed, 23 Jul 2014 09:19:33 -0700
To: Justin Brookman <jbrookman@cdt.org>
Cc: "public-tracking@w3.org List" <public-tracking@w3.org>, David Singer <singer@apple.com>
Message-Id: <C2C5115E-1615-4782-9274-9CB21A1495CF@gbiv.com>

On Jul 23, 2014, at 6:24 AM, Justin Brookman wrote:

> Different questions to Roy and David about their proposals:
> 
> Roy, on the call last week, you said that if data can be tied to a user agent or device, then it wasn’t deidentified.  Nick proposed adding “, user agent, or device” to the end of your definition to make that clear.  So it would read: 
> 
> A data set is considered de-identified when there exists a reasonable level of justified confidence that the data within it cannot be used to infer information about, or otherwise be linked to, a particular user, user agent, or device. 
> 
> However, from the minutes, at some point you rejected some amendment — not sure if it’s this one or not.

Yes, the problem is the other text around it.  I don't want

  "data within it cannot be used to infer information about ...
   a user agent, or device."

One of the main reasons for collecting data is to ensure that the
site/system works for a given UA/device.  That's data we need to keep, at
least for any UA that does not identify a user (i.e., one in use by enough
distinct users that we can retain the data without identifying any of
them).  That's why I did not include UA and device in the proposal.

Alternatively, I would be happy with:

  A data set is considered de-identified when there exists a reasonable
  level of justified confidence that none of the data within it can be
  linked to a particular user, user agent, or device.

....Roy

Received on Wednesday, 23 July 2014 16:19:47 UTC