Re: de-identification text for Wednesday's call

On Apr 2, 2013, at 1:21 AM, Dan Auerbach wrote:
> Normative text:
> Data can be considered sufficiently de-identified to the extent that it has been deleted, modified, aggregated, anonymized or otherwise manipulated in order to achieve a reasonable level of justified confidence that the data cannot reasonably be used to infer information about, or otherwise be linked to, a particular user, user agent, or device.

I still don't like that use of "infer information about" -- if I am
collecting data about screen resolution (a common requirement these
days) then I can infer information about each particular device.
The privacy concern is not about having that data -- it's about
having it with a level of specificity that allows identification
of an individual's actions as distinct from other individuals.
An example would be recording the specific pixel density of rare
devices instead of broader (de-identified) categories of density.
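
The distinction between an exact value and a broader category can be
sketched in a few lines. This is only an illustration, not part of the
proposed text: the bucket boundaries, the labels, and the name
density_category are all hypothetical, and where to draw the lines in
practice would depend on how many devices share each category.

```python
# Hypothetical bucket boundaries in pixels per inch (illustrative values only).
# Each tuple is (inclusive lower bound, exclusive upper bound, label).
DENSITY_BUCKETS = [
    (0, 160, "low"),
    (160, 240, "medium"),
    (240, 320, "high"),
]


def density_category(ppi: float) -> str:
    """Map an exact pixel density to a coarse category shared by many devices."""
    if ppi < 0:
        raise ValueError("pixel density cannot be negative")
    for lower, upper, label in DENSITY_BUCKETS:
        if lower <= ppi < upper:
            return label
    # Everything at or above the last boundary collapses into one category,
    # so a rare, unusually dense display is not singled out.
    return "very high"


# Store the category instead of the exact value.
record = {"density": density_category(163.0)}
```

A log that keeps only the category still supports questions about
screen resolution in aggregate, while a rare exact value no longer
distinguishes one device from the others in its bucket.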

In other words, what we want is data that "cannot be used to
identify a particular user, user agent, or device", or perhaps
"cannot be used to identify the actions of a particular user,
user agent, or device".

I see no reason to broaden that to "infer information about" or
add "otherwise be linked to"; the former makes keeping any data
pointless, since all retained data is information about a subject,
and the latter is already implied by "cannot be used to identify".

Likewise, it is reasonable to assume that if someone shows that
the data can be used to identify a particular user, user agent,
or device, then it is no longer considered de-identified.
So, we could shorten it to:

Data can be considered sufficiently de-identified if there exists a reasonable level of confidence that the data cannot be used to identify a particular user, user agent, or device.

....Roy

Received on Wednesday, 3 April 2013 09:42:05 UTC