Re: ISSUE-198: Define new word for yellow state due to the fact that the process of de-identification spans all three states (red,yellow and green). from Roy T. Fielding on 2013-05-16 (public-tracking@w3.org from May 2013)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Wed, 15 May 2013 18:12:31 -0700
To: Shane Wiley <wileys@yahoo-inc.com>
Cc: "rob@blaeu.com" <rob@blaeu.com>, John Simpson <john@consumerwatchdog.org>, "Tracking Protection Working Group" <public-tracking@w3.org>
Message-Id: <2035953B-F59A-4107-AAEA-7F18DB4B05EB@gbiv.com>

I will agree and disagree with both of you ... Shane is right
in that de-identified is a more natural term for the yellow
state, but only in a world where the FTC (and this group)
doesn't oddly define de-identified as a replacement for the
more stringent (and, for hypertext, confusing) term unlinkable.
Hence, Shane's reuse of de-identified in this context is wrong.

Likewise, whether the raw data is personally identifiable,
pseudonymous, or entirely anonymous is completely unknown,
since it depends both on what the client decides to send and
what the server decides to keep/log.  Thus, I agree with Rob
that the red state should not be labelled pseudonymous, but
I disagree that it would make any sense to label the yellow
state as pseudonymous.

The privacy state of the yellow data is actually stronger
than pseudonymous: an inspector with access to the browser's
client-side storage can easily map stored pseudonyms to the
data in the red state, but would not be able to do the
same for data in the yellow state unless they also have the
ability to reproduce the pseudonym hash. [Note that I am 
assuming here that such a browser's history has been cleared,
since an inspector doesn't need yellow data if it already has
access to the browser's history.]
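To make the red/yellow distinction concrete, here is a minimal sketch.
It assumes, as an illustration only, that the yellow-state pseudonym is
an HMAC-SHA256 of the raw cookie ID under a secret key held by the
server (SERVER_KEY, yellow_pseudonym, inspector_matches, and the log
layout are all hypothetical). A plain unkeyed hash of a known ID could
be recomputed by the inspector, so the secret key is what stands in for
"the ability to reproduce the pseudonym hash" here.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; assumed to be held only by the server.
SERVER_KEY = secrets.token_bytes(32)

def yellow_pseudonym(cookie_id: str, key: bytes = SERVER_KEY) -> str:
    """Replace a raw cookie ID with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, cookie_id.encode("utf-8"), hashlib.sha256).hexdigest()

# Red state: the log record carries the same ID the browser stores.
cookie_id = "uid-4f2a9c"  # hypothetical value found in client-side storage
red_log = [{"uid": cookie_id, "url": "https://example.com/page"}]

# Yellow state: the same record with the ID replaced by its keyed hash.
yellow_log = [{"uid": yellow_pseudonym(r["uid"]), "url": r["url"]} for r in red_log]

def inspector_matches(log, stored_id):
    """Records that can be linked using only the given identifier."""
    return [r for r in log if r["uid"] == stored_id]

print(len(inspector_matches(red_log, cookie_id)))     # 1: direct match
print(len(inspector_matches(yellow_log, cookie_id)))  # 0: stored ID alone fails
# Only someone able to reproduce the hash (i.e. holding SERVER_KEY) can link:
print(len(inspector_matches(yellow_log, yellow_pseudonym(cookie_id))))  # 1
```

The inspector holding only the browser's stored ID links the red record
directly but finds nothing in the yellow log; linking succeeds again
only for whoever can recompute the keyed hash.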

A "hashed pseudonymous" label would be more accurate for
the yellow state, assuming that anonymous data goes directly
from red to green.

However, IMO, Shane's diagrams are specific to data collection
for advertising and advertising support services.  If DNT
were limited to advertising, I'd be a happy camper, but
I somehow doubt that is the intended goal.

Finally, I want to express my discomfort with the way this
discussion is progressing.  As I've said numerous times,
I am willing to turn off (cross-site) tracking for users who
have indicated that as their legitimate preference.  That is
the entire scope of what I am willing to do under the auspices
of a standard produced by this group.  There are a lot of
other data sharing and retention requirements that are related
to privacy and subject to regulatory requirements, but those
concerns are outside the scope of this work.  They should apply
whether or not the user signals DNT:1, and thus do not need
to be addressed by our specifications.

Blanket requirements on DNT:1 retained data being in a
particular category of "anonymous", "pseudonymous", or
"unlinkable" would presume that implementations actively
eliminate data sent by the client that might correlate to
a single user, user agent, or device.  I am not able to
implement such a requirement in general, and will not
pretend to do so.  Marking the data record as DNT:1, for
the sake of restricting later processing, is easy;
deciding that the contents of a data record will not be
identifying when combined with past and future data sets
is impossible in general.  Even if we restricted such
processing to a specific data set, it only makes economic
sense to perform that processing when someone with a budget
decides they want to share it in a non-aggregate form.
At that point, we can and should de-identify the data to
be shared regardless of the user's DNT setting.
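The "easy" half of the argument, marking a record as DNT:1 at collection
time so later processing can be restricted, might look like the sketch
below. The LogRecord schema, mark_dnt, and eligible_for_cross_site_use
are hypothetical names chosen for illustration; real log formats and
downstream pipelines vary.

```python
from dataclasses import dataclass, field

@dataclass
class LogRecord:
    # Hypothetical log schema; real server logs vary.
    url: str
    headers: dict = field(default_factory=dict)
    dnt: bool = False

def mark_dnt(record: LogRecord) -> LogRecord:
    """Tag the record at collection time if the request carried DNT: 1."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in record.headers.items()}
    record.dnt = lowered.get("dnt", "").strip() == "1"
    return record

def eligible_for_cross_site_use(records):
    """Hypothetical downstream step: drop DNT-marked records."""
    return [r for r in records if not r.dnt]

records = [
    mark_dnt(LogRecord("/a", {"DNT": "1"})),
    mark_dnt(LogRecord("/b", {"User-Agent": "x"})),
    mark_dnt(LogRecord("/c", {"dnt": "0"})),
]
print([r.url for r in eligible_for_cross_site_use(records)])  # ['/b', '/c']
```

Note what the flag does not do: it says nothing about whether the
remaining fields (URL, headers, timing) would identify the user when
combined with past and future data sets, which is the part described
above as impossible to guarantee in general.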

....Roy

Received on Thursday, 16 May 2013 01:12:55 UTC