Re: Deidentification (ISSUE-188)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Thu, 28 Aug 2014 16:41:38 -0700
Message-Id: <DB444EFC-FB6D-4C58-B179-7C3DA5D6524B@gbiv.com>
Cc: David Singer <singer@apple.com>, "<vtoubiana@cnil.fr>" <vtoubiana@cnil.fr>, "<rob@blaeu.com>" <rob@blaeu.com>, "<public-tracking@w3.org>" <public-tracking@w3.org>
To: Mike O'Neill <michael.oneill@baycloud.com>
We don't seem to be talking about the same problem space. De-identification is something that usually occurs offline, long after the user agent has come and gone. We can't assume the user agent will ever visit the same domain again.

If we were talking about the time of the request, then it is simpler to just avoid setting an identifier of any kind. If the identifier is actually necessary for communication or state, then we are unlikely to have a state where it is deleted (expired, maybe, but even that is inconsistently implemented by UAs).

In any case, both would be covered by the general advice I described.
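The general advice referred to here, quoted further down, is to make sure the de-identified data set itself carries no client-side identifier and no data sufficient to regenerate one. A minimal sketch of that offline step might look like the following. The field names are hypothetical, and the choice to treat IP address and User-Agent as "data sufficient to generate an identifier" is an assumption for illustration, not something stated in the thread.

```javascript
// Hypothetical field names: a real data set would need its own audit to
// decide which fields hold, or could be used to rebuild, a client-side ID.
const IDENTIFIER_FIELDS = ["cookieId", "etag", "localStorageUid"];
// Assumption: these inputs could be combined to regenerate an identifier.
const IDENTIFIER_INPUT_FIELDS = ["ip", "userAgent"];

// Returns a copy of the record without any of the listed fields.
// The original record is left unmodified.
function stripClientIdentifiers(record) {
  const drop = new Set([...IDENTIFIER_FIELDS, ...IDENTIFIER_INPUT_FIELDS]);
  const out = {};
  for (const [key, value] of Object.entries(record)) {
    if (!drop.has(key)) {
      out[key] = value;
    }
  }
  return out;
}

// Example: run offline over retained log records, with no user agent involved.
const retained = [
  { cookieId: "abc123", ip: "192.0.2.1", url: "/news", ts: 1409269298 },
];
const deidentified = retained.map(stripClientIdentifiers);
```

Dropping fields is only one step. Whether the result meets the "high level of confidence" bar in the proposed definition still depends on what the remaining fields reveal in combination with other retained or available information.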


> On Aug 28, 2014, at 1:01 PM, "Mike O'Neill" <michael.oneill@baycloud.com> wrote:
> Not for cookies or the cache. A Set-Cookie header with an expiry in the past (or clearing the ETag value if in the cache) would do it. Even for DOM storage you just need to send the JavaScript in the content (localStorage.clear(); or localStorage.removeItem(UID);); you do not have to wait for a callback. This is how it is done now by loads of AdChoices or e-privacy compliant implementations.
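The two mechanisms described in this message can be sketched roughly as follows. The helper names are hypothetical and the cookie name is only an example. Note that a user agent matches a cookie by name, domain and path, so the expiring header has to use the same Path and Domain the cookie was set with.

```javascript
// Hypothetical helper: builds a Set-Cookie header value that asks the user
// agent to discard the named cookie by giving it an expiry in the past.
function expireCookieHeader(name, { path = "/", domain } = {}) {
  const parts = [
    `${name}=`,
    "Expires=Thu, 01 Jan 1970 00:00:00 GMT",
    "Max-Age=0",
    `Path=${path}`,
  ];
  if (domain) {
    parts.push(`Domain=${domain}`);
  }
  return parts.join("; ");
}

// Hypothetical helper: removes one identifier from a Web Storage-like object.
// In a page script this would be called as:
//   removeStoredId(window.localStorage, "UID");
function removeStoredId(storage, key) {
  storage.removeItem(key);
}
```

Either step only takes effect when the user agent makes another request to the same domain, which is the limitation raised in the reply at the top of this message.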
>> -----Original Message-----
>> From: Roy T. Fielding [mailto:fielding@gbiv.com]
>> Sent: 28 August 2014 20:32
>> To: Mike O'Neill
>> Cc: David Singer; <vtoubiana@cnil.fr>; <rob@blaeu.com>; <public-tracking@w3.org>
>> Subject: Re: Deidentification (ISSUE-188)
>>> On Aug 28, 2014, at 2:55 AM, "Mike O'Neill" <michael.oneill@baycloud.com> wrote:
>>>>> Data is permanently de-identified when there exists a high level of confidence that no human subject of the data can be identified, directly or indirectly, by that data alone or in combination with other retained or available information.
>>> Roy, I think this is a good definition. Can we add the non-normative specific example about UIDs below? The clause about enabling communication and requested service is there to cover publisher logins and session state (and IP addresses).
>>> Non-normative example:
>>> In the interests of transparency this implies that any data used or stored in a user agent or device for the purpose of identifying it in subsequent requests, unless solely used to enable communication or to supply a service requested by the user, will have been deleted or, if this is unfeasible, otherwise made ineffective.
>> I don't think it implies anything of the sort, so this would be a normative addition. It isn't a good idea to suggest that servers delete client-side storage, since the only way to do that is pervasive callbacks with no privacy at all, and there might well be identifiers that are still effective for other (still identified) data sets. What matters is that they not be traceable to the de-identified data.
>> A better suggestion would be to take steps to ensure the de-identified data does not contain any client-side identifier, nor data sufficient to generate a client-side identifier, since that would likely remain an indirect identifier for the user.
>> ....Roy
Received on Thursday, 28 August 2014 23:42:04 UTC
