Re: Deidentification (ISSUE-188) from Roy T. Fielding on 2014-08-20 (public-tracking@w3.org from August 2014)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Wed, 20 Aug 2014 16:25:49 -0700
To: Justin Brookman <jbrookman@cdt.org>
Cc: David Singer <singer@apple.com>, "public-tracking@w3.org WG" <public-tracking@w3.org>, TOUBIANA Vincent <vtoubiana@cnil.fr>
Message-Id: <1E109A83-C202-4AA4-A365-A0E63FA1D3D4@gbiv.com>

On Aug 20, 2014, at 3:30 PM, Justin Brookman wrote:
> On Aug 18, 2014, at 12:12 PM, David Singer <singer@apple.com> wrote:
>> On Aug 17, 2014, at 6:35 , Mike O'Neill <michael.oneill@baycloud.com> wrote:
>> 
>>>>> b) the deidentification measures should be described in a form that is at least as available as the data (i.e. publicly, if the data itself will be made public).
>>> 
>>> Why not publicly in every case? If someone collects DNT data and intends to share it privately amongst their friends we should know how they shred the PII out of it.
>> 
>> OK.  The term {permanently deidentified} below is a candidate for being replaced by a new name of our choosing (e.g. “permanent non-tracking data”), here and where it is used.  How is this?  I made the second clause not a note, as it contains ‘should’ and ‘strongly recommended’ i.e. it is not merely informative.
>> 
>> * * * * *
>> 
>> Data is {permanently de-identified} (and hence out of the scope of this specification) when a sufficient combination of technical measures and restrictions ensures that the data does not, and cannot and will not be used to, identify a particular user, user-agent, or device.
>> 
>> In the case of datasets that contain records that relate to a single user or a small number of users:
>> a) Usage and/or distribution restrictions are strongly recommended; experience has shown that such records can, in fact, sometimes be used to identify the user(s) despite the technical measures that were taken to prevent that happening.
>> b) the deidentification measures should be described publicly (e.g. in the privacy policy).
> 
> OK, we agreed to present this as an option at the Call for Objection today.  I was not following IRC that closely today, but Nick indicated that Roy or Vincent may be satisfied with this option as well, or might be willing to withdraw their proposals.  Roy, Vincent, let me know what you want to do, and we can proceed to a CfO on this issue.

I don't know what to make of that.  Behavioral requirements do not belong
in definitions unless they have the effect of partitioning a set of subjects
as being in or out of the definition.  A valid definition cannot have
any false negatives, which is what you get when data is de-identified but
the behavioral requirements are not met.

I do not believe that industry will describe their de-identification
measures publicly; certainly not in a privacy policy.  There are just too
many ways that a legal document like the privacy policy can get out of sync,
since it requires a great deal of corporate review.  What the policy does
is define the black box requirements, and then the technical folks are
instructed to adhere to those requirements at a minimum.  The actual
technical procedures implemented in practice are often more
privacy-preserving than what is publicly declared in a policy and
vary depending on which application is being discussed.

Furthermore, we are not talking about public data.  The fact that many
lawyers would prefer to have more transparency into corporate business
practices is hardly a justification for additional requirements.
What matters is the end result, not how a company might get there.

Regardless, nothing in the spec prevents companies from describing
their de-identification measures in a privacy policy. If there is
value for them to do so (as there is for EFF), then that value should
be justification enough without further imposition by this WG.
If legislative or regulatory bodies want to impose that kind of obligation,
they have the power to do so (usually subject to more responsible
oversight and public feedback than a W3C spec).

In general, I would prefer to switch to "anonymized" (and use a
strict definition of that) or return to using "unlinkable" (also
with a strict definition), rather than pollute the spec with behavioral
requirements inside the definition of terms.

....Roy

Received on Wednesday, 20 August 2014 23:26:13 UTC