RE: Deidentification (ISSUE-188)

From: Mike O'Neill <michael.oneill@baycloud.com>
Date: Fri, 15 Aug 2014 17:07:07 +0100
To: "'David Singer'" <singer@apple.com>
Cc: <rob@blaeu.com>, "'Justin Brookman'" <jbrookman@cdt.org>, <public-tracking@w3.org>
Message-ID: <01ae01cfb8a2$f7836570$e68a3050$@baycloud.com>

As I said, I do not think the old definition of de-identified works for the third-party compliance section (or any statement describing data as out-of-scope of DNT). It assumes that identifying (tracking) data has been collected and some process other than deletion can be applied to it to make it safe.

I suggested we use a new definition for out-of-scope data, e.g. anonymous data (data from which it is mathematically impossible to derive identity, or which cannot be linked to an individual in a subsequent network interaction), and leave the definition of the de-identifying process for the permitted use section (data collected unknowingly in error should just be deleted).

I agree that your "data does not, and cannot and will not" implies impossibility, and the dreaded "reasonable" has gone, which is good. However, the non-normative note counteracts that somewhat by calling for distribution restrictions (which are not needed if the data "cannot" be re-identified).

I agree with Rob that a new definition would probably be superfluous, given that our definition of tracking implies in-scope data is: "... data regarding a particular user's activity across multiple distinct contexts".

The problem I have is that, with the other-contexts qualification, machine discoverability becomes tricky. This could create a loophole if collected data with a UID is out-of-scope whenever the controller promises to wear tunnel-vision glasses.

Does anyone have ideas how to address that?
 
Mike

> -----Original Message-----
> From: David Singer [mailto:singer@apple.com]
> Sent: 14 August 2014 21:05
> To: Mike O'Neill
> Cc: rob@blaeu.com; Justin Brookman; public-tracking@w3.org
> Subject: Re: Deidentification (ISSUE-188)
> 
> 
> On Aug 14, 2014, at 12:20 , Mike O'Neill <michael.oneill@baycloud.com> wrote:
> 
> > I agree, using a verb assumes that you already have data about people and you
> > apply a de-identifying process to it. It is the process that is hard to define,
> > without leaving loopholes.
> 
> Precisely.  I am not trying to define a process; I am defining by the result.  The
> result is a set of data that will never be linked to any specific user, user-agent, or
> device, by a suitable combination of technical (deidentification) measures, and
> restrictions (e.g. you are not allowed to try, you are not allowed to distribute
> this to anyone unless they agree not to try, and so on).  I don’t want to define
> the technical processes or the restrictions, just the result;  the data never gets
> linked to a specific user, user-agent, or device, ever again.
> 
> >
> > What is in scope is tracking data, and DNT should just mean do not collect it
> > (unless you claim a permitted use). If you have collected in error just delete it.
> >
> > Maybe that is all we need to say.
> 
> Maybe you need to review where we use the term again.
> 
> 1.  Third parties can collect data if
>   a) they have an exception
>   b) they have a permitted use
>   c) it’s deidentified
> 2.  Unknowing collection.  You have to render the data out of scope or delete it,
> and out of scope means you have permanently deidentified it.
> 3. Discussed but not yet in the spec.: the ‘raw data’ problem (companies cannot
> process raw logs in real time). Keep the raw data until you can process it, but the
> raw data has only 3 possible exits (like third party data):
>   a) it’s identifiable, but the user allowed you to collect it
>   b) it’s identifiable, but there is a permitted use you claim and you adhere to the
> restrictions of that permitted use
>   c) it’s not identifiable, it’s been deidentified
> 
> For all these, we need a definition of what data in a deidentified state means.
> To me, it means it’s got detached from any given user (user-agent, or device)
> and can and/or will never be reattached.
> 
> >
> > Mike
> >
> >
> >> -----Original Message-----
> >> From: Rob van Eijk [mailto:rob@blaeu.com]
> >> Sent: 14 August 2014 19:55
> >> To: David Singer
> >> Cc: Justin Brookman; public-tracking@w3.org; Mike O'Neill
> >> Subject: Re: Deidentification (ISSUE-188)
> >>
> >> The core of my issue, which may be a semantic issue, is that the current
> >> text is fixed on the word identification. To me it is not clear enough
> >> from the current definition that anything other than the 'one way street'
> >> is considered re-identification. The definition must be more specific on
> >> this point.
> >>
> >> Does cookie-syncing (which is commonly used in real-time bidding) fall
> >> under the meaning of re-identification?
> >>
> >> Rob
> >>
> >> David Singer schreef op 2014-08-14 18:37:
> >>> Rob, I am sorry, I don’t follow you at all.
> >>>
> >>> We say in a number of places that data passes out of our scope, and
> >>> hence we say nothing at all about it, once it has been deidentified.
> >>> We need to define what we mean by that, and we need to define that
> >>> ‘exit’ from our scope.
> >>>
> >>> On Aug 14, 2014, at 2:08 , Rob van Eijk <rob@blaeu.com> wrote:
> >>>
> >>>>
> >>>> The text you propose connects the state of a permanently de-identified
> >>>> dataset to the possibility of identifying a user/user-agent or device.
> >>>> I think limiting the approach to identification is way too limited.
> >>>> What is not covered is for example:
> >>>> - the sharing (for e.g. data enrichment and data correlation).
> >>>
> >>> if it doesn’t identify anyone, and won’t/can’t, we have nothing to say
> >>> about sharing it
> >>>
> >>>> - the application of de-identified data to the individual user/user
> >>>> agent/device (for e.g. re-targeting).
> >>>
> >>> That’s re-identification, and my text says (a) it ought not be
> >>> possible and (b) it ought not be permitted
> >>>
> >>>> - the retention of data, meaning the length of time that would be
> >>>> allowed to bring data into a de-identified state.
> >>>
> >>> That’s a separate question: the ‘raw data’ question (and one of the
> >>> exits for raw data is that the data is deidentified)
> >>>
> >>>> - any (unintended/unforeseen) data uses that may have an impact on
> >>>> (the personal space of) a user/user agent/device. For example
> >>>> re-targeting based on de-identified data, or re-targeting based on
> >>>> correlation with de-identified data.
> >>>
> >>> I don’t understand how one can target anyone if the data is
> >>> deidentified, and if it’s reidentified, then it wasn’t deidentified to
> >>> this definition (the definition insists it is a one-way street).
> >>>
> >>>>
> >>>> My proposal is to exclude text for de-identified data in order to aim
> >>>> for a cleaner specification.
> >>>
> >>> Again, I don’t understand.  The point of defining it is to say “how to
> >>> get out of the scope of this spec.”.  For example, the raw data clause
> >>> I proposed says there are only 3 exits:
> >>> * you have permission from the user to retain the data
> >>> * you retain the data under a permitted use, in accordance with the
> >>> terms of that permitted use
> >>> * you deidentify the data so it passes out of our scope
> >>>
> >>>
> >>>>
> >>>> Rob
> >>>>
> >>>> David Singer schreef op 2014-08-14 01:58:
> >>>>> On Aug 8, 2014, at 6:54 , Mike O'Neill <michael.oneill@baycloud.com>
> >>>>> wrote:
> >>>> (...)
> >>>>> Trying another way of phrasing it:
> >>>>> Data is permanently de-identified (and hence out of the scope of this
> >>>>> specification) when a sufficient combination of technical measures and
> >>>>> restrictions ensures that the data does not, and cannot and will not
> >>>>> be used to, identify a particular user, user-agent, or device.
> >>>>> Note: Usage and/or distribution restrictions are strongly recommended
> >>>>> for any dataset that has records that relate to a single user or a
> >>>>> small number of users; experience has shown that such records can, in
> >>>>> fact, sometimes be used to identify the user(s) despite the technical
> >>>>> measures that were taken to prevent that happening.
> >>>>> David Singer
> >>>>> Manager, Software Standards, Apple Inc.
> >>>
> >>> David Singer
> >>> Manager, Software Standards, Apple Inc.
> >
> 
> David Singer
> Manager, Software Standards, Apple Inc.
> 

Received on Friday, 15 August 2014 16:07:40 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 17:40:12 UTC