RE: ISSUE-5: What is the definition of tracking? from Amy Colando (LCA) on 2011-10-28 (public-tracking@w3.org from October 2011)

From: Amy Colando (LCA) <acolando@microsoft.com>
Date: Fri, 28 Oct 2011 14:06:51 +0000
To: Jonathan Mayer <jmayer@stanford.edu>, David Wainberg <dwainberg@appnexus.com>
CC: Sean Harvey <sharvey@google.com>, "public-tracking@w3.org Group WG" <public-tracking@w3.org>
Message-ID: <58271C264AD16547AC61CAFA53FBEAF934ED3302@TK5EX14MBXC140.redmond.corp.microsoft.>

Thanks Jonathan.  Isn't it the case that these browser features would still have to be related to some sort of identifier - whether on the client side or the server side - in order for the information to be identifiable?  If so, can't we stick with pseudonymous, passively collected data?

In other words, suppose all I have is an aggregated laundry list of browser features used by multiple users, with no way to say that a given set of browser features belongs to a particular record, because the identifiers have been removed or the log files scrubbed.  Is there then any way to relate that data to a specific identifier?  I think your example below requires the log files to associate the browser features with a particular record or browser, but I wanted to make sure I am thinking about this correctly.

From: Jonathan Mayer [mailto:jmayer@stanford.edu]
Sent: Friday, October 28, 2011 2:11 AM
To: David Wainberg
Cc: Sean Harvey; public-tracking@w3.org Group WG
Subject: Re: ISSUE-5: What is the definition of tracking?

Here's an illustrative hypothetical.  Suppose, for each page it's embedded on, a third party logs a bunch of browser features (e.g., user agent, plugins, screen dimensions) plus the page URL.  And suppose the third party makes no attempt to pseudonymously identify users.  The third party suffers a data breach, and malcontents apply trivial fingerprinting algorithms to the data to reconstruct pseudonymous user browsing histories.
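
A minimal sketch of the reconstruction step in this hypothetical, assuming the "trivial fingerprinting algorithm" is simply grouping log rows whose browser features match exactly. The row layout, field names, function name, and sample data are all made up for illustration; a realistic attacker would also have to cope with features that change over time, which is omitted here.

```python
from collections import defaultdict
from typing import NamedTuple


class LogRow(NamedTuple):
    """One hypothetical log line: browser features plus the embedding page URL.
    Note there is no cookie, user ID, or hash column."""
    user_agent: str
    plugins: str
    screen: str
    page_url: str


def reconstruct_histories(rows):
    """Group rows whose browser features match exactly.

    The tuple of features acts as a de facto pseudonymous identifier,
    even though the logger never computed or stored one.
    """
    histories = defaultdict(list)
    for row in rows:
        fingerprint = (row.user_agent, row.plugins, row.screen)
        histories[fingerprint].append(row.page_url)
    return dict(histories)


# Illustrative, made-up "breached" log data.
log = [
    LogRow("Firefox/7.0", "Flash;Java", "1280x800", "news.example/a"),
    LogRow("Chrome/15.0", "Flash", "1920x1080", "shop.example/b"),
    LogRow("Firefox/7.0", "Flash;Java", "1280x800", "health.example/c"),
]

for fp, pages in reconstruct_histories(log).items():
    print(fp, pages)
```

The grouping only works because each row keeps the browser features and the page URL together, which is the per-record association Amy Colando asks about in her reply.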

Note that the third party did not hold pseudonymously identified browsing histories - it held pseudonymously identifiable browsing histories.  But that still gives rise to real privacy risks.

On Oct 27, 2011, at 12:30 PM, David Wainberg wrote:

I don't find it excessively nitpicky. It's relevant. Please elaborate. It seems that, at some point, the data has to be associated with a distinct user.

On 10/27/11 1:14 PM, Jonathan Mayer wrote:
Fragmented or probabilistic tracking data might not be stored with a hash or other single identifier.  The privacy risk would, of course, be the same.  (I don't mean to be excessively nitpicky - a few months ago my team looked at a third party doing fingerprinting of just this sort.)

On Oct 27, 2011, at 9:02 AM, David Wainberg wrote:

On Oct 25, 2011, at 2:13 PM, David Wainberg wrote:

On 10/24/11 8:18 PM, Jonathan Mayer wrote:

I would strongly oppose limiting our definition of tracking to only cover pseudonymously identified or personally identified data.  There are a number of ways to track a user across websites without a single pseudonymous or personal identifier.
I'm not sure what you mean here. Can you provide examples?
Any means of tracking that relies on fragmented or probabilistic information, for example browser fingerprinting.  (See Peter Eckersley's paper "How Unique Is Your Web Browser?")
Ah. I would have included that under "pseudonymously identified", because if the server stores data against it, the data will be keyed to a hash or something else derived from the fingerprint.
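
A minimal sketch of the server-side storage model described here, assuming the "hash or something based on the fingerprint" is a SHA-256 digest of a few concatenated browser features. The feature set, separator, store, and function names are hypothetical and not taken from any party's actual system.

```python
import hashlib
from collections import defaultdict


def fingerprint_key(user_agent: str, plugins: str, screen: str) -> str:
    """Derive a stable pseudonymous key from browser features (hypothetical scheme)."""
    # Join with a unit-separator character so that different feature
    # combinations cannot produce the same string by plain concatenation.
    canonical = "\x1f".join((user_agent, plugins, screen))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Server-side store keyed by the derived hash: the hash is the pseudonymous identifier.
histories = defaultdict(list)


def record_visit(user_agent: str, plugins: str, screen: str, page_url: str) -> None:
    """Append a page URL to the history stored under this browser's fingerprint hash."""
    histories[fingerprint_key(user_agent, plugins, screen)].append(page_url)


# Illustrative usage with made-up values.
record_visit("Firefox/7.0", "Flash;Java", "1280x800", "news.example/a")
record_visit("Chrome/15.0", "Flash", "1920x1080", "shop.example/b")
record_visit("Firefox/7.0", "Flash;Java", "1280x800", "health.example/c")

key = fingerprint_key("Firefox/7.0", "Flash;Java", "1280x800")
print(histories[key])
```

In this model an explicit key is stored with the data, so the data is pseudonymously identified. In Jonathan Mayer's breach hypothetical no key is ever stored, yet the same grouping can be computed after the fact from the raw features.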

Received on Friday, 28 October 2011 14:07:22 UTC