Re: ISSUE-5: What is the definition of tracking? from David Wainberg on 2011-10-25 (public-tracking@w3.org from October 2011)

From: David Wainberg <dwainberg@appnexus.com>
Date: Tue, 25 Oct 2011 16:55:46 -0400
To: Sean Harvey <sharvey@google.com>
CC: "public-tracking@w3.org Group WG" <public-tracking@w3.org>
Message-ID: <4EA72252.7070109@appnexus.com>
On 10/24/11 2:12 PM, Sean Harvey wrote:
>
> Defining tracking is trickier than one might think, and we should be 
> attuned to the long-term ramifications of whichever approach we take. 
> Currently we're focused on exception use cases and the temptation is 
> to essentially define "tracking" as "everything but x". Should we 
> continue with this approach, there are two issues we need to be aware of:
>
>  1. This will sound slightly pedantic, but the first is the danger of
>     forgetting something obvious or basic in the list of exceptions.
>     Referrer URLs, for example, have been mentioned, and there are other
>     obvious examples of cross-site data sharing: HTTP headers, TCP/IP
>     handshakes, etc. These are examples of cross-site data sharing
>     with your browser that do not uniquely identify you to the server
>     you're interacting with (though the issue of uncommon HTTP headers
>     was briefly raised by EFF), but they are data sharing nonetheless.
>
>     I think it's therefore important to add definitionally that we are
>     talking about pseudonymous (or personal) identification of an
>     individual, an individual browser instance, or an individual
>     device for some business or other purpose.
>
I agree about the problems of the "everything but x" approach, and would 
much prefer a clear and narrow definition of the data collection/use 
that is covered by the standard.

I also agree that the association of the data with a particular 
user/browser/device is probably an element of the definition. Other 
elements may be: length of time the data is stored, the nature of the 
data, the location where the data is stored, and how the data is to be used or 
transferred.

The current strawman proposes this definition for Behavioral Data: 
"Data that associates one or more pages viewed by a given browser 
instance with that user or browser instance, via a cookie or analogous 
technology."

Here's a version that incorporates the elements I listed above:

  * Data that associates one or more URLs viewed by a given user,
    browser instance, or device with that user, browser, or device,
  * and is stored outside of the browser or device for a period of
    time longer than XX,
  * or is transferred to another party to be stored for longer than
    XX.

This definition gets at what I think is a key concern, the accumulation 
over time of a user's web browsing history.

>
>  2. The other danger of an exceptions-based definition of "tracking"
>     is that it is highly restrictive of future business models in
>     potentially unpredictable ways. Two years ago we would not have been
>     considering definitions of "first party" that may or may not
>     include embedded video content from YouTube or like buttons from
>     Facebook; and it is possible that we would have collectively
>     written an exceptions-based standard that didn't work very well in
>     this new landscape. It's therefore worth at least discussing if we
>     want the definition to identify what we are trying to address
>     outside the context of the exceptions -- NOT that we make the same
>     mistake on the other end by creating a harms-based definition, but
>     that we quantify the harms we are trying to address and tailor
>     our definition of tracking to them to a degree.
>
Yes, please! Can we start with a list of potential harms and see what it 
looks like? Here are a couple to start:

  * privacy harm stemming from government access to web browsing history
  * public exposure of web browsing history

Also, I have doubts that 1st vs 3rd party is the correct distinction; 
your example demonstrates one key reason why that is. Focusing on the 
1st vs 3rd party distinction will shape the market and inhibit future 
innovation in unexpected ways, without a direct benefit to consumers. We 
should instead focus on particular data collection and uses and the 
risks associated with them, regardless of who the party is.
>
>  3. The dialogue on the Issue 5 email chain has only sometimes
>     reflected one of the important conversations we had in Cambridge,
>     namely that cross-entity data sharing is a more foundational
>     concern than the first/third party distinction, which is really
>     just an imperfect shortcut to the former. My opinion at this
>     stage (though I'm certainly open to persuasion) would be that we
>     need to note the following issues here:
>
> · Should first parties be exempted only to the extent that they do not 
> combine their data with individual-level data from third parties? If 
> I'm a first party and I see a DNT header, should I still be restricted 
> from adding data collected from a DNT-passing customer to 
> individual-level data from an offline third-party company's database? 
> Should I be allowed to append it to, or combine it with, 
> individual-level offsite data I have purchased?
>
Two questions. First, does it depend on the nature of the data that is 
to be appended? Second, would it be helpful to back up for a minute and 
identify the rationale for the 1st vs 3rd party distinction? If it goes 
to user expectations, then we need to draw a line based on what is 
reasonably expected and what's not. This group holds a wide variety of 
assumptions about what those expectations are. I have no 
idea how to reconcile these assumptions. All I can think to do is give 
the user choices based on truly meaningful distinctions, and let them 
make up their own minds.
>
> · By the same token, there are instances where a "third-party domain" is 
> used by a publisher as a software tool for analytics or ad serving. If 
> that third-party tool is combining data from multiple sites, then 
> obviously that falls under the definition of "tracking". But what if 
> it is merely a software tool for the first party's use only, and the 
> first party is the sole data owner? In this instance it is probable 
> that the third-party software tool is "first party" under the current 
> DNT definition options, which again emphasizes that we are focused on 
> cross-site data sharing rather than the first/third party distinction.
>
It does not "obviously fall[] under the definition of 'tracking'." We 
have no such definition. Otherwise, though, this is a good question. To put 
it differently, does an agent of any party essentially stand in the 
shoes of that party with regard to DNT compliance? But this raises 
additional questions. Is data ownership the key distinction? Is the type 
and nature of mixing with other data relevant? Is the domain relevant? 
Is there a difference between an agent and a software tool? Does it have 
something to do with control or possession of the data?

I think to a large extent, yes, the agent or software tool stands in the 
shoes of the party. Data ownership or licensing can be nebulous, though, 
so I would propose something along the lines of "used and stored at the 
direction of" as the hook, and that it should be applied to data defined 
by its identification with a particular user or device (per the 
discussion above).
Received on Tuesday, 25 October 2011 20:56:23 UTC