Re: ISSUE-5: Consensus definition of "tracking" for the intro?

On Oct 15, 2013, at 15:55 , "Roy T. Fielding" <fielding@gbiv.com> wrote:

> We still aren't on the same page with regard to the point of this exercise.
> 
> We need a definition of tracking because the protocol is supposed to
> be expressing a user preference, and the only thing the user is
> being informed about is:
> 
> Firefox 24.0:
> 
>  "Tell sites that I do not want to be tracked"
>  + plus a link to a web page that says
>     "Mozilla Firefox offers a Do Not Track feature that lets you
>     express a preference not to be tracked by websites. When the
>     feature is enabled, Firefox will tell advertising networks and
>     other websites and applications that you want to opt-out of
>     tracking for purposes like behavioral advertising."
> 
> Safari iOS:
> 
>  "Do not track" (with a link to a Safari and Privacy document that says:
>     "Some websites keep track of your browser activities when they serve
>      you content, which enables them to tailor what they present to you.")
> 
> Safari OS X 6.0.5
> 
>  "Website tracking: [ ] Ask websites not to track me"
> 
> Chrome 30.0.1599.69:
> 
>  "Send a 'Do Not Track' request with your browsing traffic"
>     [with a pop up that basically says it isn't defined]
> 
> Internet Explorer (reported):
> 
>  "Always send Do Not Track header"
> 
> Hence, the user needs a definition of tracking (or "to track") in
> order to have an informed preference, and sites need to know what
> that definition is in order to understand the meaning being expressed
> by the user and ensure that their own behavior is consistent with
> how they have informed users in the exception dialogs and privacy
> policies.
> 
> The essence of standards is to ensure that all parties share the
> same vocabulary when communicating.

Yes, I agree.

If you look at the page the browser companies put together, as an attempt to answer the request that this preference be backed by more explanation, it says basically what I am trying to say in my attempts at a compromise definition.

<http://www.w3.org/2011/tracking-protection/drafts/dnt-for-users.html>

The third paragraph lays out pretty clearly that first parties get pretty broad ability to track:

"Do Not Track is designed not to interfere with your online experience. There are few tracking restrictions for first party sites. They can remember things like who you are, that you visited their site and browsed around, that you interacted with them by filling in forms or that you bought something. First parties are not allowed, however, to pass data to the third parties that their pages pull in, unless a particular third party is allowed to have collected the data independently."

(We could do with feedback on this page; it has been submerged by this group's preference for talking about process, alas.)



I think it would also help to inform site developers of the rough scope of the data that concerns us, notably:
* data retained after the transaction (in-transaction retention and use is out of scope)
* data that actually links to a person, their user-agent, or device (anonymous, aggregate, truly de-identified, statistical, and other such data are also off the table).

>> Yes, under some definitions they are:  if they (as likely) keep records, they are remembering data about you.  But they are a first party, so they get a big carve-out.
> 
> There are no carve outs in definitions.  The fact that a first party
> has a big carve out in compliance is evidence that the definition is
> wrong: users do not consider it to be tracking when a website they
> intentionally use has retained data about their past use.

No, it *is* tracking data, and the first party *does* have some restrictions on what they can do with it. A scope that establishes "this spec concerns broadly data in category X" and then says what restrictions there are for various people on that data is perfectly normal.  But I can maybe live with a definition that excludes it, if it makes it easier for you (as earlier offered).

>> No, it omits (a) data used to service transactions (within the interaction) and (b) data not associated with a specific user etc.  That's a lot of data.
> 
> No, what it omits isn't relevant because the data will be retained.
> IP addresses get stuck at all layers.  Any application that involves
> security has an audit trail.  Every first party website has an access log.

We have a security permission, and a first-party permission, and a raw data permission. It is tracking data, but you can keep it under some restrictions.

>> No, they are anonymous *to the organizations that they didn't choose to interact with, and for the most part are unaware of*.  We *have* a first-party carve-out, long-since agreed.  But even the first party has some restrictions on what it does with 'tracking data' (like, not sharing it around).
> 
> Right, so we need to define tracking in a way that corresponds to
> what DNT intends to turn off. Other definitions would intentionally
> mislead users.

Well, we also need to be careful not to mislead ourselves or site implementers.  They read specifications; users typically do not.

>> As I said in the 'tunnel-vision' approach, if it had been adopted, it would have meant that the distinction between first and third parties might not have been needed at all, as indeed most first parties only want to remember your interaction with them, and if that's not 'tracking', they don't need a special carve-out.  But this approach did not get support;  I am not sure why.  I suspect it was too rigorous for the industry, and too permissive for the privacy people.
> 
> Yes, so stop thinking about that.  It is not relevant to this discussion.

Well, it was my attempt to provide a definition of something you used ('cross-site tracking') but didn't define.  If it's still relevant, I would welcome your attempt to define it.

> You have not indicated that there is anything wrong with my proposal.

1. Who is doing what 'across multiple distinct contexts'?  This is an undefined part of your definition.  Yes, that may be the aggregate effect, but we (and the users, and the sites) need to know whether 'this single possible action by a site' is within the definition of tracking or not.

If I leave it out, is the definition worse? If so, how?

 "Tracking is the act of following a particular user's browsing activity, via the collection or retention of data that can associate a given request to a particular user, user agent, or device, and the retention, use, or sharing of data derived from that activity outside the context in which it occurred."

2. This definition does not exclude, as mine does, the use of data to answer the request, i.e. it doesn't have a clear idea of when "tracking" starts.  I think it starts after you've satisfied the request (HTTP request, and its response).  I would put back in "after a network transaction is complete".  Even if 'retain' has that meaning, it leaves it ambiguous whether I can use the data to respond to you (and we may as well be clear that you may).

3. This leaves out of the definition the conclusions that the site can draw about the user.  So, imagine I detach the actual request log from the user, somehow, so they are no longer connected, but I remember:
* Roy was in California, online, and visiting the web at 3pm pacific on Sept 25th
* Roy is interested in recipes that use brown lentils
* Roy is able to visit sites that offer alcohol for sale, and buy at them; he's probably an adult
*…


There is a whole host of data you can remember about me that is not specifically tying me to a given request, under this definition. I don't think that is acceptable.

> 
>  Tracking is the act of following a particular user's browsing activity
>  across multiple distinct contexts, via the collection or retention of
>  data that can associate a given request to a particular user, user
>  agent, or device, and the retention, use, or sharing of data derived
>  from that activity outside the context in which it occurred.
>  For the purposes of this definition, a context is a set of resources
>  that share the same data controller and a common branding, such that
>  a user would expect that data supplied to one of the resources is
>  available to all of the others within the same context.
> 
> I believe that is short enough.  If a shorter definition is needed,
> then the explanatory bits can be removed, a la
> 
>  Tracking is the act of following a particular user's browsing activity
>  across multiple distinct contexts and the retention, use, or sharing
>  of data derived from that activity outside the context in which it
>  occurred.
> 
> Or if folks are still concerned about the word "following"
> (it came from a dictionary definition of the verb "track" at
> https://www.google.com/search?q=define%3A+track
> ), then we can replace it with
> 
>  Tracking is the observation of a particular user's browsing activity
>  across multiple distinct contexts and the retention, use, or sharing
>  of data derived from that activity outside the context in which it
>  occurred.



David Singer
Multimedia and Software Standards, Apple Inc.

Received on Tuesday, 15 October 2013 23:23:23 UTC