
Re: ISSUE-5: Consensus definition of "tracking" for the intro?

From: Roy T. Fielding <fielding@gbiv.com>
Date: Tue, 15 Oct 2013 15:55:16 -0700
Cc: John Simpson <john@consumerwatchdog.org>, "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
Message-Id: <0F48CC39-79CE-4FB0-8A32-B44AF637939A@gbiv.com>
To: David Singer <singer@apple.com>

We still aren't on the same page with regard to the point of this exercise.

We need a definition of tracking because the protocol is supposed to
be expressing a user preference, and the only thing the user is
being informed about is:

Firefox 24.0:

  "Tell sites that I do not want to be tracked"
  + a link to a web page that says
     "Mozilla Firefox offers a Do Not Track feature that lets you
     express a preference not to be tracked by websites. When the
     feature is enabled, Firefox will tell advertising networks and
     other websites and applications that you want to opt-out of
     tracking for purposes like behavioral advertising."

Safari iOS:

  "Do not track" (with a link to a Safari and Privacy document that says:
     "Some websites keep track of your browser activities when they serve
      you content, which enables them to tailor what they present to you.")

Safari OS X 6.0.5:

  "Website tracking: [ ] Ask websites not to track me"

Chrome 30.0.1599.69:

  "Send a 'Do Not Track' request with your browsing traffic"
     [with a pop-up that basically says it isn't defined]

Internet Explorer (reported):

  "Always send Do Not Track header"

Hence, the user needs a definition of tracking (or "to track") in
order to have an informed preference, and sites need to know what
that definition is in order to understand the meaning being expressed
by the user and ensure that their own behavior is consistent with
how they have informed users in the exception dialogs and privacy
policies.

The essence of standards is to ensure that all parties share the
same vocabulary when communicating.

On Oct 11, 2013, at 5:21 PM, David Singer wrote:
> On Oct 10, 2013, at 17:39 , Roy T. Fielding <fielding@gbiv.com> wrote:
> 
>> Ah, now I can critique two definitions in one response ...
>> 
>> On Oct 10, 2013, at 1:15 PM, John Simpson wrote:
>> 
>>> I don't want to rain on your march toward consensus parade, but I have trouble with the "across multiple parties' domains or services" language.
>> 
>> Why?
>> 
>>> It seems to me Rob's language -- proposal 4 -- has it exactly right, particularly when you include his suggested non-normative text:
>>> 
>>> 
>>>> "Tracking is any form of collection, retention, use and/or application of data that are, or can be, associated with a specific user, user agent, or device."
>>>> 
>> 
>> Allow me to illustrate why this is false.
>> 
>> When you login to your online bank account (certainly an application
>> of data that is associated with you), is the bank tracking you?
>> Is DNT:1 going to turn that off?
> 
> Yes, under some definitions they are: if they keep records (as is likely), they are remembering data about you.  But they are a first party, so they get a big carve-out.

There are no carve-outs in definitions.  The fact that a first party
has a big carve-out in compliance is evidence that the definition is
wrong: users do not consider it to be tracking when a website they
intentionally use has retained data about their past use.

>> Tracking, as defined above, includes everything on the Internet.
> 
> No, there are plenty of services that don't keep personal info.  DuckDuckGo is the most famous example.

I do not believe that DuckDuckGo meets the above definition, and the
only reason it is famous is that it is the only example.

I know for a fact that most hardware routers do not meet these other
definitions, and certainly none of the L7 load balancers will manage it.
Apache's default status monitor won't meet either definition.
The earlier use of "transient storage" was better in that regard.

>>> I can live with what's in the current editors' draft:
>>> 
>>> Tracking is the retention or use, after a network interaction is complete, of data that are, or can be, associated with a specific user, user agent, or device.
>> 
>> Likewise, that says all data use on the Internet is tracking.
> 
> No, it omits (a) data used to service transactions (within the interaction) and (b) data not associated with a specific user etc.  That's a lot of data.

No, what it omits isn't relevant because the data will be retained.
IP addresses get stuck at all layers.  Any application that involves
security has an audit trail.  Every first party website has an access log.
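The objection can be made concrete with a minimal sketch. It assumes a hypothetical, ordinary first-party access-log record and checks it against the two conditions in the editors' draft text (retained after the interaction is complete; associable with a user, user agent, or device). The names `LogRecord`, `meets_editors_draft`, and `retained_after_response` are illustrative assumptions, not part of any specification.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    """Hypothetical entry in an ordinary first-party web server access log."""
    ip_address: str
    user_agent: str
    path: str
    retained_after_response: bool  # True once the entry is kept after the response is sent


def meets_editors_draft(record: LogRecord) -> bool:
    """Illustrative reading of the editors' draft definition: data retained
    after the network interaction is complete that can be associated with a
    specific user, user agent, or device (here: via IP address or User-Agent)."""
    associable = bool(record.ip_address or record.user_agent)
    return record.retained_after_response and associable


entry = LogRecord("192.0.2.10", "Mozilla/5.0", "/index.html", retained_after_response=True)
print(meets_editors_draft(entry))  # True
```

Under this reading, a routine access log, audit trail, or security record passes the check just as cross-site profiling would, so the predicate alone does not separate the two.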

>> I claim that the above definition has no relation to our work.
>> 
>> There is nothing in the original DNT proposal that would suggest
>> a user's expectations when setting DNT:1 would be that they could
>> only perform anonymous activity on the Internet.  
> 
> No, they are anonymous *to the organizations that they didn't choose to interact with, and for the most part are unaware of*.  We *have* a first-party carve-out, long-since agreed.  But even the first party has some restrictions on what it does with 'tracking data' (like, not sharing it around).

Right, so we need to define tracking in a way that corresponds to
what DNT intends to turn off. Other definitions would intentionally
mislead users.

> As I said in the 'tunnel-vision' approach, if it had been adopted, it would have meant that the distinction between first and third parties might not have been needed at all, as indeed most first parties only want to remember your interaction with them, and if that's not 'tracking', they don't need a special carve-out.  But this approach did not get support;  I am not sure why.  I suspect it was too rigorous for the industry, and too permissive for the privacy people.

Yes, so stop thinking about that.  It is not relevant to this discussion.
Even if the proposals were the same (they aren't because that proposal
was tied to the specific domain names), the group never made a decision
one way or the other.

> Having said that, I just posted something which is, I hope, a synthesis of what's on the CP list, and is closer to what you wrote than this suggestion.  It's much closer to tunnel-vision than "don't remember stuff about me" (unless you are a first party). 
> 
> Let's see if we can find a reasonable middle ground here.

You have not indicated that there is anything wrong with my proposal.

  Tracking is the act of following a particular user's browsing activity
  across multiple distinct contexts, via the collection or retention of
  data that can associate a given request to a particular user, user
  agent, or device, and the retention, use, or sharing of data derived
  from that activity outside the context in which it occurred.
  For the purposes of this definition, a context is a set of resources
  that share the same data controller and a common branding, such that
  a user would expect that data supplied to one of the resources is
  available to all of the others within the same context.

I believe that is short enough.  If a shorter definition is needed,
then the explanatory bits can be removed, à la

  Tracking is the act of following a particular user's browsing activity
  across multiple distinct contexts and the retention, use, or sharing
  of data derived from that activity outside the context in which it
  occurred.

Or if folks are still concerned about the word "following"
(it came from a dictionary definition of the verb "track" at
https://www.google.com/search?q=define%3A+track
), then we can replace it with

  Tracking is the observation of a particular user's browsing activity
  across multiple distinct contexts and the retention, use, or sharing
  of data derived from that activity outside the context in which it
  occurred.
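One way to see how the proposed definition separates the cases is to read its first (full) form as a predicate. This is a minimal sketch under stated assumptions: `Context`, `Observation`, `is_tracking`, the `flows` parameter, and the example companies are all hypothetical. A context is modeled as a (data controller, branding) pair, following the definition's own wording, and a flow `(origin, destination)` means data derived from activity in `origin` is retained, used, or shared in `destination`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical model of a context: resources that share the same data
    controller and a common branding. Frozen dataclasses compare by value,
    so two resources with equal fields fall into the same context."""
    data_controller: str
    branding: str


@dataclass(frozen=True)
class Observation:
    """One request seen by a party, linked to a user by some identifier."""
    user_key: str     # e.g. a cookie value, or any data associating the request with a user
    context: Context  # the context in which the request occurred


def is_tracking(observations: list[Observation], user_key: str,
                flows: list[tuple[Context, Context]]) -> bool:
    """Illustrative reading of the proposed definition for one user.

    flows: (origin, destination) pairs meaning data derived from activity in
    `origin` is retained, used, or shared in `destination`.
    """
    seen = {o.context for o in observations if o.user_key == user_key}
    followed_across_contexts = len(seen) > 1
    used_outside_origin = any(origin in seen and dest != origin for origin, dest in flows)
    return followed_across_contexts and used_outside_origin


news = Context("Example News Inc.", "Example News")
shop = Context("Example Shop Ltd.", "Example Shop")
bank = Context("Example Bank", "Example Bank")

# A third party that links one user across two sites and uses news-site data on the shop.
ad_network = [Observation("u1", news), Observation("u1", shop)]
print(is_tracking(ad_network, "u1", [(news, shop)]))  # True

# A first party that remembers a user's past visits but keeps the data in its own context.
bank_logs = [Observation("u2", bank), Observation("u2", bank)]
print(is_tracking(bank_logs, "u2", [(bank, bank)]))  # False
```

The predicate is indifferent to what kind of data is kept: under this reading the bank's retained records are not tracking because they never leave the bank's context, while the ad network's linking of one user across two contexts is.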

....Roy
Received on Tuesday, 15 October 2013 22:55:41 UTC
