Re: ISSUE-5: definition of tracking from David Singer on 2012-09-05 (public-tracking@w3.org from September 2012)

From: David Singer <singer@apple.com>
Date: Wed, 05 Sep 2012 08:44:50 -0700
To: W3 Tracking <public-tracking@w3.org>
Message-id: <3A5447DF-B810-4580-8269-0E6D54944C98@apple.com>

On Sep 5, 2012, at 1:29 , Roy T. Fielding <fielding@gbiv.com> wrote:

> On Sep 4, 2012, at 4:16 PM, David Singer wrote:
>> On Sep 4, 2012, at 15:20 , "Roy T. Fielding" <fielding@gbiv.com> wrote:
>> 
>>> On Sep 4, 2012, at 10:07 AM, Aleecia M. McDonald wrote:
>>> 
>>>> 	(c) Buried in this discussion (of "absolutely not tracking") was David Singer's attempt to define tracking: "Tracking is the retention or use, after a transaction is complete, of data records that are, or can be, associated with a single user." (I'd append: ", user agent, or device.")   Unlike every other time someone has made the attempt, the one and only reply was in support. Does that mean we can live with this? [Note that issue-5 is currently raised]
>>> 
>>> Probably not.  It does us very little good to define tracking such
>>> that it encompasses all access logs, since they are essential
>>> to any site that isn't deliberately acting as an open gateway.
>>> Are we agreed to that at least?
>> 
>> Actually, I was trying for a definition which clearly *excluded* data that was *out* of scope, and then discussed -- via permissions, and exceptions and so on -- uses that fall into the scope and need discussion.
> 
> Access logs involve the retention of IP addresses, request targets,
> and other request attributes long after a transaction is complete.
> If keeping an access log is considered tracking, then almost all servers
> on the Web track (the exceptions being a few privacy-masking portals).

One of the permissions is precisely the keeping of access logs.

> I don't believe that defining tracking such that almost every server on
> the Web is non-compliant (and will remain non-compliant)

You must be reading a different definition; nowhere did I write "and those who keep such data are non-compliant". Rather, as I say, the definition is constructed so as to narrow the scope: if what you are doing falls *outside* this scope (for example, real-time transactional data, or recorded data that is not associated with a single person, such as cumulative visit counts), then you can stop reading.

> is a viable
> choice if we think deployment of the protocol is desirable, nor do
> I think it matches user expectations about "do not track", so I'd
> like to have a definition that matches whatever it is that the user
> wants us to stop doing when they send DNT:1.

Sure. Agreed. I think it has to include some element of keeping data about people (it's not just about use; it's about collection as well).

>>> A variation on David's definition would be:
>>> 
>>> Tracking is the retention or sharing of data collected from an
>>> interaction to associate that interaction with a specific user
>>> (or their personal user agent or device) and use that association
>>> to obtain, collect, or correlate that user's behavior beyond
>>> the scope of a single session.
>> 
>> That's not the only (or even possibly primary) use that worries people, in my understanding.
> 
> I am trying to define tracking, not their worries.  If folks can
> talk about what the above does not cover, then we can look for some
> wording that plugs the gaps.  Or we can start with any of the four
> other definitions that I have proposed.  Or some new definition,
> if someone gets an inspiration.

OK, maybe I am misreading what you wrote, as "obtain, collect…user's behavior" reads at best a little oddly.

Interestingly, you drop the hard line I had between data used in the transaction and data kept after it is complete.  Or is that covered by "the scope of a single session"?  If so, is 'session' reasonably well-defined, or could a site claim that a session starts when you are born and ends when you die?

I also struggle with the question of log files.  They are, as you say, essentially ubiquitous.  But if we mark them as out of scope, then we're basically saying that it's fine to keep the raw ingredients for something problematic, just not OK to make them into that problematic something.  I have a very hard time working out how to define that boundary, though I think it would be great if we could succeed.  At the moment, we write a 'permission' for raw log files with limits on what you can do with them, which is less elegant but may be easier.



David Singer
Multimedia and Software Standards, Apple Inc.
Received on Wednesday, 5 September 2012 15:45:28 UTC