- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Wed, 16 Oct 2013 01:52:46 -0700
- To: David Singer <singer@apple.com>
- Cc: "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
On Oct 15, 2013, at 2:30 PM, David Singer wrote:

> "Tracking is the retention or use by a site outside the first party, after a network transaction is complete, of data that is, or can be, associated with a specific user, user agent, or device."

Well, that is better, but again runs into the problems that I described before:

1) "first party" is itself a defined term, depends on the context of a user's action prior to the current request, and would have to include service providers.

2) data is often retained "after a network transaction is complete" just for the sake of caching

3) we need to include sharing, as suggested by the FTC commissioner in response to my question at the DC F2F.

and "by a site outside" seems to assume that tracking is done by a site (instead of some party with access to data collected in a given interaction).

BTW, your suggestion would be more readable as

   Tracking is the retention by a third party of data that can be associated with a particular user.

(i.e., use doesn't matter if we aren't constraining it for the current interaction, and use isn't possible for prior interactions if that data cannot be retained, and the only reason user agent and device matter is because they can be associated with a user.)

I can see why you are suggesting this as a summary of what the compliance spec is all about. However, it does a poor job of corresponding to what a user would think of as tracking.

In comparison, the short definition that I posted

   Tracking is the observation of a particular user's browsing activity across multiple distinct contexts and the retention, use, or sharing of data derived from that activity outside the context in which it occurred.

does not limit the definition to a particular software or role, a specific number of protocol interactions or requests, or a form of data that remains associated with a particular user.

...

On Oct 15, 2013, at 4:22 PM, David Singer wrote:

> On Oct 15, 2013, at 15:55 , "Roy T. Fielding" <fielding@gbiv.com> wrote:
>>> Yes, under some definitions they are: if they (as likely) keep records, they are remembering data about you. But they are a first party, so they get a big carve-out.
>>
>> There are no carve outs in definitions. The fact that a first party
>> has a big carve out in compliance is evidence that the definition is
>> wrong: users do not consider it to be tracking when a website they
>> intentionally use has retained data about their past use.
>
> No, it *is* tracking data, the first party *does* have some restrictions on what they can do with it. A scope that establishes "this spec concerns broadly data in category X" and then says what restrictions there are for various people on that data is perfectly normal. But I can maybe live with a definition that excludes it, if it makes it easier for you (as earlier offered).

That doesn't make it tracking data. Yes, the compliance spec has those restrictions, but they exist to prevent a third party from receiving data that it might use for tracking the user if it receives the same kind of data from multiple sites. My proposal accounts for that in a more straightforward way.

>>> No, it omits (a) data used to service transactions (within the interaction) and (b) data not associated with a specific user etc. That's a lot of data.
>>
>> No, what it omits isn't relevant because the data will be retained.
>> IP addresses get stuck at all layers. Any application that involves
>> security has an audit trail. Every first party website has an access log.
>
> We have a security permission, and a first-party permission, and a raw data permission. It is tracking data, but you can keep it under some restrictions.

We have those permissions for third parties. Here we are talking about data collected by the first party.

>>> No, they are anonymous *to the organizations that they didn't choose to interact with, and for the most part are unaware of*. We *have* a first-party carve-out, long-since agreed. But even the first party has some restrictions on what it does with 'tracking data' (like, not sharing it around).
>>
>> Right, so we need to define tracking in a way that corresponds to
>> what DNT intends to turn off. Other definitions would intentionally
>> mislead users.
>
> Well, we also need to be careful not to mislead ourselves or site implementers. They read specifications; users typically do not.

My intention is that the definition, once defined, be provided consistently by all implementations. I don't expect a user to read the specification.

>> You have not indicated that there is anything wrong with my proposal.
>
> 1. Who is doing what 'across multiple distinct contexts'? This is an undefined part of your definition. Yes, that may be the aggregate effect, but we need to know (the users, and the sites, need to know) is 'this single possible action by a site' within the definition of tracking or not?

The user's browsing activity is observed across multiple distinct contexts. It means that observing the user's activity only within a single context is not tracking. The reason it is there is because the verb tracking and the privacy concern we are trying to address are both about identifying the trail of an individual as they proceed from place to place. Specifically, remembering that a person was at a single place is not tracking unless that memory is shared with someone else or combined with memories of other places.

"who" is doing the tracking is not important with this definition, as one would expect if they were looking up the term in a dictionary.

> If I leave it out, is the definition worse? If so, how?

It changes the scope of data that the reader will expect to be subject to the constraint.

> "Tracking is the act of following a particular user's browsing activity, via the collection or retention of data that can associate a given request to a particular user, user agent, or device, and the retention, use, or sharing of data derived from that activity outside the context in which it occurred."

It means that the first party examples I gave earlier are considered tracking, whereas my definition excludes them because the observations are limited to a single party's context.

> 2. This definition does not exclude, as mine does, the use of data to answer the request, i.e. it doesn't have a clear idea of when "tracking" starts. I think it starts after you've satisfied the request (HTTP request, and its response). I would put back in "after a network transaction is complete". Even if 'retain' has that meaning, it leaves it ambiguous whether I can use the data to respond to you (and we may as well be clear that you may).

Responses are always in the same context, so that isn't a concern under my definition.

> 3. This leaves off the table conclusions that the site can draw about the user. So, imagine I detach the actual request log from the user, somehow, so they are no longer connected, but I remember
> * Roy was in California, online, and visiting the web at 3pm pacific on Sept 25th
> * Roy is interested in recipes that use brown lentils
> * Roy is able to visit sites that offer alcohol for sale, and buy at them; he's probably an adult

That is all data derived from the observed activity. If it is used only within the context in which it occurred, then no problem; otherwise, it becomes tracking when used, retained, or shared outside the context in which that activity occurred (because that makes a track).

I've gone back and forth regarding whether the definition of tracking ought to exclude data derived from the observed activity after it has been de-identified (i.e., no longer associated with a particular user). I'd like to say "retention, use, or sharing of XXX data derived from that activity", but there is no good adjective for XXX given historical concerns over the terms "personal data" and PII, conflicting common meanings for the terms "identified" and "linkable", and a bad taste in my mouth for something like "non-de-identified".

> There is a whole host of data you can remember about me that is not specifically tying me to a given request, under this definition. I don't think that is acceptable.

I don't see anything in my definition that is restricted to tying you to a given request. There is a whole host of data that I can remember about you that has nothing to do with tracking. It isn't our job to prevent Web sites from knowing their own customers, for example, since we are not working on a protocol for anonymous browsing.

....Roy
Received on Wednesday, 16 October 2013 08:53:10 UTC