
RE: ISSUE-5: Consensus definition of "tracking" for the intro?

From: Mike O'Neill <michael.oneill@baycloud.com>
Date: Wed, 6 Nov 2013 00:18:36 -0000
To: "'Roy T. Fielding'" <fielding@gbiv.com>, "'David Singer'" <singer@apple.com>
Cc: <public-tracking@w3.org>
Message-ID: <03f201ceda85$bd44a0a0$37cde1e0$@baycloud.com>

But a combination of the data collected in one context can be used to track someone across the web. Considering only your contracted analytics use case, if the third party collects a unique id scoped solely to the first-party domain, combining that with the contents of the Referer header will give you a universal unique identifier. A URL query parameter would also suffice. A set of these identifiers will be associated with a single user/device, and the third party can collect all of them. There is a small problem of how to collapse them all into a single manageable (reasonable-length) key, but that can be done by using the device's IP address to thread them together (over a short period this will work, even with IPv4 NAT or IPv6 anonymous autoconfiguration), or by ensuring the unique ids were already universally unique (across all domains).
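
A minimal sketch of the combination described above, assuming the third party logs one record per embedded request. The record fields (`ip`, `id`, `referer`) and the function names are hypothetical, chosen only for illustration; nothing here reflects a real analytics product.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def universal_key(site_scoped_id: str, referer: str) -> str:
    """Qualify a per-site id with the first-party host taken from Referer."""
    host = urlsplit(referer).hostname or ""
    return f"{host}:{site_scoped_id}"


def thread_by_ip(requests):
    """Group the qualified ids seen from the same IP address.

    Assumption: the IP address stays stable over the short logging period.
    """
    profiles = defaultdict(set)
    for req in requests:
        profiles[req["ip"]].add(universal_key(req["id"], req["referer"]))
    return dict(profiles)
```

Each id is meaningful only on its own site, but once qualified by the Referer host and grouped by IP address, the set of ids describes one user/device across sites.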

You might say the act of combining data in this way, perhaps secretly, constitutes tracking but does your definition cover it? 

-----Original Message-----
From: Roy T. Fielding [mailto:fielding@gbiv.com] 
Sent: 05 November 2013 22:40
To: David Singer
Cc: public-tracking@w3.org (public-tracking@w3.org)
Subject: Re: ISSUE-5: Consensus definition of "tracking" for the intro?

On Nov 5, 2013, at 12:57 AM, David Singer wrote:
> On Oct 18, 2013, at 9:33 , Roy T. Fielding <fielding@gbiv.com> wrote:
>>> So, concretely, a hidden third-party tracker on a page can remember that you visited that page, or not?  If not, can it remember the nature of the site you visited (it was a guns and ammo kind of site)?  When you made the transaction?  Your IP address, geolocation, local time of day, user-agent, …?
>> All of that data is user activity in the first party context.
> I was asking specifically about the 3rd party, recording what it gets in, and derived from, the HTTP requests.

The first two questions were about the context in which the user's
activity took place.  If the third party only collects such data at
one location (e.g., contracted single-site web analytics), then
it is not tracking because it can't observe the user in any other
context.  If the third party does observe a particular user's activity
in any other context, or if its definition of "you" comes from some
other context, then it is tracking.

The time an embedded request is made is not itself tracking.
The IP address received in a request from a single context is not
tracking because it does not (by itself) cause the user to be observed
at multiple contexts. Geolocation would naturally depend on its
granularity, but it isn't really an issue for third party requests.
Recording a user agent string is not (by itself) tracking.  Using
any of the above to construct a tracking algorithm for the sake of
following a user across multiple contexts via data correlation
is tracking.

Again, let me reiterate: my proposed definition is not limited to
a single network interaction, nor does it depend on DNT.  It merely
states conditions that a USER would consider tracking, and would
expect not to be applied for any data collected with DNT:1 set.
At the same time, I am not including the collection of personal
data for the sake of a single context, even if that personal data
is being collected by a third party, because it is only for the sake
of that single context (and not accessible outside of that context).
The fact that such data could, in fact, be used by someone else
to track the user simply doesn't matter -- as soon as they do so,
they are tracking, as defined.

DNT is not a security protocol.  It does not prevent tracking.
It only defines and expresses a desire, and that desire will either
be obeyed or not.  It is therefore impossible to construct a screw
case wherein a user is actually being tracked across multiple
contexts that would somehow escape the definition as proposed.

>> If the
>> third-party tracker observes it, then any of the following will cause
>> it to be tracking under this definition:
>> 1) the third party observes the user's browsing activity in any
>>    other context, including one where it is the first party;
>> 2) the data is provided to anyone other than the first party and
>>    they combine it with observations obtained from any other context.
>> This is analogous to walking down the street, seeing a person with
>> an unusual t-shirt, saying Hi, and continuing on with your walk.
>> If you don't see that person again (or at least don't recognize
>> them in a different shirt), then it cannot be tracking.  If you
>> do see them again, at the same location, then it still isn't tracking.
>> If, however, you see and recognize them again in a different location
>> and choose to remember that fact, then you have tracked them.
> Tracking only happens the 2nd and subsequent times??

At least two locations (contexts) are necessary to be considered
tracking.  Otherwise, we aren't talking about a track.  A single
point is not a track.
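
The two-context condition can be sketched as a simple check, assuming each observation is a hypothetical (user, context) pair. It covers only the "observed in more than one context" case, not the case where data is shared with another party.

```python
def is_tracking(observations) -> bool:
    """Return True if any one user is observed in two or more contexts."""
    contexts_by_user = {}
    for user, context in observations:
        contexts_by_user.setdefault(user, set()).add(context)
    return any(len(contexts) >= 2 for contexts in contexts_by_user.values())
```

Repeat visits to the same context do not satisfy the check; a second, distinct context for the same user does.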

>>> This seems to permit the accumulation, by third parties, of a lot of data about the user, and I am unsure if that's your intent, or it's accidental, or a misread on my part.
>> Yes, a third party can learn the data provided by the user agent in
>> a specific context.  The immediate example of that is contextual
>> advertising, which we already agreed is not tracking.
> That's not learning, that's using the data in the transaction itself.

There is no relevant distinction between those terms.

>> Note, however, that all of your examples assume that they also know
>> who "you" is.  Why do you think the third party would know that
>> information?
> I tend to provide an IP address and other distinguishing data, such as a fingerprint, in transactions.  The third party could have cookied me the first time, too.

At a different context, right?

>> If they are relying on any other information, from any
>> other source, that has the effect of identifying you, then they are
>> already tracking according to that definition.
>>>> The reason it is there is because
>>>> the verb tracking and the privacy concern we are trying to address
>>>> are both about identifying the trail of an individual as they
>>>> proceed from place to place.  Specifically, remembering that a
>>>> person was at a single place is not tracking unless that memory
>>>> is shared with someone else or combined with memories of other
>>>> places.
>>> But the next and subsequent times I visit a site that has the same third-party tracker on it, and they are allowed to remember some data that's associated with me, how is it NOT forming a trail?
>> Because it is the same context.  The fact that a given user agent
>> visited the same site more than once is not a privacy concern if
>> the third party doesn't know anything else about the user.
> No, I visit two DIFFERENT sites with the SAME 3rd party tracker.  What is that 3rd party allowed to remember under your definition?  What is NOT tracking data?

Retention of anything about the context in which those embedded requests
were made that remains tied to a particular user would amount to tracking,
regardless of how that data is obtained.

> Saying the 2nd transaction is tracking is double-speak; I don't know it's the 2nd unless I can correlate it with the first.

Which is why your questions are not relevant to tracking.  If you want
to define a protocol for anonymous browsing, don't call it "Do Not Track".
The privacy concern we are addressing in *this* protocol is the personal
knowledge gained by observing a user across multiple contexts
(i.e., the ability to learn something new about, or build a profile on,
the user by observing their activity at multiple contexts).



Received on Wednesday, 6 November 2013 00:19:13 UTC
