Re: ISSUE-5: Consensus definition of "tracking" for the intro? from David Singer on 2013-10-16 (public-tracking@w3.org from October 2013)

From: David Singer <singer@apple.com>
Date: Wed, 16 Oct 2013 15:30:41 -0700
To: "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
Message-id: <3D176301-6CD3-4312-8BB3-444FE86261C9@apple.com>
Roy, thanks, this is helpful.

I think there are two problems, both addressable:

a) If something is left out of 'tracking data' then we implicitly lose our ability to say something about it.  So I guess we disagree here, but it's probably something we can manage.  Since both first parties and service providers have restrictions on what they can do with retained data, I think that retained data is tracking data.  You don't.  I also don't think that a change of party changes the *nature* of the data, though it does change the requirements on how it's handled.  I think a user 'implicitly' gives a site they *choose* to visit permission to track them (hence the first-party carve-out), and that extends, of course, to their service providers.

b) I fear that the definition will get unmanageable, unreadable, if we include in it all the permissions and carve-outs.  I think it's cleaner to do something similar to what we did in the draft 'user' page -- set a rough stage, and then refine it.  That also matches what Matthias suggests.

I hear what you say about sharing; indeed, if I don't retain I can't share, but if retention starts only after the transaction ends, this leaves a loophole during the transaction.  This is addressable.

Responses below, but trying again.  (And thanks for the dialog).

Iterated draft:

* * * *

In general terms, Tracking is the retention or use after a network transaction is complete, or sharing, of data that is, or can be, associated with a specific user, user agent, or device. 

However, this recommendation recognizes that by choosing to visit a site, users allow First Parties to retain and use tracking data they collect directly, or indirectly via Service Providers (though there are restrictions on sharing); and it allows Third Parties to claim permission to retain tracking data under some specific conditions (e.g. for security, auditing, or for deferred processing of raw data).

* * * *


On Oct 16, 2013, at 1:52 , Roy T. Fielding <fielding@gbiv.com> wrote:

>  2) data is often retained "after a network transaction is complete"
>     just for the sake of caching

Do you mean a 'raw data permission' here, or something else?

>  Tracking is the observation of a particular user's browsing activity
>  across multiple distinct contexts and the retention, use, or sharing
>  of data derived from that activity outside the context in which it
>  occurred.
> 
> does not limit the definition to a particular software or role,
> a specific number of protocol interactions or requests, or a form
> of data that remains associated with a particular user.

OK, but it allows each context to remember a lot (notably, not just the first parties), which is problematic, I think.  See below for questions.

>>> You have not indicated that there is anything wrong with my proposal.
>> 
>> 1. Who is doing what 'across multiple distinct contexts'?  This is an undefined part of your definition.  Yes, that may be the aggregate effect, but we need to know (the users, and the sites, need to know) is 'this single possible action by a site' within the definition of tracking or not?
> 
> The user's browsing activity is observed across multiple distinct
> contexts.  It means that observing the user's activity only within
> a single context is not tracking.  

So, concretely, a hidden third-party tracker on a page can remember that you visited that page, or not?  If not, can it remember the nature of the site you visited (it was a guns and ammo kind of site)?  When you made the transaction?  Your IP address, geolocation, local time of day, user-agent, …?

This seems to permit the accumulation, by third parties, of a lot of data about the user, and I am unsure if that's your intent, or it's accidental, or a misread on my part.

> The reason it is there is because
> the verb tracking and the privacy concern we are trying to address
> are both about identifying the trail of an individual as they
> proceed from place to place.  Specifically, remembering that a
> person was at a single place is not tracking unless that memory
> is shared with someone else or combined with memories of other
> places.

But the next and subsequent times I visit a site that has the same third-party tracker on it, and they are allowed to remember some data that's associated with me, how is it NOT forming a trail?

> 
> "who" is doing the tracking is not important with this definition,
> as one would expect if they were looking up the term in a dictionary.
> 

I agree;  that's why I think first-party recording is, in fact, tracking, though we say that by choosing to visit a site, you also give it (and its service providers) implicit permission to track you.

> 
> I've gone back and forth regarding whether the definition of tracking
> ought to exclude data derived from the observed activity after it has
> been de-identified (i.e., no longer associated with a particular user).

I think that if data can no longer be tied back to a user, then it's no longer our concern.  (And I mean that the ability to tie it back no longer exists, not that we rely on something being kept secret or separate.)

In general, I am looking for ways to move things firmly off our table, and then to categorize the data that remains and set rules for each category.

'In transaction' -- not our business
'truly de-identified' -- not our business
and so on.

On one other comment on the thread:

On Oct 16, 2013, at 9:05 , Walter van Holst <walter.van.holst@xs4all.nl> wrote:

> On 16/10/2013 17:52, Shane M Wiley wrote:
> 
>> I likewise echo the positive feelings.
>> 
>> On your first point, I understand there is capacity to cover a
>> broader scope but would recommend we limit v1 to "browsing activity".
>> This is the lion share of the issue at hand and I believe builds a
>> good launching point for becoming more granular over time.
> 
> While I agree with browsers as our first and foremost priority, I'd
> rather have the standard worded in such a way that it does not exclude
> other UAs a priori. 

(and in response to David Wainberg)

I think we long ago considered adding a sentence saying that, during the design of this first version, we primarily considered general-purpose browsers, that other online activity received less or no attention, and that the specification may not be obviously applicable to those other activities.  I think this would be better than tying ourselves in knots trying to define 'browser' or insert it into all our definitions etc., and it gives us more material for the V2 we all so eagerly want to work on. :-)

[I would certainly want to think hard about HTML email, for example, and we have not]

David Singer
Multimedia and Software Standards, Apple Inc.
Received on Wednesday, 16 October 2013 22:31:09 UTC