Re: ISSUE-5: Consensus definition of "tracking" for the intro? from Roy T. Fielding on 2013-10-18 (public-tracking@w3.org from October 2013)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Thu, 17 Oct 2013 17:33:09 -0700
To: David Singer <singer@apple.com>
Cc: "public-tracking@w3.org (public-tracking@w3.org)" <public-tracking@w3.org>
Message-Id: <7A3E2A97-DD60-4AE9-931B-3CEA1AE09AD9@gbiv.com>
On Oct 16, 2013, at 3:30 PM, David Singer wrote:

> Roy, thanks, this is helpful.
> 
> I think there are two problems, both addressable:
> 
> a) if something is left out of 'tracking data' then we implicitly lose our ability to say something about it.

No, we don't -- the definition covers both the act of tracking
and any data derived from that tracking (profiles).

>  So I guess we disagree here, but it's probably something we can manage.  Since both first parties and service providers have restrictions on what they can do with retained data, I think that that retained data is tracking data.  You don't.  I also don't think that a change of party changes the *nature* of the data, though it does change the requirements on how it's handled.  I think a user 'implicitly' gives a site they *choose* to visit permission to track them (hence the first party carve-out), and that extends, of course, to their service providers.

More importantly, users do not consider posting their own pictures
to Picr to be tracking, but do consider it to be tracking when their
Picr user profile is used to add annotations to pages on other sites
(even if they explicitly gave Picr permission to do so).  The first
is *not* tracking.  The second is consent to track.

> b) I fear that the definition will get unmanageable, unreadable, if we include in it all the permissions and carve-outs.

The sum total of my definition is presented in the proposal.  I don't
need to include carve-outs because those are not tracking under
my definition.  I don't need to include permissions because those
are tracking (and hence are included in the definition).  The only
unfinished part of that definition is determining the exact boundaries
of a given context, which I personally think is going to be some
combination of same-(branding + controller + policy) in order to
match user expectations and the tracking status response.
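
As a rough sketch of that boundary (illustrative only; the class and
field names are assumptions, not part of the proposal), a context
could be modelled as a value that is equal to another only when
branding, controller, and policy all match:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical model of a context; field names are assumptions."""
    branding: str    # what the user sees the site presented as
    controller: str  # the party that controls the data
    policy: str      # the privacy policy in force


def same_context(a: Context, b: Context) -> bool:
    # Frozen dataclasses compare field by field, so two contexts are the
    # same only if branding, controller, and policy are all identical.
    return a == b
```

Under this sketch, a change in any one of the three fields yields a
different context, which is one possible reading of
"same-(branding + controller + policy)".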

> I think it's cleaner to do something similar to what we did in the draft 'user' page -- set a rough stage, and then refine it.  That also matches what Matthias suggests.

That is not going to happen.  I am not interested in a rough or vague
definition followed by a bunch of overreaching requirements that have
nothing to do with tracking.  I would be fine with a preface that says
"For the purposes of this protocol[ or specification], ...".

I think it would help immensely if we abandon the preconceptions
about tracking being a per-request decision.  It isn't necessary
for a site to determine from the request whether it is or is not tracking.
It cannot do so, in general, since it cannot know how the data it receives
will be processed long after the request/response is done.  DNT is
better thought of as an instruction to anyone who receives the request
data --- an instruction that must be obeyed by a compliant party
for as long as it has access to that data in any identifiable form.
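
One way to picture this (a sketch under stated assumptions; the record
layout and names are hypothetical, not anything defined by the
protocol) is to store the received preference alongside the data, so
the instruction travels with the data for as long as it stays
identifiable:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredRecord:
    """Hypothetical stored request data; all fields are assumptions."""
    user_id: Optional[str]  # None once the record has been de-identified
    dnt: bool               # DNT preference received with the request
    payload: str            # whatever else was retained


def dnt_still_applies(record: StoredRecord) -> bool:
    # The instruction binds the holder for as long as the data remains
    # in an identifiable form, no matter how long ago it was received.
    return record.dnt and record.user_id is not None
```

The check is made whenever the data is processed, not when the request
arrives, which is the point of dropping the per-request view.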

> I hear what you say about sharing; indeed, if I don't retain I can't share, but if retain starts after the transaction end, this leaves a loop-hole during the transaction.  This is addressable.
> 
> Responses below, but trying again.  (And thanks for the dialog).
> 
> Iterated draft:
> 
> * * * *
> 
> In general terms, Tracking is the retention or use after a network transaction is complete, or sharing, of data that is, or can be, associated with a specific user, user agent, or device. 
> 
> However, this recommendation recognizes that by choosing to visit a site, users allow First Parties to retain and use tracking data they collect directly, or indirectly via Service Providers (though there are restrictions on sharing); and it allows Third Parties to claim permission to retain tracking data under some specific conditions (e.g. for security, auditing, or for deferred processing of raw data).
> 
> * * * *

I am still not interested in anything that says first-party retention
of personal data is tracking in the sense meant by DNT.  That is not
what the user is asking us not to do.

> On Oct 16, 2013, at 1:52 , Roy T. Fielding <fielding@gbiv.com> wrote:
> 
>> 2) data is often retained "after a network transaction is complete"
>>    just for the sake of caching
> 
> do you mean a 'raw data permission' here, or something else?

No permission necessary.  Cached responses (at possibly multiple
levels of the stack, including intermediaries like Varnish and
Traffic Server) are data held at a single location after a given
network interaction is complete that is not used for tracking
but might be unique to a particular user agent.

>> Tracking is the observation of a particular user's browsing activity
>> across multiple distinct contexts and the retention, use, or sharing
>> of data derived from that activity outside the context in which it
>> occurred.
>> 
>> does not limit the definition to a particular software or role,
>> a specific number of protocol interactions or requests, or a form
>> of data that remains associated with a particular user.
> 
> OK, but it allows each context to remember a lot (notably, not just the first parties), which is problematic, I think.  See below for questions.
> 
>>>> You have not indicated that there is anything wrong with my proposal.
>>> 
>>> 1. Who is doing what 'across multiple distinct contexts'?  This is an undefined part of your definition.  Yes, that may be the aggregate effect, but what we need to know (the users, and the sites, need to know) is whether 'this single possible action by a site' is within the definition of tracking or not.
>> 
>> The user's browsing activity is observed across multiple distinct
>> contexts.  It means that observing the user's activity only within
>> a single context is not tracking.  
> 
> So, concretely, a hidden third-party tracker on a page can remember that you visited that page, or not?  If not, can it remember the nature of the site you visited (it was a guns and ammo kind of site)?  When you made the transaction?  Your IP address, geolocation, local time of day, user-agent, …?

All of that data is user activity in the first party context.  If the
third-party tracker observes it, then any of the following will cause
it to be tracking under this definition:

  1) the third party observes the user's browsing activity in any
     other context, including one where it is the first party;

  2) the data is provided to anyone other than the first party and
     they combine it with observations obtained from any other context.
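
Both conditions reduce to the same test: does any one data set hold
observations of the same user from more than one context?  A minimal
sketch, assuming observations are simple (user, context) pairs (the
function name and the example labels are hypothetical):

```python
from collections import defaultdict


def tracked_users(observations):
    """Return the users seen in more than one distinct context.

    `observations` is an iterable of (user, context) pairs; this shape
    is an assumption made for illustration only.
    """
    contexts_by_user = defaultdict(set)
    for user, context in observations:
        contexts_by_user[user].add(context)
    return {u for u, ctxs in contexts_by_user.items() if len(ctxs) > 1}


# Repeat visits within a single context: not tracking.
single = [("ua-1", "news.example"), ("ua-1", "news.example")]

# Case 1: the same observer also sees the user in another context.
case1 = single + [("ua-1", "shop.example")]

# Case 2: the data is handed to someone who combines it with
# observations obtained from another context.
case2 = single + [("ua-1", "forum.example")]
```

Here `tracked_users(single)` is empty, while `case1` and `case2` both
flag "ua-1".  The sketch checks only the multiple-context part of the
definition; retention, use, and sharing outside the context are not
modelled.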

This is analogous to walking down the street, seeing a person with
an unusual t-shirt, saying Hi, and continuing on with your walk.
If you don't see that person again (or at least don't recognize
them in a different shirt), then it cannot be tracking.  If you
do see them again, at the same location, then it still isn't tracking.
If, however, you see and recognize them again in a different location
and choose to remember that fact, then you have tracked them.

> This seems to permit the accumulation, by third parties, of a lot of data about the user, and I am unsure if that's your intent, or it's accidental, or a misread on my part.

Yes, a third party can learn the data provided by the user agent in
a specific context.  The immediate example of that is contextual
advertising, which we already agreed is not tracking.

Note, however, that all of your examples assume that they also know
who "you" is.  Why do you think the third party would know that
information? If they are relying on any other information, from any
other source, that has the effect of identifying you, then they are
already tracking according to that definition.

>> The reason it is there is because
>> the verb tracking and the privacy concern we are trying to address
>> are both about identifying the trail of an individual as they
>> proceed from place to place.  Specifically, remembering that a
>> person was at a single place is not tracking unless that memory
>> is shared with someone else or combined with memories of other
>> places.
> 
> But the next and subsequent times I visit a site that has the same third-party tracker on it, and they are allowed to remember some data that's associated with me, how is it NOT forming a trail?

Because it is the same context.  The fact that a given user agent
visited the same site more than once is not a privacy concern if
the third party doesn't know anything else about the user.

>> "who" is doing the tracking is not important with this definition,
>> as one would expect if they were looking up the term in a dictionary.
>> 
> 
> I agree;  that's why I think first-party recording is, in fact, tracking, though we say that by choosing to visit a site, you also give it (and its service providers) implicit permission to track you.

I'd rather think of it the same way any normal person would
think of it -- I am not being tracked if I am never seen in
more than one location.  I am being remembered, sure, but there
is a good reason why we have a separate word for being tracked.

If I walk into my favorite bar and everyone knows my name, I am
not going to be creeped out; that's good customer service.  If I
get the same reaction from a place I've never been before, then
I would find that creepy (even if it was with good intentions).

>> I've gone back and forth regarding whether the definition of tracking
>> ought to exclude data derived from the observed activity after it has
>> been de-identified (i.e., no longer associated with a particular user).
> 
> I think that if data can no longer be tied back to a user, then it's no longer our concern.  (And I mean that the ability to tie-back no longer exists, not that we rely on something being kept secret or separate).

Yes, but that is irrelevant!  Personal data can, by definition, be
tied back to a user.

People deliberately use the Internet in ways that allow most
first party site and app providers to retain personal data.
That is what they want.  DNT does not turn that off.  It isn't
tracking unless someone tries to combine observations from multiple
sites, at which point the DNT signal still applies.

> In general, I am looking for ways to move things firmly off our table, and then to categorize the data that remains and set rules for each category.
> 
> 'In transaction' -- not our business
> 'truly de-identified' -- not our business
> and so on.

Yep, 'Not tracking' --- not our business.

It is important to understand that my definition is not limited to a
single network interaction or any single point in time.  It isn't even
limited to receiving (or not receiving) DNT.  It also deals with
tracking that we haven't even talked about, such as post-mortem
analysis of logfile data from multiple sites, or two independent
first parties that later merge into a single party [e.g., under
my definition, their data was collected in different contexts and
thus combining or correlating that past data would be tracking].

....Roy
Received on Friday, 18 October 2013 00:33:19 UTC