Re: [ISSUE-81, ACTION-13] Response Header Format from Roy T. Fielding on 2011-10-21 (public-tracking@w3.org from October 2011)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Fri, 21 Oct 2011 12:39:19 -0700
To: David Singer <singer@apple.com>
Cc: "public-tracking@w3.org Group WG" <public-tracking@w3.org>
Message-Id: <0BBEE041-2B5A-42DE-8F9C-2B5E0A061DD1@gbiv.com>
On Oct 21, 2011, at 10:44 AM, David Singer wrote:
> On Oct 20, 2011, at 17:25 , Roy T. Fielding wrote:
> 
>> On Oct 20, 2011, at 11:17 AM, David Singer wrote:
>>> I think you are allowing your pessimism to run too far. Strictly, logging out means I can't do anything I'd need to log in to do; it doesn't strictly mean 'forget me'.  But if a site responds "I am not tracking you in this transaction" and it later transpires that it was, that's pretty useful.
>> 
>> DNT does not mean "forget me".  If the server responds positively to DNT,
>> it means that it won't track the user beyond its own branded sites
>> (and presumably won't share the internal data collection with third
>> parties unless the user requested it for some other reason, like by
>> purchasing something with a credit card).  Please do not confuse DNT
>> with private browsing mode.  Whatever a server might say in response,
>> it won't be understandable without a full policy description.
>> 
>> And the response is not just a few bytes.  It is a few bytes for every
>> single resource for which we indicate a response is needed, every time
>> those resources are accessed.  A typical site embeds dozens of such
>> requests per page.
>> 
>> In contrast, a well-known location can represent exactly how the
>> site as a whole tracks, provide information specific to that user
>> (such as a link to where they can see and edit the data collected),
>> only needs to be requested once per site, and only by those
>> browsers specifically configured to do so.  It thus has no performance
>> impact whatsoever and does not require any modification to the
>> existing code that implements all of today's operating websites.
>> 
> 
> I think you are saying three things here I disagree with, but I am going to re-state them in case I misunderstand you; if I have, the disagreement can be ignored.
> A - it's OK to add a DNT header to every request, but it's unacceptable overhead to add it to every response.
> B - the definition of 'track' and specifically 'not track' may vary site-by-site, so what it means will have to be expressed in a privacy policy on each site, and a response header cannot capture those nuances.
> C - it's OK for users and user-agents to make changes to handle DNT, but not acceptable for those (the sites) who are tracking and benefiting from tracking.
> 
> 
> A seems strange; requests are generally smallish, whereas responses are typically reasonably large.  "DNT: 1,xxxxx" is quite a bit smaller than the average URL in a request, and much smaller than any HTML fragment, image, etc. that might be in a response.  Even 1x1.gif is 43 bytes according to Wikipedia.

The header field is not added to every request.  It is only added to
the requests of those users who have specifically enabled DNT.
Hence, the cost is borne by those who want DNT.

If the response header field has to be present on all responses
to requests containing DNT, regardless of whether that response has
anything to do with third-party tracking, then it is a huge
performance cost for an extremely small benefit.  That's because it
would have to be sent on every response, whether the client requested
it or not, in order to avoid breaking shared caching.

Whether or not it is "small" has not yet been decided, but let's
assume it is the minimum proposed (8 bytes).  Those 8 bytes are in
the critical path -- they have to be read before the client can
start rendering whatever is in the body.  Most of the time, such
an addition would just be noise (dumb sites include junk in the
Server field for no reason already).  High performance sites do
care a great deal because of the interactions with TCP max segment
size and slow start: the big latency hit occurs when the message
boundary exceeds 1072 octets.  Many hours are spent trying to
keep response messages (including cookies) below that number,
or at least below a small multiple of that number.
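
The boundary effect can be put into rough arithmetic.  The sketch
below is only an illustration: it takes the 1072-octet figure and
the 8-byte header size from this message, and it assumes a
simplified model in which a response costs one unit per 1072-octet
chunk.  The function names are made up for the example.

```python
import math

# Octet budget quoted in this message; treated here as a fixed chunk size.
SEGMENT_BUDGET = 1072


def segments_needed(message_octets: int, budget: int = SEGMENT_BUDGET) -> int:
    """Number of budget-sized chunks needed to carry the message."""
    return max(1, math.ceil(message_octets / budget))


def crosses_boundary(message_octets: int, added_octets: int,
                     budget: int = SEGMENT_BUDGET) -> bool:
    """True if the added octets push the message into one more chunk."""
    before = segments_needed(message_octets, budget)
    after = segments_needed(message_octets + added_octets, budget)
    return after > before


# A response tuned to sit just under the budget is the one that suffers.
tuned = crosses_boundary(1068, 8)   # 1076 octets no longer fit in one chunk
roomy = crosses_boundary(500, 8)    # plenty of headroom, nothing changes
```

Under these assumptions the extra 8 bytes are harmless for most
responses and expensive only for the ones already trimmed to fit,
which is the case high performance sites care about.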

If the response is different for each request, then we have added
both the cost of cache-busting (for those responses that would
normally be cacheable) and the server time spent trying to figure
out what the answer should be for that user.  Again, this is in the
critical path, and it would have to be done on every request that
implements custom responses to DNT.
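
To make the cache-busting cost concrete, here is a toy model of one
shared cache in front of an origin server.  It is a sketch, not a
description of any real cache: the user names, the URL, and the
idea of passing in a cache-key function are all assumptions made
for the illustration.

```python
from typing import Callable, Hashable, List, Tuple


def count_origin_hits(requests: List[Tuple[str, str]],
                      cache_key: Callable[[str, str], Hashable]) -> int:
    """Count how many requests get past a shared cache to the origin."""
    cache = set()
    origin_hits = 0
    for url, user in requests:
        key = cache_key(url, user)
        if key not in cache:
            # Cache miss: the origin has to compute and send the response.
            origin_hits += 1
            cache.add(key)
    return origin_hits


# Four requests for the same resource from three distinct users.
requests = [("/logo.png", user) for user in ("alice", "bob", "carol", "alice")]

# Same response for everyone: the cache can key on the URL alone.
uniform = count_origin_hits(requests, lambda url, user: url)

# Response differs per user: the cache must key on the user as well.
per_user = count_origin_hits(requests, lambda url, user: (url, user))
```

In this toy run the uniform response reaches the origin once, while
the per-user response reaches it three times, once for each
distinct user.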

If this feature seriously improved safety (life, limb, or property),
then it would certainly make sense for everyone to feel the pain.
DNT is not such a feature.  If we want sites to deploy it, then the
costs have to be reasonably close to the benefits.

> B will take a little more to discuss.  

I must not have been clear.  I wasn't saying that it means
different things to different sites.  I was saying that the
scope of "I am not tracking you across differently branded
sites" is so vague that anyone attempting to verify it will
have to look at a detailed policy that describes what sites
are in the brand, what third-party sites are contracted as
first-party data collectors, and how the data collected is
used within those sites.  Hence, the simple "yes" or "no"
answer is just a pacifier.  People who want to verify DNT
will have to look at the details anyway.  Everyone else doesn't
need an answer from the server.

The mechanisms by which a first party might enable tracking
or sharing of data between sites are varied and not necessarily
instantaneous, and could involve many different opt-out or
opt-back-in schemes beyond those specified here.  Moreover, the
tracking might be in the form of JavaScript running on the page
delivered by the first party, and that JavaScript may or may not
be respecting the DNT flag if it is reflected in the DOM.  So,
having the first party server respond "yes" or "no" for each
request is meaningless. Tracking is not a request/response attribute.
It is a site-wide policy.  We can make such a policy machine-readable
and efficient as a well-known URI, because it only needs to be
requested by those clients actively verifying DNT.  Hence, on
average, that would be a zero-cost implementation of the same goals
that the response header field is intended to achieve.
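
A client that does want to verify DNT could look something like the
sketch below.  Nothing here is specified anywhere in this thread:
the path "/.well-known/dnt-policy", the class name, and the
injected fetch function are all hypothetical, chosen only to show
that the policy needs to be requested once per site rather than
once per resource.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit

# Hypothetical location; no path has been agreed in this discussion.
POLICY_PATH = "/.well-known/dnt-policy"


class PolicyCache:
    """Fetch a site's tracking policy at most once and reuse it."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                  # injected so no network is needed
        self._by_site: Dict[str, str] = {}   # site origin -> policy text

    def policy_for(self, page_url: str) -> str:
        parts = urlsplit(page_url)
        site = f"{parts.scheme}://{parts.netloc}"
        if site not in self._by_site:
            # First visit to this site: one request for the whole site.
            self._by_site[site] = self._fetch(site + POLICY_PATH)
        return self._by_site[site]


# Stand-in for an HTTP GET that records which URLs were requested.
fetched = []


def fake_fetch(url: str) -> str:
    fetched.append(url)
    return "policy for " + url


cache = PolicyCache(fake_fetch)
for page in ("https://example.com/",
             "https://example.com/news",
             "https://example.com/img/a.png"):
    cache.policy_for(page)
```

Three resources on the same site result in a single policy request,
and clients that never ask for the policy cost the site nothing.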

> I used to be involved in Rights Expression Languages, where the debate was essentially about whether a formal language could capture all the nuance of what might vary in the rights being transferred in (for the most part) a purchase transaction.  What was not discussed was whether the average consumer was prepared to have the rights acquired vary both in space (between different sites) and in time (different purchases on the same site).  The answer seems to be 'no', that when consumers think they have 'bought something' they want a pretty static and universal understanding of what they have bought.
> 
> Now, in the case of Rights Expressions, we're talking about sites that (a) the user chose to visit, and (b) with which they chose to have a transaction.
> 
> In the case of DNT, for 3rd-party sites, neither of these is true.  I therefore believe that it is paramount that what 'no track' means be something that is defined in the specification we are writing, and does NOT vary from site to site or over time within a given site.  If sites are at liberty to define 'not track' however they please, I think we will have achieved nothing.

Yes, but now you are only talking about responses from third-party sites
acting as data collectors for the purpose of tracking.  I have no objection
to a response header field in only those responses.  That is a scope which
corresponds to the benefits of DNT.

> C - it seems to me that the work in building the databases, correlation engines, input processing (e.g. from the URL and first party) and so on, that are involved in tracking users is vastly larger than processing the DNT header, and responding with a clear and accurate indication of what the site is doing in response.

Again, that argument only makes sense if the response is limited
to the tracking objects.

Is it a reasonable compromise to define a well-known location for
a machine-readable first-party tracking policy (defining whether
it implements/respects DNT, the scope of what it considers to be
first-party, where to opt-back-in or edit data collected, etc.)
and only send a header field response to DNT from third-party
sites when they are being used to perform dynamic third-party
tracking?

....Roy
Received on Friday, 21 October 2011 19:39:44 UTC