Re: Supporting TPE on sites/subdomains where a user does not have control of the server (ISSUE 15, ISSUE 10) from Roy T. Fielding on 2017-02-04 (public-tracking@w3.org from February 2017)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Sat, 4 Feb 2017 08:26:19 -0800
To: Mike O'Neill <michael.oneill@baycloud.com>
Cc: public-tracking@w3.org
Message-Id: <5E112D4A-7350-494F-9261-61CA3F01BBF5@gbiv.com>
> On Feb 3, 2017, at 4:53 AM, Mike O'Neill <michael.oneill@baycloud.com> wrote:
> 
> Roy,
> 
>> The value in the header
>> field is generally not going to be C from a first-party site; it will be
>> N or T or "?".  C would be sent from a third-party resource that believes
>> it already has consent to perform tracking as we have defined it in our
>> specification (retaining identifiable data across multiple sites).
> 
> The TPE does not say this.

It doesn't have to.  You seem to be under the impression that the Tk header
field is the place where the user agent is going to get information about
tracking status.  In general, that is not the case.  Since the early days of
this WG, the protocol design has been constrained by WG requirements that
tracking status be available, whenever possible, *before* a trackable request
is made by a user agent, and that DNT must not significantly interfere with
HTTP caching aside from where caching is already unlikely.  The later
introduction of the Compliance array reaffirmed the need to get a TSR
in order to communicate a meaningful status.

Tk is sent after a request is received.  It is not useful for pre-flight
checks.  Likewise, if the value within Tk were to vary per user, that
would interfere with caching.

Hence, responses are primarily found in the tracking status resources.

TPE defines three ways to get a TSV (that 'T' or 'N' or 'C' value):

   1) GET /.well-known/dnt/

      This gets the site-wide (per-hostname) tracking status representation,
      which contains the TSV and compliance array and other bits.  This is
      sufficient for any site that only has one policy.  If consent has been
      obtained already and is accessible in real time, this TSV will be dynamically
      generated as 'C' (in the JSON) and it is unlikely that any Tk header
      field will be sent in responses.

   2) As a response header field "Tk"

      Tk is only required if the site-wide status is '?' (resource specific)
      or 'G' (gateway).  Resource-specific policies are much harder to manage
      and cannot be discovered pre-flight, so they are intentionally being
      avoided when possible.  In general, we should only see Tk header fields
      when an origin server (hostname) processes data on behalf of more than
      one controller, which is exceedingly rare these days for security
      reasons related to cookies.

   3) GET /.well-known/dnt/{status-id}

      This gets the request-specific tracking status representation (JSON),
      which can also be dynamically generated (if necessary).
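
As a rough illustration of those three lookups from the client side, here is
a minimal Python sketch. The helper names parse_tk and tsr_url are
hypothetical, and the set of TSV characters and the status-id character class
are my reading of TPE, so treat both as assumptions rather than normative
grammar.

```python
import re

# Assumed TSV characters and status-id characters (not copied from the spec text).
KNOWN_TSVS = set("!?GNTCPDU")
STATUS_ID_RE = re.compile(r"^[A-Za-z0-9_\-+=/]+$")


def parse_tk(field_value):
    """Split a Tk field-value such as '?;fRx42' into (tsv, status_id or None)."""
    tsv, sep, status_id = field_value.strip().partition(";")
    tsv = tsv.strip()
    status_id = status_id.strip()
    if tsv not in KNOWN_TSVS:
        raise ValueError("unrecognized tracking status value: %r" % tsv)
    if sep and not STATUS_ID_RE.match(status_id):
        raise ValueError("malformed status-id: %r" % status_id)
    return tsv, (status_id if sep else None)


def tsr_url(origin, status_id=None):
    """Build the site-wide TSR URL, or the request-specific one if a status-id is given."""
    base = origin.rstrip("/") + "/.well-known/dnt/"
    return base + status_id if status_id else base
```

A client would normally fetch tsr_url(origin) first and only fall back to
parse_tk on a response when the site-wide TSV is '?' or 'G'.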

> DNT is sent to all parties, who must respect it
> if compliance means anything. Why would first-parties not use the same
> mechanism to signal OOBC?

The protocol is indifferent to whether or not they have to respect DNT.
When they *use* TPE to communicate, they would use the same mechanism
to do so: the TSR.  Normal sites do not have to send Tk.

>> A site owner is unlikely to change the Tk response per user. That's too late.
> 
> No, it's not. Servers already commonly dynamically change content depending on
> request headers, e.g. accept-language, accept. The cookie header can signal
> different response for authenticated users etc.

Obviously.  It is too late FOR THE USER that is being tracked as a result
of sending their normal request to a normal resource.

> They can use the vary response header to tell browsers when to send a
> request etc. (except often for the cookie header which as you have pointed
> out, makes OOBC problematic anyway).
> Anyway you expect request-specific TSRs to change immediately no? A hosted
> site has to determine user consent itself anyway, and convey that using the
> status-id if it cannot do it in a Tk response header. 

The status-id is per-resource-owner-policy, not per-user.

>> For hosting providers that support multiple owners per origin server, the
>> primary TSR is going to have a status of "?" (dynamic) and each resource
>> is going to have a fixed status-id.  Any "C" response would have to be
>> fixed per resource as "C" for all users (provided in both the Tk header
>> field and the request-specific TSR) or provided only in the request-specific
>> header field after responding with "?" again in the Tk header field.

(er, paste-o ... I meant "request-specific tracking status resource",
not "request-specific header field").
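
For the multi-owner hosting case, a minimal Python sketch might look like the
following. Everything here is hypothetical: the "/~owner/" path layout, the
OWNER_POLICIES table, and the choice to reuse the owner name as the status-id.
The point it shows is that the status-id is fixed per owner policy, never per
user.

```python
import json

# Hypothetical per-owner policies on a shared hostname; the site-wide TSR says "?".
OWNER_POLICIES = {
    "alice": {"tracking": "N"},
    "bob": {"tracking": "T"},
}


def owner_of(path):
    """Map '/~alice/page.html' to 'alice' (assumed URL layout)."""
    first = path.lstrip("/").split("/", 1)[0]
    if not first.startswith("~") or first[1:] not in OWNER_POLICIES:
        raise ValueError("no owner policy for path: %r" % path)
    return first[1:]


def tk_header(path):
    """Tk field-value for a resource: always '?', plus the owner's fixed status-id."""
    return "?;" + owner_of(path)


def request_specific_tsr(status_id):
    """JSON body served at /.well-known/dnt/{status-id}."""
    return json.dumps(OWNER_POLICIES[status_id])
```

Two different users requesting /~bob/index.html get the same "Tk: ?;bob";
anything that varies would be in the TSR behind that status-id.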

> A C response for all users makes no sense. How can a site declare that all
> users have given consent? 

By not allowing access by users who have not consented.  E.g., almost all
walled gardens based on user account authentication.

>> A dynamic response of "Tk: C" or "Tk: N", selectively chosen per user, is
>> not allowed by TPE.
> 
> Nowhere does it say this, in fact the TCS says a C MUST be sent when data is
> shared with third-parties (transitive permissions).

I guess it is *possible* for the server to send "Tk: C;status-id" instead
of "Tk: ?;status-id", but I wouldn't suggest it.  Dynamic responses are
supposed to be in the TSRs.

>> For sites that do not track, their response is N at the primary tracking
>> status resource. They never send Tk at all.
> 
> They might, the TPE does not say they cannot.
> 
>> As usual, these mechanisms are not affected by EU regulations. The regulations
>> don't limit consent mechanisms to the one defined in TPE, and a C response
>> in TPE does not imply that the TPE exceptions API is being used to maintain
>> that consent.
> 
> EU law requires transparency and prior user consent (except for a limited
> number of exceptions). It will have a bearing on DNT when the JS API is not
> available and consent has been acquired the old fashioned way. How else
> would they be able to communicate that?

The same way that they asked for consent.  Controlled by the server.

>> Almost all hosting providers currently track users, based on our definitions,
>> just to maintain operations and provide the owner with referral data. AFAICT,
>> they are permitted to do so even under EU regulations. 
> 
> They are only permitted to if the user has given consent, or the purpose
> for storage or processing is for one of the defined exceptions.

Those are the exceptions.

>> Hence, aside from
>> first party sites that are specifically configured to exclude normal tracking
>> (like Duck Duck Go), most sites will be responding with T or ? for their
>> HTML pages.  In any case, the Tk header field is only required when the
>> site-wide response is ? or G.
> 
> So it is required in those circumstances. I would claim it is also required
> for OOBC i.e. Tk: C. By its nature this has to be dynamic (it depends on the
> cookie header).

That claim is false.  JSON can be dynamically generated just as easily
as header fields, and only needs to be generated when requested.
Given that there will be billions more normal responses than TSR
requests, there is a significant advantage to never sending Tk at all,
let alone one that needs to be generated on the fly.
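
A minimal Python sketch of generating the site-wide TSR on demand, assuming
consent is recorded in a cookie. The cookie name dnt_consent is borrowed from
the end of this message; the "1" value, the compliance URL, and the function
name are hypothetical. Only the "tracking" and "compliance" members are shown.

```python
import json
from http.cookies import SimpleCookie


def site_wide_tsr(cookie_header):
    """Build the JSON body for GET /.well-known/dnt/ from the request's Cookie header."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get("dnt_consent")  # assumed consent cookie
    has_consent = morsel is not None and morsel.value == "1"
    body = {
        "tracking": "C" if has_consent else "T",
        # Placeholder URL, not a real compliance regime.
        "compliance": ["https://example.com/dnt-compliance"],
    }
    return json.dumps(body)
```

This runs only when someone asks for the TSR, so ordinary responses stay
cacheable and carry no Tk field.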

Keep in mind that no client ever needs a response once they know that a
site complies with a satisfactory specification of compliance: DNT works
from that point on because sites are compelled to adhere to regulations
and their own communicated privacy policies.  A site that lies about its
compliance is just as likely to lie in the tracking status.

>>> If a site has consent for tracking (presumably the state is encoded in a
>>> cookie) it must respond to that user with Tk: C. There would have to be an
>>> API not just a static config setting, and maybe this is supported on some
>>> hosting sites, but by no means all.
>> 
>> No, that's not even remotely true.  The TSV of "C" is communicated in either
>> the site-wide TSR or the request-specific TSR, both of which are separate
>> resources that don't need any response header fields.  The only time that
>> "C" would be in a Tk header field is if that response is the same for all
>> users (i.e., if access implies consent) and the site chooses to send that
>> in the field-value instead of sending "?".
> 
> How would a server communicate a request-specific TSR without a response
> header (that includes a status-id)?

They do -- they send a Tk field with a TSV of '?'.

> If the only option was a dynamically determined TSR (based on an incoming
> cookie header to the .well-known location) then this would mean an extra
> roundtrip for every request for the user agent to determine it. A Tk: C
> response is far more efficient.

What's important to users is the TSR, which will contain the C, and it
only has to be generated when requested (presumably by someone using an
active validation tool, since normal users don't care).  That's why a TSR
is far more efficient than sending a header field on every response.

>>> I know for a fact that this is difficult for many first-party sites, and
>>> not just Wordpress ones. Major multi-brand companies find this logistically
>>> difficult now, as I have explained many times. As DNT becomes more supported
>>> this will be less so, but for now we have a transition issue.
> 
>> I have no doubt that they find it difficult now, given the absence of
>> useful examples in the spec and written user guides.  Much of that has
>> been waiting on evidence of browser implementations.  Neither one is 
>> a protocol issue.
> 
> The problem already exists with Internet Explorer, Bouncer and
> PrivacyBadger.

Yes, with one blatantly non-conformant browser, your plugin, and the
EFF plugin that doesn't even implement TPE.

> The EFF policy assumes consent is signalled with Tk: C, and
> there has been discussion about using Tk: D as a signal to request
> PrivacyBadger to block subresources.

When the EFF gets around to implementing TPE, as specified, maybe
they will figure out why that isn't necessary.

>> Tracking is defined in the spec. That is the only tracking denoted by Tk,
>> and the mechanism used for tracking is irrelevant.
> 
>> The Tk response does not have any relevance to adherence to the EU regulations.
>> When they apply to a given site, adherence is legally required regardless of the
>> protocol or mechanisms used, even if the site doesn't implement TPE and doesn't
>> respond at all.  Our specs do not change that.
> 
> European law has a different take on tracking, as I explained when we were
> debating the definition. User consent is necessary for both access to user
> agent storage as well as for processing of personal data, and Do Not Track
> has been widely recognised as a means to communicate it (in the ePrivacy
> directive in recital 66 and A5(3), the proposed ePrivacy Regulation in
> A8,A9, A10,R22-24 and the GDPR in A6 and A21.5)

I have read them, Mike.  Did you notice that the ePrivacy Regulation uses
a different set of terminology than the GDPR?  I'd like to know why, given
the goal was to bring them into alignment.

Neither relies on the JavaScript API to implement a consent mechanism.
I don't know why folks got hung up on that.  It is far more effective to
send both "DNT: 1" (or nothing for EU) and a specific consent cookie,
rather than trying to tweak the value of DNT sent on every request.
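
On the server, that combination could be evaluated roughly as in this Python
sketch. The function name, the dnt_consent cookie value of "1", and the
consent_required_by_default flag (standing in for "nothing for EU") are all
assumptions for illustration, not anything defined by TPE or the regulations.

```python
from http.cookies import SimpleCookie

CONSENT_COOKIE = "dnt_consent"  # hypothetical cookie name


def may_track(headers, consent_required_by_default=True):
    """Decide whether tracking is permitted for one request's headers."""
    h = {k.lower(): v for k, v in headers.items()}
    cookies = SimpleCookie()
    cookies.load(h.get("cookie", ""))
    morsel = cookies.get(CONSENT_COOKIE)
    if morsel is not None and morsel.value == "1":
        return True  # site-specific consent overrides the general DNT signal
    dnt = h.get("dnt", "").strip()
    if dnt == "1":
        return False
    if dnt == "0":
        return True
    # No DNT header at all: fall back to the jurisdiction's default.
    return not consent_required_by_default
```

The DNT value stays constant across requests; only the cookie differs per
site, which is the point of not tweaking DNT.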

> The rules about the Tk header and dynamic TSRs are far too complex and will
> probably never be properly implemented. The only realistic use for it is
> to signal OOBC which will not be relevant when browsers support the API, IMO
> the sooner the better.

Umm, yeah.  As I said when it was originally proposed, the browsers are not
going to implement that API because it adds to the request latency of every
single request without any apparent advantage to the user.  I think we have
wasted enough time on that.  A cookie-based consent mechanism is what sites
have to implement today, so we should describe a standard way of doing that
(knowing full well that we sacrifice saved consents whenever the user resets
all of their cookies).  If successful, maybe browsers will implement a checkbox
for users to retain only dnt_consent low-entropy cookies.
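
No such standard exists yet, so the following Python sketch is only one
possible shape for it. The function name, the "1"/"0" values, and the default
lifetime are hypothetical. Keeping the value to a single flag is what makes
the cookie low-entropy: it cannot double as a user identifier.

```python
from http.cookies import SimpleCookie


def consent_set_cookie(granted, max_age_days=180):
    """Return a Set-Cookie field-value recording the user's consent choice."""
    cookie = SimpleCookie()
    cookie["dnt_consent"] = "1" if granted else "0"  # one bit, no identifier
    morsel = cookie["dnt_consent"]
    morsel["path"] = "/"
    morsel["max-age"] = max_age_days * 24 * 3600  # seconds
    morsel["secure"] = True
    return morsel.OutputString()
```

The server would send this as "Set-Cookie: ..." once the user answers the
consent prompt; the choice is lost whenever the user clears cookies.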

....Roy
Received on Saturday, 4 February 2017 16:26:49 UTC