Re: tracking-ISSUE-260: method for validating DNT signal from user [TPE Last Call] from Roy T. Fielding on 2014-12-16 (public-tracking@w3.org from December 2014)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Tue, 16 Dec 2014 01:42:03 -0800
To: Tracking Protection Working Group <public-tracking@w3.org>
Message-Id: <09582709-1546-4A28-B4B8-761B8F64946F@gbiv.com>

This is an editor's response to the last call comments associated with
ISSUE-260.

HTTP is a declarative protocol that crosses organizational boundaries.
What this means, in general, is that both sender and recipient rely on
a shared understanding of protocol semantics in order to communicate.
There is no proof that the sender is being honest.  This has not prevented
such protocols, especially HTTP, from having a defined set of semantics
that can be used to communicate effectively.

In practice, controlled testing is a sufficient method for validating the
correctness of software semantics.  A received signal is presumed valid
until someone observes that it isn't, at which point people go about
inspecting the software and determining why.  When it is determined to be
a software fault, it is reported as a bug and recipients are free to
work around the faulty signals generated by that software.

Any software that often sends an invalid signal can expect that signal
to be ignored, as per HTTP/1.1 [RFC7230, sec. 2.5]:

  http://tools.ietf.org/html/rfc7230#section-2.5

Existing servers have numerous mechanisms for inspecting requests and
adjusting their own interpretation of the request based on known bugs
in the sender.  The most common example is pattern matching on the
User-Agent string.
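
A minimal sketch of that kind of workaround, in Python. The sender pattern
("ExampleBrowser") and the function name are hypothetical, chosen only for
illustration; a real deployment would populate the list from its own testing
and would look up header names case-insensitively.

```python
import re

# Hypothetical patterns for senders found, by testing, to emit DNT incorrectly.
KNOWN_FAULTY_SENDERS = [
    re.compile(r"ExampleBrowser/1\.[0-3]\b"),
]


def effective_dnt(headers):
    """Return the DNT value to act on, or None if it is absent or disregarded.

    `headers` is a plain dict keyed by exact field name (a simplification).
    """
    dnt = headers.get("DNT")
    if dnt is None:
        return None
    user_agent = headers.get("User-Agent", "")
    # Disregard the signal when the sender matches a known-faulty pattern.
    if any(p.search(user_agent) for p in KNOWN_FAULTY_SENDERS):
        return None
    return dnt
```

The received field is left untouched; only the recipient's interpretation of
it changes.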

Regardless, sending additional "I really mean it!" data along with the
signal is a well-known anti-pattern.  If the sender is unable or
unwilling to ensure that the original signal is valid, sending more
signal is not going to help.  The additional data gets copied and
incorrectly sent by the same bad implementations, becoming a useless
burden for correct implementations and no help whatsoever for validation.

Therefore, I am marking this issue as WONTFIX.  I'll respond to the
individual comments below.

On Sep 22, 2014, at 2:29 PM, Roy T. Fielding wrote:
> On Jul 12, 2014, at 7:04 PM, Tracking Protection Working Group Issue Tracker wrote:
> 
>> tracking-ISSUE-260: method for validating DNT signal from user [TPE Last Call]
>> 
>> http://www.w3.org/2011/tracking-protection/track/issues/260
>> 
>> Raised by: Jack Hobaugh
>> On product: TPE Last Call
>> 
>> http://lists.w3.org/Archives/Public/public-tracking-comments/2014Jun/0005.html (Comment #2. Also present in some form in comments of Alan, Peter, Brooks, Chris Mejia, David Wainberg, Max, Vivek, Ari, Tim, Mike Zaneis.)
>> 
>> 
>> The technical approach of the TPE lacks a method by which the origin of the DNT signal can be validated to ensure that the signal was set as the result of an informed user choice. The stated goal of the TPE protocol “is to allow a user to express their personal preference . . . .” “The basic principle is that a tracking preference expression is only transmitted when it reflects a deliberate choice by the user. In the absence of user choice, there is no tracking preference expressed.” (TPE Section 4). NAI agrees with this stated principle but the TPE does not provide the necessary requirements for enforcing this principle within the protocol or for determining a rogue DNT signal. Without a locked down DNT signal, the server cannot determine whether the DNT signal is a valid signal. NAI respectfully requests that this issue be addressed before moving forward with the TPE.

The method of validating a signal is to install the software and test it,
observing the user agent configuration, DNT behavior, and outgoing messages.
There is no need to enforce the principle within the protocol, since these
requirements are on each sender.  Hence, one merely installs the software
and observes the bytes being sent.

Note that these requirements are not intended for remote enforcement.
They are instructions to the implementer.  If there are bugs in deployed
software, recipients can choose to work around those bugs regardless
of the received signal.
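
As a rough sketch of "observing the bytes being sent", the Python below
extracts every DNT field value from a captured HTTP/1.1 request head. The
function name and the sample capture are assumptions made for illustration;
how the bytes are captured (proxy, packet trace, test server) is left open.

```python
def dnt_values(raw_request: bytes):
    """Return every DNT field value found in a captured HTTP/1.1 request head."""
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")[1:]  # skip the request line
    values = []
    for line in lines:
        name, sep, value = line.partition(b":")
        # Field names are case-insensitive in HTTP.
        if sep and name.strip().lower() == b"dnt":
            values.append(value.strip().decode("ascii"))
    return values


# Hypothetical capture from a user agent configured to send DNT: 1.
captured = b"GET /something/here HTTP/1.1\r\nHost: example.com\r\nDNT: 1\r\n\r\n"
print(dnt_values(captured))
```

A tester would compare the returned values against the user agent's
configured preference for each configuration under test.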

> For completeness, here are the other last call comments applicable to this issue:
> 
> ===
> Rachel Glasser
> http://lists.w3.org/Archives/Public/public-tracking-comments/2014Jun/0001.html
> 
> The TPE is designed to express a user's choice preference regarding tracking. However, the protocol lacks a method to identify and validate the origin of the signal, which means other variables, (for example routers, antivirus software, browser plugins) may all insert a DNT signal. This signal inserted by this variable is not necessarily reflective of the user's preference. Furthermore, the only companies that would have to honor DNT are those who do not currently have any information about you in the first place. As such, users will have a very difficult time understanding when DNT applies. This will create confusion when users attempt to manage their privacy and exercise choice when it comes to data collection.

Testing the Navigator.doNotTrack property might help to determine
whether the preference known to the UA is consistent with what is
received on the wire, but that doesn't imply the signal is invalid:
TPE allows a signal to be set by intermediaries that are under control
of that user, such as privacy proxies.

There is nothing in TPE that prevents a server from informing a user when
it receives a tracking preference, using whatever UI it might want to
distinguish the three cases.  As such, user confusion is not considered to
be a relevant concern of the protocol.  It might very well be a concern for
the user agent's configuration UI, but that has been deemed out of scope.

When software incorrectly implements DNT, we expect people to notice and
report their findings.  A mechanism has been provided to indicate when a
signal is being disregarded because it has been incorrectly implemented.
There is no need for correctness to be determined on the fly.
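
A minimal sketch of how a server might use such a mechanism, assuming the
one meant here is the Tk response header field with the tracking status
value "D" (disregarding) from the TPE draft. The function name and the
fallback value "N" (not tracking) describe a hypothetical server, not any
required behavior.

```python
def tracking_status_header(disregard_signal: bool):
    """Return a (name, value) pair for the Tk response header field.

    Assumes the TPE draft's status values: "D" means the received DNT signal
    is being disregarded; "N" means this (hypothetical) server is not tracking.
    """
    if disregard_signal:
        return ("Tk", "D")
    return ("Tk", "N")


name, value = tracking_status_header(disregard_signal=True)
print(f"{name}: {value}")
```

The decision to disregard is made ahead of time, from testing the sending
software, rather than per request.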

> ===
> Alan Chapell
> http://lists.w3.org/Archives/Public/public-tracking-comments/2014Jun/0003.html
> 
> The TPE does little to ensure the validity of a DNT signal.
> Imagine attempting to board a plane at an airport where anyone with a
> computer could instantly alter flight patterns; imagine a marketplace where
> credit card companies were unable to authenticate their cardholders; or
> think about driving a car where anyone could use a police siren to push
> their way through traffic on the city streets. We sometimes take for granted
> how important it is to trust the validity of signals in life. If you can't
> trust the signal, the entire framework is left open to question.
> And that's exactly where we are with DNT. Per the TPE, there's no
> requirement on user agents to ensure that the DNT signal is valid. And as a
> result, there's no mechanism for anyone in the digital media ecosystem to
> trust any DNT signal they receive. One of the largest browser manufacturers
> has already been reported to have violated the spirit of the TPE - so this
> isn't mere speculation. And then there are any number of plugins, routers,
> anti-virus software and other entities that are turning on DNT without the
> user's knowledge.

As noted, violations of the protocol can be easily observed, reported,
and dealt with by disregarding the signal.  There is no need for the
protocol to ensure validity, just as there is no need for a car to ensure
that no siren has been installed, airplanes to ensure that traffic
control hasn't been hacked, or marketplaces to ensure that the person
using a credit card actually owns it.  Other people ensure those things,
usually when exceptions become apparent.

> ===
> Peter B Kosmala (American Association of Advertising Agencies (4A's))
> http://lists.w3.org/Archives/Public/public-tracking-comments/2014Jun/0007.html
> 
> The specification lacks a reliable method for validating and ensuring that each DNT signal is set as the result of an actual, informed choice by the end user. This means that routers, antivirus software, browser plugins, proxies, or ISPs, can all insert a DNT signal into the browser’s HTTP request, and the recipient server has no way of knowing whether it reflects the user’s choice or that of another entity entirely. That is not an authentic expression of consumer preference. Instead, it will create confusion when users attempt to manage their privacy and exercise their privacy choices relating to data collection.

Ditto.

It is not possible for a protocol to read minds.  What can be determined
is if the software behaves as expected when installed by someone who
knows their own mind and how to test the DNT features.

> ===
> Brooks Dobbs
> http://lists.w3.org/Archives/Public/public-tracking-comments/2014Jun/0009.html
> 
> Chairs have communicated that W3C rules dictate that any MUSTs used in the specification must be testable. The primary goal of this specification is to communicate a user's preference with respect to Tracking. Unfortunately, the specification in its current form allows for user preference signals that are not realistically testable. While it is true that a UA may test the setting it maintains internally, it cannot test the preference received by an origin server, nor can the origin server test if the signal it received is in keeping with the actual preference of the user or even the preference recorded by the UA. Current market implementations show this to be beyond a hypothetical problem. The middle man alteration of signals in the market today and the failure for there to be a technical means for either party to have the ability to verify a common understanding of user preference is a fundamental flaw.

The requirements only need to be testable by someone with an implementation
in hand, usually one that they created themselves with the intent of being
compliant with those requirements.  It is not necessary for a requirement
on a sender to be testable by remote recipients.

> In addition to the injection of signals by the intermediaries, the TPE’s lack of more specific guidance to the UAs with respect to how to ascertain a user’s preference also makes testing that preference against the protections offered by any individual compliance regime nearly impossible. End users are unlikely to be aware of the complicated definition of “Tracking”, its exceptions (which may vary by compliance regime) and its scope with respect to covered parties. Where it is likely that users will have wide ranging expectations of what a choice means, testing any given signal’s meaning with respect to a given compliance regime may not be possible.

There is no evidence to suggest that there is difficulty in ascertaining
whether a given implementation does or does not adhere to the requirements
in the protocol.  We are not limited to testing those requirements as a
remote observer.

> ===
> Chris Mejia
> http://lists.w3.org/Archives/Public/public-tracking-comments/2014Jun/0010.html
> 
> 1. Entities receiving the DNT:1 signal cannot rely on its validity. Because
> there is no strict requirement in the TPE on how and when user agents
> set/send DNT signals on behalf of properly informed users, the signal itself
> cannot be relied on as a consistent message (an expression of individual
> user choice) to those who receive it.

The requirements in TPE are sufficient for testing an implementation.

> We have already seen examples of DNT:1
> being sent by default, where users did not take an affirmative action to
> enable its sending, and where in most cases, the user had no idea it was
> being sent. This is simply unacceptable for a standard that proposes user
> choice as one of its core tenets.

It isn't allowed by the specification and that signal is consistently
disregarded by recipients.

> In order for this specification to be
> successfully adopted, entities receiving the signal must have confidence
> that the signal received represents an individual user's properly informed
> choice. This requires an educational component; users must be informed
> (transparency), and that educational component must be validated by the
> specification so that those receiving the signal can differentiate when it's
> been set/sent appropriately, vs. the "noise" created by user agents that
> insist on sending it by default for all of their users. Furthermore, because
> there is no real cost for user agents and intermediaries to "turn-on" DNT:1,
> we can see that it's being insincerely deployed (under the guise of user
> protection) as a competitive tort in commercial competition wars, rather
> than as a functional tool for individual user choice. By allowing unfettered
> flooding of un-checked DNT signals into the wild, the actual user-set/sent
> DNT signals will become effectively lost signals in the noise of
> machine-generated signals; this practice of bastardizing the signal does not
> help to advance user privacy controls, and is thus inconsistent with the
> TPWG Charter.

The protocol does not prevent people from abusing it.  People do.
If it fails to be deployed correctly in practice, then it will be
disregarded in practice.  This has nothing to do with how the protocol
has been specified.

> ===
> David Wainberg
> http://lists.w3.org/Archives/Public/public-tracking-comments/2014Jun/0011.html
> 
> The signal cannot be verified.
> 
> It is non-controversial that for a signal to be valid it must reflect a 
> user's informed and explicit choice. However, the TPE provides no 
> mechanism for a recipient server of a DNT:1 signal to ensure that a 
> signal is valid. Experience already demonstrates that there will be a 
> high rate of invalid signals as a result of the signals being set by 
> default or injected by intermediaries. The high rate of invalid signals, 
> with no means to distinguish them, will pollute the space, undermine the 
> meaning of the signal, and make it impossible for implementers to 
> support the specification.

If the signals are mostly invalid, there will be no pressure to honor them.
There is no need for a voluntary protocol to enforce its own requirements.

> The W3C, the working group chairs, and the primary authors of the 
> specification are indifferent, and seemingly willing to accept a high 
> rate of invalid signals, regardless of the source or user intent, 
> regardless of the business impact, and regardless of the overall 
> negative impact on the Internet.

No, these issues have been discussed many times and we have determined
that invalid signals can be detected by testing the software in question,
that industry has more than sufficient resources to perform such testing,
and that we are better served by clearly and publicly disregarding
broken signals in order to encourage standards compliance.

> ===
> Max Ochoa (Turn, Inc.)
> http://lists.w3.org/Archives/Public/public-tracking-comments/2014Jun/0012.html
> 
> The TPE does not guarantee that the do not track (DNT) preference is that of the user. It is impossible to discern if the DNT state was set by the user or by an intermediary (e.g., plug-in integrated into browser, separate software, operating system, ISP or wifi provider, home routers). Without this guarantee, the entire framework fails.

There is no need for a guarantee.

> a. If an intermediary alters the preference originally set by the user (e.g., from 0 to 1, or 1 to 0), how can downstream recipient servers know?

If the intermediary is statistically significant, its behavior will be
detected by observation.  If you want to make sure, ask the user.

> b. In the case of conflicting preferences between multiple user agents (e.g., toolbar plug-in + browser), which preference wins?

There is only one signal.

> c. In the case of conflicting preferences between multiple user agents on the same device (e.g., browser_1 + browser_2), which preference wins for information collected at the device level?

That is not relevant -- the signal is specific to each request.

> ===
> Tim Stoute (eyeReturn Marketing)
> http://lists.w3.org/Archives/Public/public-tracking-comments/2014Jun/0014.html
> 
> There are technical hurdles associated with the DNT proposal for which
> there are no enshrined workarounds in the proposal. For example, since the
> DNT HTTP header value can be set by network devices and software, it
> therefore does not directly reflect users choice. For example, in the case
> of a proxy server setting the DNT signal, there could be hundreds or
> thousands of individuals behind the equipment, which is broadcasting a
> signal that none of them explicitly chose.

Such a proxy would not conform to the requirements in TPE.  Adding more
requirements would not make it more conformant.

> ===
> Mike Zaneis (IAB)
> http://lists.w3.org/Archives/Public/public-tracking-comments/2014Jun/0015.html
> 
> * The origin and validity of the signal cannot be confirmed, thus putting in doubt whether a consumer actually chose to turn it on or whether a company has made that decision for them.
> * Legitimate DNT flags should reflect a user's choice to affirmatively turn on that signal. However, the TPE provides no means to ensure who turned on the signal and what point in the supply chain.
> * We have already seen extensive "gaming" of the DNT:1 signal, as it is sent by default by some routers, plugins, and other intermediaries that have access to the setting or the HTTP headers.
> * There is essentially no cost for intermediaries to turn on the DNT signal, thus companies can utilize this practice for their own profit motive to be seen as "competing on privacy". The signal can quickly proliferate without ever being set by consumers.

Ditto.  TPE explicitly provides for that case by enabling a server to
communicate when a signal is being disregarded.  Since there is no value
in sending disregarded signals, such "competing on privacy" will fail.

> ===
> JoAnn C. Covington (Rocket Fuel Inc.)
> http://lists.w3.org/Archives/Public/public-tracking-comments/2014Jun/0016.html
> 
> The specification also contains no mechanism to ensure that DNT signals
> actually do reflect consumer choice. There is no mechanism to prevent
> multiple contradictory signals from being sent, and no means to identify
> whether a DNT signal was set by someone other than the consumer. Thus, it
> is impossible for service providers receiving the signal to know whether
> the signal reflects informed consumer choice. Under the TPE, a DNT signal
> may be communicated by a browser, a browser plugin, router or other piece
> of software that automatically sets or communicates a DNT signal without
> consumers’ knowledge or consent. These signals may be set by vendors for
> their own competitive purposes and have nothing to do with an expression of
> consumer choice. Thus, the TPE provides multiple avenues for abuse of Do
> Not Track browser settings without serving, and even to the detriment of,
> consumer interests.

Ditto.

> ===
> Vivek Narayanadas (The Rubicon Project, Inc.)
> http://lists.w3.org/Archives/Public/public-tracking-comments/2014Jun/0019.html
> 
> the TPE provides no way for responding servers to confirm that a received DNT signal was actually set by the user agent. The lack of any available authenticating mechanism means that a responding server must respond to a DNT signal blind, simply assuming such a signal was intentionally sent by the end user. Such a result is at odds with the stated intent of the TPE, which is to empower end users (and not other third parties) to informedly state a preference as to tracking.
> 
> Without any authentication mechanism, intermediaries in the data stream between the user agent and the responding server have the ability and incentive to insert themselves into the data stream and state a preference purportedly on behalf of a user agent. ISPs, routers, add-ons, etc. have incentive to change all DNT signals to “1” in order to position themselves in their respective marketplaces as more “privacy-friendly” regardless of whether the user is even aware of the third party’s practice, and regardless of the user’s actual tracking preferences.

If such abuse is statistically significant, it will be detected and
reported on by those attempting to adhere to the standard.  The result
will be more disregarded signals.

> Rubicon Project requests that the Working Group add some authentication mechanism to the TPE—for example, by requiring the use certificates to confirm the user agent’s DNT selection, or a central repository storing user agent preferences—to ensure that responding servers honor the end user’s actual preferences, rather than the skewed preferences of third parties trying to game the system to their benefit. Such an authenticating mechanism will also allow third parties receiving a DNT signal to ensure that the actual signal-setting agent is properly presenting the end user his or her choices, in accordance with the TPE, rather than making the decision unilaterally for the end user.

It would be absurd to require authentication (user identification)
in order to express a preference for NO TRACKING.  Almost as absurd as
requiring all users that don't want to be tracked to be registered with
a central tracking service.  Any such additional data sent with every
request would be both ridiculously inefficient and abused for fingerprinting.

> ===
> Nadine Stocklin (PubMatic)
> http://lists.w3.org/Archives/Public/public-tracking-comments/2014Jun/0020.html
> 
> Another concern is that, although the TPWG prohibits any intermediary or agent other than the user from setting a DNT preference expression, a party receiving a DNT signal is not able to discern who set the signal. Therefore, a DNT:1 signal set by an intermediary such as a router, proxy, or anti-virus software, may be honored when the user’s true preference is DNT:0. Thus, though the TPWG wishes that the signal reflect solely the user’s informed choice, the technical spec does not require that. PubMatic is concerned about the very real possibility that these intermediate services might see it as an advantage to advertise that they insert a DNT:1 signal on behalf of their users, as a type of privacy protection. In reality, this would be a disservice to users who have consciously chosen a preference expression, and would impede the ability of servers to accurately read users’ preferences.

TPE requires that the preference be under control of the user.
An intermediary outside of the user's control cannot add or modify the
signal without violating those requirements, which will lead to the
signal being disregarded.

> ===
> Shane M Wiley (Yahoo!)
> http://lists.w3.org/Archives/Public/public-tracking/2014Feb/0045.html
> 
> Net new information to consider:  The only significant push back this request received was the increased page request header size that would result in "byte bloat" across the Internet.   After further discussion within industry I want to push back on that position and alter the language requested to help strengthen this perspective.  First, only those headers coming with DNT:1 would need to include this conditional field.  Second, only in those situations where the DNT:1 setter is something other than the current User Agent (already conveyed in the UA String) would this information be added.  If a 3rd party tool communicates its setting through the UA in a manner that the UA has validated, then this is not necessary (but nice-to-have so becomes a "MAY" in those cases).  Lastly, we believe less than 12 bytes on average would ever be used (simply include the domain/suffix of the setting party's site).  This layering of conditions places the percent volume increased much lower than originally discussed and those of us on the Server implementation side believe this is a critical enough issue to justify this additional byte load on the transaction.

I don't.  It does not solve the only thing you suggested needed solving,
namely that non-conforming senders are sending the signal.  If they are
already non-conforming, there is no reason for them to send anything other
than what the UA would have sent alone.  Additional requirements have no
influence over non-conforming implementations.

IMO, it is already a tragedy that we require 8 additional bytes for a
minimal DNT field.  To add more, you have to justify it with an actual
solution to the problem.  Those bytes are far more hazardous to request
processing than an entire world of non-conforming intermediaries, since
non-conformance will cure itself over time (or the entire protocol will
be abandoned as socially unworkable).
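
For reference, the byte counts can be checked with a short Python sketch.
It assumes the 8 bytes count the field as sent on the wire with a single
space after the colon and the CRLF terminator; the second field is taken
from the proposal's own EXAMPLE 2 quoted in this message.

```python
# Minimal DNT field as it appears on the wire, including the CRLF terminator.
minimal_field = b"DNT: 1\r\n"

# The extra field from the proposal's example, also with its CRLF.
proposed_field = b"DNTSET: privacytool.org\r\n"

print(len(minimal_field))   # 8
print(len(proposed_field))  # 25
```

Under those assumptions, the example extension adds roughly three times the
size of the minimal field to each request that carries it.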

> Draft language addition to TPE:  (Section 4.2)
> 
> Option A:  Delimit the DNT-field-value with an optional element, or
> Option B:  Introduce an additional DNT-field-name = "DNTSET"
> 
> [Roy, I've been reviewing recent RFCs and have seen both approaches taken - is there a preferred approach?]

Option A is more efficient but requires the extension field, which we
already plan to mark at-risk.  Option B is easier for third party extensions.
Not sending anything is just as effective.

> <new text - normative>
> A user agent must send the DNTSET element only in cases where the DNT field-value is set to "1" (%x31) and the setter is not equal to the party provided in the HTTP "User-Agent" string for the network interaction.  In cases where the setter has been validated as part of the user agent's DNT approach this is not required but the setter may optionally pass the DNTSET field for greater transparency.  The DNTSET element should reflect the setter's parent HTTP domain.  The value should consist of valid URL characters and not exceed 30 characters.
> </new text>
> 
> <new text - non-normative>
> EXAMPLE 2
> GET /something/here HTTP/1.1
> Host: example.com
> DNT: 1
> DNTSET: privacytool.org
> </new text>
> 
> I'm hopeful we can appropriately consider this ahead of moving to Last Call as this is a long-standing issue that has been repeatedly brought up throughout this process only to be side-stepped at the very last moments in one of the rare occasions I've not been able to participate in our weekly call.

This suggestion is almost identical to the one you made in Sunnyvale.
There was no support for it because it does not in any way prevent a
non-conforming sender from further non-conforming to this requirement.

In HTTP, the user agent is the program that initiates the request,
including all plugins and extensions it might contain.  It is not a
party in any sense of how we use that term.  The User-Agent string
isn't even required to be unique -- it is often user-configurable.

In contrast, there is active work underway to minimize the observable
differences between client requests, thanks to existing abuse of
fingerprinting for tracking, so anything that requires privacy-concerned
extensions to self-identify their own installation is a non-starter.

....Roy
Received on Tuesday, 16 December 2014 09:42:27 UTC