
ACTION-133 collect comparison criteria

From: Roy T. Fielding <fielding@gbiv.com>
Date: Sat, 25 Feb 2012 00:23:04 -0800
Cc: public-tracking@w3.org
Message-Id: <7D8DD347-4010-467C-A098-BCDEE3E2F516@gbiv.com>
To: Matthias Schunter <mts@zurich.ibm.com>

This had the wrong subject -- changed to ACTION-133.

On Feb 24, 2012, at 3:15 AM, Matthias Schunter wrote:

> Hi Folks,
> 
> I created a table in the W3C Wiki to start comparing both approaches:
> http://www.w3.org/wiki/DntResponseHeaderOrURI
> 
> Feel free to correct, augment, improve my initial (likely to be
> subjective) assessment.

I would have to remove your assessment entirely, since it doesn't make
any sense to me.  Adding one personal opinion on top of another, with
more personal opinions to be overlaid, doesn't work very well,
particularly with an imaginary ++/-- valuation.  And a couple of
those entries seem reversed.

Let's get the personal opinions out here first and then use the wiki
(or the draft) to document what we actually agree are facts.

Here is my summary based on the criteria in the table:

Criteria:
   Transmits tracking status
      Both solutions are equally expressive.  Both are dynamic when
      they need to be.  The resource can echo the client's DNT setting
      without impacting normal request caching.  The header cannot.

   Enables enforcement by regulators
      Both solutions enable enforcement.  The headers tell the user the
      tracking status of a request that was just made.  The resource tells
      the user the tracking status of all resources matching a specific
      URI prefix for a specific time period (no less than 24 hours).
      The resource status can be viewed, archived, and printed by any user
      using any browser, crawled by spiders, and indexed by search engines
      (custom or general), whereas the header field is only viewable by
      tools that normal users don't use and require a separate tool to save
      them for archival purposes.  The resource status could be further
      extended with fields for digital signatures, though I doubt that
      would be necessary.

   Granularity
      Whether it is per-request is not relevant.  Per-resource is.
      Both solutions can differentiate specific policies per specific
      resource, if that is how the origin server wants to implement
      their site.  The header field informs the user after the request
      has been made.  The status resource defines a scope of applicability
      that may result in two extra requests for an agent that is
      actively verifying tracking status.

   Simplicity of user agent
      Reading a header after the request has been made is usually easier
      than making a separate request.  OTOH, finding out the status
      after a request has been made is less useful than before.  A JSON
      response is less likely to be lost by intermediaries and easier
      to process by JavaScript and extensions that might not have access
      to the HTTP header fields.

   Traffic generated

      Response header: Roughly 8 bytes per response minimum on every
          response to every request made over HTTP.  Estimated traffic
          generated is some number of terabytes per day.  For example, if we
          take www.google.com alone at 1 billion searches per day, with each
          search invoking roughly 14 subrequests, we have a minimum of 120GB
          per day of extra traffic generated at that site alone, regardless
          of whether the user agents care to receive that information.

      Status resource: Roughly 1 KB per site visited per day per actively
          verifying user agent, excluding those sites that the user
          agent has chosen to always-ban or always-accept.  Estimated
          traffic is some number of megabytes per day for all sites combined,
          depending on how many users choose to enable active verification
          and how many sites require a dynamic response (i.e., tracking).
          Note that verification is *not* necessary to satisfy DNT, so the
          traffic generated by the status resource for DNT enabled without
          active verification is zero.
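
The response-header estimate above can be checked with a few lines of
arithmetic.  This is only a sketch: the three inputs are the figures quoted
above, and counting the search itself plus its 14 subrequests as 15 responses
is my reading of the estimate, not something stated explicitly.

```python
# Inputs quoted in the estimate above.
SEARCHES_PER_DAY = 1_000_000_000
SUBREQUESTS_PER_SEARCH = 14
HEADER_BYTES_PER_RESPONSE = 8


def header_overhead_bytes(searches: int, subrequests: int, header_bytes: int) -> int:
    """Bytes per day spent on the response header field."""
    # Assumption: the search response itself also carries the header,
    # so each search produces 1 + subrequests responses.
    responses = searches * (1 + subrequests)
    return responses * header_bytes


total = header_overhead_bytes(
    SEARCHES_PER_DAY, SUBREQUESTS_PER_SEARCH, HEADER_BYTES_PER_RESPONSE
)
print(f"{total / 10**9:.0f} GB per day")  # prints "120 GB per day"
```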

   Robustness wrt caching

      Response header: if it doesn't echo the user's request, then it has
          no additional impact on caching -- tracking resources are typically
          marked as non-cacheable or at least must-revalidate.  Deployed
          intermediaries might fail to forward the response header, though
          I think that is unlikely (failing to forward a new request header
          field is more common, but that will be fixed over time).

      Status resource: it is a separate resource, so can be separately
          cached, delivered by separate servers, redirected to common
          locations, etc.  In short, it is equivalent to favicon.ico
          except that only a small number of user agents would make the
          request.

   Tracking protection on info resource [you reversed the +/- values here]

      Response header: The header proposal uses a well-known resource for
          supplemental information.

      Status resource: The status resource is the info resource.

> Comments / Questions for Well-known URIs:

> o Is there a way to prevent that each URL needs to always be checked at the well-known location? E.g., retrieving foo.com/bar/one requires checking foo.com/.well-known/dnt/bar/one. If I now want to retrieve foo.com or foo.com/bar/one/sub, I need to re-check. Don't I? Wouldn't this double the web traffic (sort-of?)

First of all, nobody *needs* to make any checks.  DNT is still enabled
without needing to check.  Verification of status is an optional feature
that does nothing to ensure compliance -- it merely provides a means to
obtain that status if an agent wants to know what the server claims and
to record that claim for posterity.

When verification is desired, the first request is to "/.well-known/dnt".
The scope of its applicability is defined by the path member in the response.
If (and only if) that resource does not apply to the target URI, then a second
request is made on "/.well-known/dnt/target/path".  It is extremely unlikely
that a third-party site is going to have more than one tracking policy per
site, but this mechanism allows for that case without adding overhead to
the common case of one policy per domain.
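
The two-step lookup just described can be sketched as follows.  This is a
minimal illustration, not a reference implementation: `lookup_status`,
`applies_to`, and the `fetch` callable are hypothetical names, and treating
the "path" member as a simple path prefix is an assumption based on the
"URI prefix" wording above.

```python
import json
from typing import Callable, Optional, Tuple

BASE = "/.well-known/dnt"


def applies_to(status: dict, target_path: str) -> bool:
    """Assumption: the "path" member is a prefix that defines the scope."""
    prefix = status.get("path")
    return isinstance(prefix, str) and target_path.startswith(prefix)


def lookup_status(
    fetch: Callable[[str], Optional[str]], target_path: str
) -> Tuple[Optional[dict], int]:
    """Return (status, number of extra requests made) for target_path.

    fetch is a hypothetical callable that returns the response body for a
    path on the site, or None if nothing is found there.
    """
    # First request: the base well-known resource.
    body = fetch(BASE)
    if body is not None:
        status = json.loads(body)
        if applies_to(status, target_path):
            return status, 1
    # Second request, only if the base resource does not cover the target.
    body = fetch(BASE + target_path)
    return (json.loads(body) if body is not None else None), 2


# A fake site with one policy at the base and a second, more specific one.
site = {
    "/.well-known/dnt": '{"path":"/ads/","tracking":true}',
    "/.well-known/dnt/img/logo.png": '{"path":"/img/","tracking":false}',
}
covered = lookup_status(site.get, "/ads/banner")       # one request
not_covered = lookup_status(site.get, "/img/logo.png")  # two requests
```

In the common case of one policy per domain, the base resource would carry
"path":"/" and the second request would never be made.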

Other criteria:

   Deployability (how easy is it to add it to existing web sites)

      Response header: Assuming the site owner knows what a header field is
         and knows how to configure their server to send the header and
         has permission to do so by the site operator, then this can be
         configured via a SetHeader rule (if static) or a custom module
         for those folks on Apache.  For dynamic resources, some code
         modification might be required depending on how they are implemented.
         This proposal also requires a well-known address with the ability
         to process query fields.

      Status resource: For sites that don't track, add a single file with
         the content
               {"path":"/","tracking":false}
         and assign it the application/json type.  For sites that do track,
         a dynamic response can be achieved with any common template language,
         custom module, or CGI.  More importantly, since the status resource is
         implemented entirely separately from the existing resources, there
         is no need to worry about breaking the existing site.  Sites that
         use many different domains with a single policy can redirect to one
         location.  Even complicated sites like Yahoo! could deploy this in a
         single day, since it would involve no risk to their working apps.
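
For a site that doesn't track, the static case above amounts to very little
code.  The following is a hedged sketch using a plain WSGI callable; the name
`dnt_status_app` is hypothetical, and a static file with the right media type
would do the same job without any code at all.

```python
import json

# The exact body given above for a non-tracking site.
STATUS_BODY = json.dumps(
    {"path": "/", "tracking": False}, separators=(",", ":")
).encode("ascii")


def dnt_status_app(environ, start_response):
    """Serve the status resource, separate from the rest of the site."""
    if environ.get("PATH_INFO") == "/.well-known/dnt":
        start_response(
            "200 OK",
            [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(STATUS_BODY))),
            ],
        )
        return [STATUS_BODY]
    # Anything else is outside this sketch and left to the existing site.
    start_response(
        "404 Not Found",
        [("Content-Type", "text/plain"), ("Content-Length", "0")],
    )
    return [b""]
```

For local experimentation this could be run with
`wsgiref.simple_server.make_server` from the standard library.  Because the
callable only answers for the well-known path, the existing application is
left untouched.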

   Request latency

      Response header: Every response has to include a header field prior to the
         content being sent, which means it adds a small latency to every response.

      Status resource: If no active verification is needed, no latency is added.
         If verification is on but done asynchronously (not prior to making the
         actual request), then the only noticeable latency would be the general
         overhead and use of connections on the user agent.  If prior verification
         is enabled, then substantial latency is added to the first request of a
         site due to the additional one or two requests (if not already performed
         for that site).  However, prior verification isn't even possible with the
         header proposal.

   Third-party verification / Measuring deployment

      Response header: A third party can crawl a site to see if every one of its
         exposed resources is flagged as respecting DNT or not, assuming that
         the site doesn't mind a crawler that doesn't respect robots.txt and
         adds false counts to its advertising counters.  Right.  That isn't going
         to happen, and there is nothing to prevent a site from sending a
         different response to the crawler than it would to a user.

      Status resource: A third party can crawl every domain on the web and safely
         request the base well-known address, index the response, and make that
         available to user agents (or regulators) for evaluation of deployment
         or a curated list of claims-to-be-compliant sites.  Each response has
         a minimum TTL of 24 hours, longer if noted by expires or max-age.
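
The TTL rule above could be applied by a crawler or index roughly as follows.
This is a sketch under stated assumptions: `status_ttl_seconds` is a
hypothetical helper, it only looks at max-age in a Cache-Control value, and
Expires handling is left out for brevity.

```python
import re

MIN_TTL_SECONDS = 24 * 60 * 60  # the 24-hour minimum stated above


def status_ttl_seconds(cache_control: str = "") -> int:
    """How long a fetched status response may be relied on, in seconds."""
    # Match a max-age directive at the start or after a space or comma.
    match = re.search(
        r"(?:^|[\s,])max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE
    )
    max_age = int(match.group(1)) if match else 0
    # Never shorter than 24 hours; longer if the server says so.
    return max(MIN_TTL_SECONDS, max_age)
```

A response with no caching information, or with a max-age shorter than a day,
is still treated as valid for 24 hours; a week-long max-age extends that to a
week.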

   Transparency

      Response header: only indicates compliance or non-compliance, which means
         the entire working group must agree to a single definition of tracking
         that encompasses all necessary exceptions and somehow explain that to
         users.

      Status resource: indicates "tracking", "no tracking", or "tracking with
         limitations", which allows the user agent to choose whether it wants
         to distinguish tracking in general from tracking that is specifically
         limited to categories-to-be-defined-later with an agreement to adhere
         to data minimization.  We can use the common definition of tracking if
         each of the exceptions is defined as acceptable tracking with limitations.

   Individual control over data stored
      Response header: enabled via a separate well-known resource, though underspecified.
      Status resource: enabled via links provided in the status response.

   Applicable outside of HTTP
      Response header: any protocol with response header fields.
      Status resource: any protocol that has URIs with a path and the ability to get.


Cheers,

Roy T. Fielding                     <http://roy.gbiv.com/>
Principal Scientist, Adobe Systems  <http://adobe.com/enterprise>
Received on Saturday, 25 February 2012 08:23:28 UTC
