Re: Initial feedback on the well-known URI Proposal from Roy T. Fielding on 2012-03-06 (public-tracking@w3.org from March 2012)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Tue, 6 Mar 2012 14:34:43 -0800
To: Matthias Schunter <mts@zurich.ibm.com>
Cc: Tracking Protection Working Group WG <public-tracking@w3.org>
Message-Id: <E46057BD-8A9C-49F4-8A96-9D9449B2D926@gbiv.com>

On Mar 6, 2012, at 1:14 AM, Matthias Schunter wrote:

> Headers may be easier to add to a site with federated management (many
> admins maintaining some URLs; not having control over the well-known URI).
> 
> In my mind, URIs are more coarse grained - best for making statements
> over a large portion of a site while headers make small statements
> about a single URL retrieved.
> 
> As a consequence, a site where each URL may have a different response
> should live easier with headers; for retrieving the same info from a
> well-known URI, the whole site needs to be 'mirrored' under the
> well-known URI and the number of requests would double (Roy: Correct
> me if I am wrong!).

Actually, you are wrong, though for reasons that very few people would
anticipate.  First, the 'mirror' is not of the site but of the resource
namespace, and it ends at the first ancestor that has the same tracking
policy as all of its descendants.  Descendants would redirect up.
Second, there are no sites where every URL has its own tracking policy.
Finally, in the worst case, the site can simply pick the union of all
tracking behavior for the site and present that at the single
/.well-known/dnt --- we do not penalize sites for saying they track more
than they actually do for a given URI.
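
To illustrate the first and last points, here is a minimal Python sketch.
The policy table, its prefixes and behaviour labels, and the function
names are all hypothetical, and the layout of URIs beneath
/.well-known/dnt is assumed for illustration rather than taken from the
proposal.

```python
# Hypothetical policy table: namespace prefix -> set of tracking behaviours.
# Only prefixes whose policy differs from their parent need an entry.
POLICIES = {
    "/": {"access-logs"},
    "/shop/": {"access-logs", "ad-targeting"},
}

WELL_KNOWN = "/.well-known/dnt"  # URI layout below this prefix is assumed


def policy_ancestor(path):
    """Return the nearest enclosing prefix of `path` that declares a policy."""
    segments = [s for s in path.split("/") if s]
    if not path.endswith("/"):
        segments = segments[:-1]  # drop the leaf resource name
    while True:
        prefix = "/" + "".join(s + "/" for s in segments)
        if prefix in POLICIES:
            return prefix
        if not segments:
            raise LookupError("no policy declared for the site root")
        segments.pop()  # move one level up the namespace


def status_location(path):
    """Return the well-known URI that holds the policy covering `path`."""
    return WELL_KNOWN + policy_ancestor(path).rstrip("/")


def redirect_for(path):
    """Return where the mirrored URI for `path` redirects, or None if it
    is itself the URI holding the policy."""
    mirrored = WELL_KNOWN + path.rstrip("/")
    target = status_location(path)
    return None if mirrored == target else target


def site_wide_union():
    """Worst case: one policy at /.well-known/dnt listing every behaviour."""
    return set().union(*POLICIES.values())
```

With this table, the mirrored URI for /shop/cart/item redirects up to
/.well-known/dnt/shop, so only two entries exist for the whole site, and
site_wide_union() gives what a site would publish if it chose to declare
a single policy instead.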

Tracking is not something that a single resource owner on a larger
site infrastructure can control.  Typically, tracking is done on a
per-site basis and imposed on all subsites.  For example, access logs
are collected and mined separately from the resource owners.  They might
even be extracted at load balancers instead of at the HTTP server.
Therefore, the URL owner would need to know exactly what all other
aspects of the resource handling infrastructure do with the request
data before they can claim whether tracking is active or not.

IBM.com is a good example of that.  An individual resource owner might
only be aware of their own project information, but the IBM branding group
applies a template to every page delivered, which includes a header
about IBM, a set of JavaScript to do tracking, and a footer to explain
the IBM policies.  How much of "http://www.zurich.ibm.com/" is even
controlled by the Swiss organization?

In the CMS industry, it is very rare for authors to know what marketing
is doing on their own site unless they look at the HTML source served
to the outside world.

....Roy

Received on Tuesday, 6 March 2012 22:35:03 UTC