- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Sat, 25 Feb 2012 00:23:04 -0800
- To: Matthias Schunter <mts@zurich.ibm.com>
- Cc: public-tracking@w3.org
This had the wrong subject -- changed to ACTION-133.

On Feb 24, 2012, at 3:15 AM, Matthias Schunter wrote:

> Hi Folks,
>
> I created a table in the W3C Wiki to start comparing both approaches:
> http://www.w3.org/wiki/DntResponseHeaderOrURI
>
> Feel free to correct, augment, improve my initial (likely to be
> subjective) assessment.

I would have to remove your assessments entirely, since they don't make any sense to me. Adding one personal opinion on top of another, with more personal opinions to be overlaid, doesn't work very well, particularly with an imaginary ++/-- valuation. And a couple of those entries seem reversed.

Let's get the personal opinions out here first and then use the wiki (or the draft) to document what we actually agree are facts. Here is my summary based on the criteria in the table:

Criteria:

Transmits tracking status

Both solutions are equally expressive. Both are dynamic when they need to be. The resource can echo the client's DNT setting without impacting normal request caching. The header cannot.

Enables enforcement by regulators

Both solutions enable enforcement. The headers tell the user the tracking status of a request that was just made. The resource tells the user the tracking status of all resources matching a specific URI prefix for a specific time period (no less than 24 hours).

The resource status can be viewed, archived, and printed by any user using any browser, crawled by spiders, and indexed by search engines (custom or general), whereas the header field is only viewable with tools that normal users don't use, and a separate tool is required to save it for archival purposes. The resource status could be further extended with fields for digital signatures, though I doubt that would be necessary.

Granularity

Whether it is per-request is not relevant. Per-resource is. Both solutions can differentiate specific policies per specific resource, if that is how the origin server wants to implement their site.
The header field informs the user after the request has been made. The status resource defines a scope of applicability that may result in two extra requests for an agent that is actively verifying tracking status.

Simplicity of user agent

Reading a header after the request has been made is usually easier than making a separate request. OTOH, finding out the status after a request has been made is less useful than finding it out before. A JSON response is less likely to be lost by intermediaries and easier to process by JavaScript and extensions that might not have access to the HTTP header fields.

Traffic generated

Response header: Roughly 8 bytes per response minimum on every response to every request made over HTTP. Estimated traffic generated is some number of terabytes per day. For example, if we take www.google.com alone at 1 billion searches per day, with each search invoking roughly 14 subrequests, we have a minimum of 120GB per day of extra traffic generated at that site alone (1 billion searches x 15 responses x 8 bytes), regardless of whether the user agents care to receive that information.

Status resource: Roughly 1 KB per site visited per day per actively verifying user agent, excluding those sites that the user agent has chosen to always-ban or always-accept. Estimated traffic is some number of megabytes per day for all sites combined, depending on how many users choose to enable active verification and how many sites require a dynamic response (i.e., tracking). Note that verification is *not* necessary to satisfy DNT, so the traffic generated by the status resource for DNT enabled without active verification is zero.

Robustness wrt caching

Response header: if it doesn't echo the user's request, then it has no additional impact on caching -- tracking resources are typically marked as non-cacheable or at least must-revalidate.
Deployed intermediaries might fail to forward the response header, though I think that is unlikely (failing to forward a new request header field is more common, but that will be fixed over time).

Status resource: it is a separate resource, so can be separately cached, delivered by separate servers, redirected to common locations, etc. In short, it is equivalent to favicon.ico except that only a small number of user agents would make the request.

Tracking protection on info resource

[you reversed the +/- values here]

Response header: The header proposal uses a well-known resource for supplemental information.

Status resource: The status resource is the info resource.

> Comments / Questions for Well-known URIs:
> o Is there a way to prevent that each URL needs to always be checked
>   at the well-known location? E.g., retrieving foo.com/bar/one requires
>   checking foo.com/.well-known/dnt/bar/one. If I now want to retrieve
>   foo.com or foo.com/bar/one/sub, I need to re-check. Don't I?
>   Wouldn't this double the web traffic (sort-of?)

First of all, nobody *needs* to make any checks. DNT is still enabled without needing to check. Verification of status is an optional feature that does nothing to ensure compliance -- it merely provides a means to obtain that status if an agent wants to know what the server claims and to record that claim for posterity.

When verification is desired, the first request is to "/.well-known/dnt". The scope of its applicability is defined by the path member in the response. If (and only if) that resource does not apply to the target URI, then a second request is made on "/.well-known/dnt/target/path". It is extremely unlikely that a third-party site is going to have more than one tracking policy per site, but this mechanism allows for that case without adding overhead to the common case of one policy per domain.
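
The two-step check just described might be sketched as follows. This is only an illustration, not part of either proposal's text: the function names and the injected fetch callable are hypothetical, the "path" member is assumed to be matched as a simple prefix of the target path, and error handling (missing resource, bad JSON) is left out.

```python
import json
from typing import Callable


def status_applies(status: dict, target_path: str) -> bool:
    """Assumption: the "path" member scopes the status by simple prefix match."""
    scope = status.get("path")
    return isinstance(scope, str) and target_path.startswith(scope)


def lookup_status(fetch: Callable[[str], str], target_path: str) -> dict:
    """Return the tracking status that covers target_path.

    fetch is a hypothetical helper that GETs a path on the target site
    and returns the response body as a JSON string.
    """
    base = "/.well-known/dnt"

    # First request: the site-wide status resource.
    status = json.loads(fetch(base))
    if status_applies(status, target_path):
        return status

    # Second request, made only if the first does not apply to the target.
    return json.loads(fetch(base + target_path))
```

With a site-wide status of {"path":"/","tracking":false}, any target path is covered by the first response and only one extra request is made.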
Other criteria:

Deployability (how easy is it to add it to existing web sites)

Response header: Assuming the site owner knows what a header field is, knows how to configure their server to send the header, and has permission from the site operator to do so, then this can be configured via a header-setting rule such as the Header directive of mod_headers (if static) or a custom module for those folks on Apache. For dynamic resources, some code modification might be required depending on how they are implemented. This proposal also requires a well-known address with the ability to process query fields.

Status resource: For sites that don't track, add a single file with the content

    {"path":"/","tracking":false}

and assign it the application/json type. For sites that do track, a dynamic response can be achieved with any common template language, custom module, or CGI. More importantly, since the status resource is implemented entirely separately from the existing resources, there is no need to worry about breaking the existing site. Sites that use many different domains with a single policy can redirect to one location. Even complicated sites like Yahoo! could deploy this in a single day, since it would involve no risk to their working apps.

Request latency

Response header: Every response has to include a header field prior to the content being sent, which means it adds a small latency to every response.

Status resource: If no active verification is needed, no latency is added. If verification is on but done asynchronously (not prior to making the actual request), then the only noticeable latency would be the general overhead and use of connections on the user agent. If prior verification is enabled, then substantial latency is added to the first request of a site due to the additional one or two requests (if not already performed for that site). However, prior verification isn't even possible with the header proposal.
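
For the static, non-tracking case under Deployability above, a stand-alone server for the status resource could look roughly like the sketch below. Most sites would simply drop a file into their existing server; this version only illustrates that the resource is independent of the rest of the site. The names respond and StatusHandler are hypothetical, and the max-age of 86400 seconds is an assumption chosen to match the 24-hour minimum lifetime mentioned in this message.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

STATUS_PATH = "/.well-known/dnt"

# The single static document suggested for sites that don't track.
STATUS_BODY = json.dumps(
    {"path": "/", "tracking": False}, separators=(",", ":")
).encode("utf-8")


def respond(request_path: str):
    """Return (status code, headers, body) for a GET on request_path."""
    if request_path == STATUS_PATH:
        headers = {
            "Content-Type": "application/json",
            # Assumption: advertise a 24-hour lifetime explicitly.
            "Cache-Control": "max-age=86400",
            "Content-Length": str(len(STATUS_BODY)),
        }
        return 200, headers, STATUS_BODY
    return 404, {"Content-Length": "0"}, b""


class StatusHandler(BaseHTTPRequestHandler):
    """Serves only the status resource; everything else is a 404."""

    def do_GET(self):
        code, headers, body = respond(self.path)
        self.send_response(code)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


# To try it locally (address and port are arbitrary):
#   HTTPServer(("127.0.0.1", 8000), StatusHandler).serve_forever()
```

Because the handler touches nothing but the one well-known path, it can run on a separate server or be the target of redirects from many domains without any change to the existing applications.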
Third-party verification / Measuring deployment

Response header: A third party can crawl a site to see if every one of its exposed resources is flagged as respecting DNT or not, assuming that the sites don't mind a crawler that doesn't respect robots.txt and adds false counts to their advertising counters. Right. That isn't going to happen, and there is nothing to prevent the sites from sending a different response to the crawler than they would to a user.

Status resource: A third party can crawl every domain on the web and safely request the base well-known address, index the response, and make that available to user agents (or regulators) for evaluation of deployment or a curated list of claims-to-be-compliant sites. Each response has a minimum TTL of 24 hours, longer if noted by expires or max-age.

Transparency

Response header: only indicates compliance or non-compliance, which means the entire working group must agree to a single definition of tracking that encompasses all necessary exceptions and somehow explain that to users.

Status resource: indicates "tracking", "no tracking", or "tracking with limitations", which allows the user agent to choose whether it wants to distinguish tracking in general from tracking that is specifically limited to categories-to-be-defined-later with an agreement to adhere to data minimization. We can use the common definition of tracking if each of the exceptions is defined as acceptable tracking with limitations.

Individual control over data stored

Response header: enabled via a separate well-known resource, though underspecified.

Status resource: enabled via links provided in the status response.

Applicable outside of HTTP

Response header: any protocol with response header fields.

Status resource: any protocol that has URIs with a path and the ability to get.

Cheers,

Roy T. Fielding <http://roy.gbiv.com/>
Principal Scientist, Adobe Systems <http://adobe.com/enterprise>
Received on Saturday, 25 February 2012 08:23:28 UTC