- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Wed, 29 Feb 2012 11:54:00 -0800
- To: Matthias Schunter <mts@zurich.ibm.com>
- Cc: "public-tracking@w3.org" <public-tracking@w3.org>
On Feb 29, 2012, at 2:11 AM, Matthias Schunter wrote:
> I now had a closer look at your proposal to transmit tracking status
> via well-known URI.
>
> I believe that both proposals, headers and URIs have benefits. I need
> to continue trying to understand their pros and cons.

My goal was to capture all of the WG's use cases, including those that would be prohibitively expensive to include in a header. I am hoping that reviewers will consider all of the possible things they need from a response, for whatever reasons they might need them, and make sure that the tracking status resource satisfies those cases. If not, it is relatively easy to add those cases when working with a separate resource.

> Here is some initial feedback on the proposal:
>
> 1. I like the URI proposal and I believe it has its merit. We need to
> understand whether URI/header or both are the avenue to go forward

I will be sad if we can't agree to have just the resource, unless we have a use case that cannot be satisfied by the separate resource space.

> 2. A main goal of DNT (my perspective) is simplicity and ease of
> use/understanding. I believe that the overall scheme should be
> minimalistic to keep it as simple as possible. We spent time in
> Brussels slimming the headers to the minimal info that is essential.
> I'd like to do a similar exercise for your proposal.

My proposal has many more details because it satisfies several more use cases than the header proposal. For example, the echo DNT use case, the ability to distinguish specific exceptions, providing a list of domains to be considered first-party, extensibility, etc. It is a complete proposal and, IMO, vastly superior to sending a header field on every response because it satisfies all of the use cases without impacting existing implementations or caching. It has benefited from all of the prior discussion we have had on header fields.
It is just a different way to address the same problem (a more Web-centric, RESTful way, if I may add, though I bet somebody will eventually complain that application/json isn't a hypertext type).

> This means that I would omit all fields that are not essential to make
> the proposal slim and similar to the headers.
> - Fields I would remove are
> same-site, edits, partners, received (we agreed that it is not
> needed; it no longer exists in the headers either)

That would eliminate the use cases for identifying the scope of first-party, providing individual control over the data that has been collected, providing fair warning (before the real resource request) of what third-party trackers are used by the site, and echoing the DNT field back to the client to detect evil intermediaries. The only reason we don't have those cases handled by the header field is because it would be prohibitively expensive to do so in headers, either because of the size or because of the effect on the cacheability of normal responses. Hence, your suggested deletions would remove most of the reasons why the resource fulfills the needs of the privacy and regulator folks better than the header field proposal.

I am not wedded to the member names -- same-site just seemed more natural than first-party-scope. I am not sure if we need the use case for partners (identifying third-parties before one goes to the site), since that may be too hard to manage, but it should at least be considered by the WG.

> - I am not sure about the options as a separate field since the policy
> may link to it, too.

Specific links to enable individual control are a requirement of the regulators. They should not be buried in a policy doc.

> - I also would focus on fields that are usually static (e.g.,
> not having a 'received' field)

Why?
The main reason I wasn't able to convince folks at the start of the header field discussion that a well-known resource would satisfy their concerns is the preconception that such a resource is always just a file -- that it couldn't be dynamic enough. This proposal demonstrates how dynamic it can be.

> 3. I would fold 'tracking' and 'response' into a single field that has
> the same values as the headers (no-tracking, first-party,
> service-provider, tracking)

I have no interest in that change, for efficiency reasons. Most sites do no tracking of any kind, and having that declared by a boolean up front allows for the use case of sites that don't want to be associated with the tracking-but-limited-to-exemptions sites.

> 4. A new comment: While I understand the idea of the path field
> (scoping of status objects), I do not understand its semantics enough.
> E.g., I would not know what status object to apply if there are two
> objects
>
>   Well-known URI   Path in Object
>   /sub             /
>   /                /sub

The spec describes a specific algorithm for deciding it in 5.1.2:

   A user agent may check the tracking status for a given resource URI
   by making a retrieval request for the well-known address
   /.well-known/dnt relative to that URI.
   ...
   Once the tracking status representation is obtained, parse the
   representation as JSON to extract the Javascript status-object. If
   parsing results in a syntax error, the user agent should consider
   the site to be non-conformant with this protocol.

   If the status-object does not have a member named path or if the
   value of path is not "/" and not a prefix of the path component for
   the URI being checked, then find the service-specific tracking
   status resource by taking the template /.well-known/dnt{+pathinfo}
   and replacing {+pathinfo} with the path component of the URI being
   checked. Perform a retrieval request on the service-specific
   tracking status resource and process the result as described above
   to obtain the specific tracking status.
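Read as code, the lookup quoted from 5.1.2 comes out roughly as the minimal Python sketch below. It is illustrative, not the proposal's reference implementation: `fetch` is a hypothetical stand-in for the user agent's HTTP client, and `resolve_tracking_status`, `parse_status` and `NonConformantSite` are made-up names. It also assumes that a body which parses as JSON but is not a single object counts as non-conformant, since the ABNF allows only one status-object per resource.

```python
import json
from typing import Callable
from urllib.parse import urlsplit

# Well-known address from the quoted spec text.
WELL_KNOWN = "/.well-known/dnt"


class NonConformantSite(Exception):
    """Raised when a tracking status representation is not usable."""


def parse_status(text: str) -> dict:
    """Parse one tracking status representation into a status-object."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError as exc:
        # A JSON syntax error means the site is non-conformant.
        raise NonConformantSite(f"invalid JSON: {exc}") from exc
    # Assumption: only a single JSON object counts as a status-object.
    if not isinstance(obj, dict):
        raise NonConformantSite("expected a single status-object")
    return obj


def resolve_tracking_status(uri: str, fetch: Callable[[str], str]) -> dict:
    """Return the status-object that applies to `uri`.

    `fetch` is a hypothetical stand-in for a retrieval request: it takes
    an absolute URL and returns the response body as text.
    """
    parts = urlsplit(uri)
    origin = f"{parts.scheme}://{parts.netloc}"
    path = parts.path or "/"  # query and fragment are not part of the path

    # Step 1: site-wide resource, relative to the URI being checked.
    status = parse_status(fetch(origin + WELL_KNOWN))

    # The site-wide object applies only if its path member is "/" or a
    # prefix of the URI's path component. A missing (or, by assumption,
    # non-string) path member falls through to step 2.
    scope = status.get("path")
    if isinstance(scope, str) and (scope == "/" or path.startswith(scope)):
        return status

    # Step 2: expand /.well-known/dnt{+pathinfo} with the path component
    # (already percent-encoded, so it is appended unchanged) and fetch the
    # service-specific resource. Its own path member is not checked.
    return parse_status(fetch(origin + WELL_KNOWN + path))
```

For example, if the site-wide object at https://example.com/.well-known/dnt has a `path` of "/shop", checking https://example.com/shop/cart uses that object directly. Checking https://example.com/blog/post triggers a second request to https://example.com/.well-known/dnt/blog/post, and whatever comes back is used as-is. Injecting `fetch` keeps the sketch free of network and caching code; the per-site caching discussed in this thread is deliberately left out.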
Note that the second status-object retrieved is not examined to see if its path component is consistent -- it applies regardless.

> Some more questions:
> 1. Can there be multiple status-objects at one well-known URI?

No, that is not allowed by the ABNF.

> 2. We should attempt at finding a way to minimize the number of
> requests to the well-known URI.

We already have. In almost all real cases, there will be exactly one per site per 24 hours (or longer if the site has declared a TTL for this response), and only then for user agents actively verifying the tracking status. In all other cases, it is two requests, one for the base "/.well-known/dnt" and a second for a specific path.

If a site wants to minimize secondary requests, it can do so by providing no more than one common path on their site per applicable policy, which is how URI delegation works naturally.

Cheers,

Roy T. Fielding                     <http://roy.gbiv.com/>
Principal Scientist, Adobe Systems  <http://adobe.com/enterprise>
Received on Wednesday, 29 February 2012 19:54:25 UTC