RE: Initial feedback on the well-known URI Proposal from Kevin Smith on 2012-02-29 (public-tracking@w3.org from February 2012)

From: Kevin Smith <kevsmith@adobe.com>
Date: Wed, 29 Feb 2012 13:14:09 -0800
To: "Roy T. Fielding" <fielding@gbiv.com>, Matthias Schunter <mts@zurich.ibm.com>
CC: "public-tracking@w3.org" <public-tracking@w3.org>
Message-ID: <6E120BECD1FFF142BC26B61F4D994CF3064CC1F669@nambx07.corp.adobe.com>
From reading Roy's description, it sounds to me like there is at least one piece of functionality available when using a URI vs a header - that you can request the policy before actually hitting the page.  This does not seem like a huge advantage to me, but it's nice to know the options.

The question I have is, what can a header do that a URI cannot?  If, other than the above-mentioned minor discrepancy, they are functionally equivalent, which I suspect is true, then this simply becomes a question of cost analysis.  If the benefits are equivalent, pick the method that is easier to implement and cheaper to maintain and use.

-----Original Message-----
From: Roy T. Fielding [mailto:fielding@gbiv.com] 
Sent: Wednesday, February 29, 2012 12:54 PM
To: Matthias Schunter
Cc: public-tracking@w3.org
Subject: Re: Initial feedback on the well-known URI Proposal

On Feb 29, 2012, at 2:11 AM, Matthias Schunter wrote:

> I now had a closer look at your proposal to transmit tracking status 
> via well-known URI.
> 
> I believe that both proposals, headers and URIs have benefits. I need 
> to continue trying to understand their pros and cons.

My goal was to capture all of the WG's use cases, including those that would be prohibitively expensive to include in a header.  I am hoping that reviewers will consider all of the possible things they need from a response, for whatever reasons they might need them, and make sure that the tracking status resource satisfies those cases.  If not, it is relatively easy to add those cases when working with a separate resource.

> Here is some initial feedback on the proposal:
> 
> 1. I like the URI proposal and I believe it has its merit. We need to 
> understand
>    whether URI/header or both are the avenue to go forward

I will be sad if we can't agree to have just the resource, unless we have a use case that cannot be satisfied by the separate resource space.

> 2. A main goal of DNT (my perspective) is simplicity and ease of 
> use/understanding. I believe that the overall scheme should be 
> minimalistic to keep it as simple as possible. We spent time in 
> Brussels slimming the headers to the minimal info that is essential.
> I'd like to do a similar exercise for your proposal.

My proposal has many more details because it satisfies several more use cases than the header proposal.  For example, the echo DNT use case, the ability to distinguish specific exceptions, providing a list of domains to be considered first-party, extensibility, etc.
It is a complete proposal and, IMO, vastly superior to sending a header field on every response because it satisfies all of the use cases without impacting existing implementations or caching.

It has benefited from all of the prior discussion we have had on header fields.  It is just a different way to address the same problem (a more Web-centric, RESTful way, if I may add, though I bet somebody will eventually complain that application/json isn't a hypertext type).

> This means that I would omit all fields that are not essential to make 
> the proposal slim and similar to the headers.
> - Fields I would remove are
>  same-site, edits, partners, received (we agreed that it is not 
> needed; it no longer exists in the headers either)

That would eliminate the use cases for identifying the scope of first-party, providing individual control over the data that has been collected, providing fair warning (before the real resource request) of what third-party trackers are used by the site, and echoing the DNT field back to the client to detect evil intermediaries.  The only reason we don't have those cases handled by the header field is that it would be prohibitively expensive to do so in headers, either because of the size or because of the effect on the cacheability of normal responses.  Hence, your suggested deletions would remove most of the reasons why the resource fulfills the needs of the privacy and regulator folks better than the header field proposal.

I am not wedded to the member names -- same-site just seemed more natural than first-party-scope.  I am not sure if we need the use case for partners (identifying third-parties before one goes to the site), since that may be too hard to manage, but it should at least be considered by the WG.

> - I am not sure about the options as a separate field since the policy 
> may link to it, too.

Specific links to enable individual control are a requirement of the regulators.  They should not be buried in a policy doc.

> - I also would focus on fields that are usually static (e.g.,  not 
> having a 'received' field)

Why?  The main reason I wasn't able to convince folks at the start of the header field discussion that a well-known resource would satisfy their concerns is the preconception that such a resource is always just a file -- that it couldn't be dynamic enough.  This proposal demonstrates how dynamic it can be.

> 3. I would fold 'tracking' and 'response' into a single field that has 
> the same values as the headers (no-tracking, first-party, 
> service-provider, tracking)

I have no interest in that change, for efficiency reasons.  Most sites do no tracking of any kind, and having that declared by a boolean up front allows for the use case of sites that don't want to be associated with the tracking-but-limited-to-exemptions sites.

> 4. A new comment: While I understand the idea of the path field 
> (scoping of status objects), I do not understand its semantics enough.
> E.g., I would not know what status object to apply if there are two 
> objects
>  Well-known URI    Path in Object
>  /sub              /
>  /                 /sub

The spec describes a specific algorithm for deciding it in 5.1.2:

    A user agent may check the tracking status for a given resource URI by
    making a retrieval request for the well-known address
      /.well-known/dnt
    relative to that URI.
    ...

    Once the tracking status representation is obtained, parse the
    representation as JSON to extract the Javascript status-object.
    If parsing results in a syntax error, the user agent should
    consider the site to be non-conformant with this protocol.

    If the status-object does not have a member named path or if the value
    of path is not "/" and not a prefix of the path component for the URI
    being checked, then find the service-specific tracking status resource
    by taking the template
       /.well-known/dnt{+pathinfo}
    and replacing {+pathinfo} with the path component of the URI being
    checked. Perform a retrieval request on the service-specific tracking
    status resource and process the result as described above to obtain
    the specific tracking status.

Note that the second status-object retrieved is not examined to see if its path component is consistent -- it applies regardless.
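
The lookup quoted above can be sketched in a few lines of Python. This is only an illustration of the quoted text, not reference code from the spec. The names tracking_status, parse_status, needs_specific_lookup and NonConformantSite are made up for the example, and the fetch argument is a hypothetical callable that returns the response body for a URL. Three points are assumptions on my part: "prefix" is read as a plain string prefix, a path member that is not a string is treated as absent, and a representation that parses but is not a JSON object is treated as non-conformant.

```python
import json
from urllib.parse import urlsplit

WELL_KNOWN = "/.well-known/dnt"


class NonConformantSite(Exception):
    """The tracking status representation could not be used."""


def parse_status(body):
    """Parse a tracking status representation into a status-object (a dict)."""
    try:
        status = json.loads(body)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError
        raise NonConformantSite(str(exc)) from exc
    if not isinstance(status, dict):
        # Assumption: anything other than a JSON object is unusable.
        raise NonConformantSite("status-object is not a JSON object")
    return status


def needs_specific_lookup(status, resource_path):
    """True when the base status-object does not cover resource_path."""
    scope = status.get("path")
    if not isinstance(scope, str):
        # No path member (or, by assumption, an unusable one).
        return True
    return scope != "/" and not resource_path.startswith(scope)


def tracking_status(uri, fetch):
    """Return the status-object that applies to uri.

    fetch(url) is a hypothetical helper returning the body as text.
    """
    parts = urlsplit(uri)
    origin = f"{parts.scheme}://{parts.netloc}"
    resource_path = parts.path or "/"

    status = parse_status(fetch(origin + WELL_KNOWN))
    if needs_specific_lookup(status, resource_path):
        # Expand /.well-known/dnt{+pathinfo} with the path being checked.
        # The second object applies regardless of its own path member.
        status = parse_status(fetch(origin + WELL_KNOWN + resource_path))
    return status
```

Passing fetch in as a parameter keeps the sketch independent of any particular HTTP client and makes the at-most-two-requests behaviour easy to observe.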

> Some more questions:
> 1. Can there be multiple status-objects at one well-known URI?

No, that is not allowed by the ABNF.

> 2. We should try to find a way to minimize the number of 
> requests to the well-known URI.

We already have.  In almost all real cases, there will be exactly one per site per 24 hours (or longer if the site has declared a TTL for this response), and only then for user agents actively verifying the tracking status.  In all other cases, it is two requests, one for the base "/.well-known/dnt" and a second for a specific path.
If a site wants to minimize secondary requests, it can do so by providing no more than one common path on their site per applicable policy, which is how URI delegation works naturally.
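
To make the request count concrete, here is a rough sketch of how a user agent might cache tracking status responses. It assumes the 24-hour default mentioned above and leaves open how a site declares its TTL: the ttl argument is simply whatever lifetime, in seconds, the caller extracted from the response. StatusCache and DEFAULT_TTL are illustrative names, not anything defined by the proposal.

```python
import time

DEFAULT_TTL = 24 * 60 * 60  # seconds; the 24-hour default discussed above


class StatusCache:
    """Remembers status-objects per tracking status URL until they expire."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be exercised in tests
        self._entries = {}       # url -> (expires_at, status)

    def get(self, url):
        """Return the cached status-object, or None if absent or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, status = entry
        if self._clock() >= expires_at:
            del self._entries[url]
            return None
        return status

    def put(self, url, status, ttl=None):
        """Store status; ttl is the site-declared lifetime in seconds, if any."""
        lifetime = DEFAULT_TTL if ttl is None else ttl
        self._entries[url] = (self._clock() + lifetime, status)
```

A user agent would consult get() before requesting "/.well-known/dnt" or a path-specific resource, and call put() after each successful retrieval, so a site with a single site-wide status-object costs one request per cache lifetime.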


Cheers,

Roy T. Fielding                     <http://roy.gbiv.com/>
Principal Scientist, Adobe Systems  <http://adobe.com/enterprise>
Received on Wednesday, 29 February 2012 21:14:56 UTC