Re: Initial feedback on the well-known URI Proposal from Roy T. Fielding on 2012-02-29 (public-tracking@w3.org from February 2012)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Wed, 29 Feb 2012 11:54:00 -0800
To: Matthias Schunter <mts@zurich.ibm.com>
Cc: "public-tracking@w3.org" <public-tracking@w3.org>
Message-Id: <DE0D42D8-2285-4586-8A27-8541EB7074A1@gbiv.com>

On Feb 29, 2012, at 2:11 AM, Matthias Schunter wrote:

> I now had a closer look at your proposal to transmit tracking status
> via well-known URI.
> 
> I believe that both proposals, headers and URIs have benefits. I need
> to continue trying to understand their pros and cons.

My goal was to capture all of the WG's use cases, including those that
would be prohibitively expensive to include in a header.  I am hoping
that reviewers will consider all of the possible things they need from
a response, for whatever reasons they might need them, and make sure
that the tracking status resource satisfies those cases.  If not, it
is relatively easy to add those cases when working with a separate
resource.

> Here is some initial feedback on the proposal:
> 
> 1. I like the URI proposal and I believe it has its merit. We need to
>    understand whether URI/header or both are the avenue to go forward

I will be sad if we can't agree to have just the resource, unless we
have a use case that cannot be satisfied by the separate resource space.

> 2. A main goal of DNT (my perspective) is simplicity and ease of
> use/understanding. I believe that the overall scheme should be
> minimalistic to keep it as simple as possible. We spent time in
> Brussels slimming the headers to the minimal info that is essential.
> I'd like to do a similar exercise for your proposal.

My proposal has many more details because it satisfies several more
use cases than the header proposal.  For example, the echo DNT use case,
the ability to distinguish specific exceptions, providing a list of
domains to be considered first-party, extensibility, etc.
It is a complete proposal and, IMO, vastly superior to sending a header
field on every response because it satisfies all of the use cases
without impacting existing implementations or caching.

It has benefited from all of the prior discussion we have had on
header fields.  It is just a different way to address the same problem
(a more Web-centric, RESTful way, if I may add, though I bet somebody
will eventually complain that application/json isn't a hypertext type).

> This means that I would omit all fields that are not essential to make
> the proposal slim and similar to the headers.
> - Fields I would remove are
>  same-site, edits, partners, received (we agreed that it is not
> needed; it no longer exists in the headers either)

That would eliminate the use cases for identifying the scope of first-party,
providing individual control over the data that has been collected,
providing fair warning (before the real resource request) of what
third-party trackers are used by the site, and echoing the DNT field
back to the client to detect evil intermediaries.  The only reason
we don't have those cases handled by the header field is that it
would be prohibitively expensive to do so in headers, either because
of the size or because of the effect on the cacheability of normal
responses.  Hence, your suggested deletions would remove most of the
reasons why the resource fulfills the needs of the privacy and
regulator folks better than the header field proposal.

I am not wedded to the member names -- same-site just seemed more
natural than first-party-scope.  I am not sure if we need the use case
for partners (identifying third-parties before one goes to the site),
since that may be too hard to manage, but it should at least be
considered by the WG.

> - I am not sure about the options as a separate field since the policy
> may link to it, too.

Specific links to enable individual control are a requirement
of the regulators.  They should not be buried in a policy doc.

> - I also would focus on fields that are usually static (e.g.,
>  not having a 'received' field)

Why?  The main reason I wasn't able to convince folks at the start
of the header field discussion that a well-known resource would satisfy
their concerns is the preconception that such a resource is always
just a file -- that it couldn't be dynamic enough.  This proposal
demonstrates how dynamic it can be.

> 3. I would fold 'tracking' and 'response' into a single field that has
> the same values as the headers (no-tracking, first-party,
> service-provider, tracking)

I have no interest in that change, for efficiency reasons.  Most sites
do no tracking of any kind, and having that declared by a boolean up
front allows for the use case of sites that don't want to be associated
with the tracking-but-limited-to-exemptions sites.

> 4. A new comment: While I understand the idea of the path field
> (scoping of status objects), I do not understand its semantics enough.
> E.g., I would not know what status object to apply if there are two
> objects
>  Well-known URI    Path in Object
>  /sub              /
>  /                 /sub

The spec describes a specific algorithm for deciding that in section 5.1.2:

    A user agent may check the tracking status for a given resource URI by
    making a retrieval request for the well-known address
      /.well-known/dnt
    relative to that URI.
    ...

    Once the tracking status representation is obtained, parse the
    representation as JSON to extract the Javascript status-object.
    If parsing results in a syntax error, the user agent should
    consider the site to be non-conformant with this protocol.

    If the status-object does not have a member named path or if the value
    of path is not "/" and not a prefix of the path component for the URI
    being checked, then find the service-specific tracking status resource
    by taking the template
       /.well-known/dnt{+pathinfo}
    and replacing {+pathinfo} with the path component of the URI being
    checked. Perform a retrieval request on the service-specific tracking
    status resource and process the result as described above to obtain
    the specific tracking status.

Note that the second status-object retrieved is not examined to see if
its path member is consistent -- it applies regardless.
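
For illustration only, here is a minimal Python sketch of the lookup as
quoted above.  It is not code from the spec.  The names tracking_status
and needs_specific_lookup are hypothetical, and so is the fetch callable,
which is assumed to take a URL and return the response body as text.
Returning None for a non-conformant site is also an assumption of the
sketch.

```python
import json
from urllib.parse import urlsplit

WELL_KNOWN = "/.well-known/dnt"


def needs_specific_lookup(status, uri_path):
    """True when the site-wide status-object does not cover uri_path.

    Follows the quoted wording: no "path" member, or a path that is
    neither "/" nor a prefix of the path being checked.
    """
    if "path" not in status:
        return True
    scope = status["path"]
    return scope != "/" and not uri_path.startswith(scope)


def tracking_status(uri, fetch):
    """Return the status-object that applies to uri, or None if the
    site is treated as non-conformant (assumption of this sketch).

    fetch(url) is a hypothetical callable returning the body as text.
    """
    parts = urlsplit(uri)
    origin = f"{parts.scheme}://{parts.netloc}"
    uri_path = parts.path or "/"
    try:
        status = json.loads(fetch(origin + WELL_KNOWN))
        if not isinstance(status, dict):
            return None
        if needs_specific_lookup(status, uri_path):
            # Expand /.well-known/dnt{+pathinfo} with the URI's path.
            status = json.loads(fetch(origin + WELL_KNOWN + uri_path))
            if not isinstance(status, dict):
                return None
            # The second object applies regardless of its own path member.
    except json.JSONDecodeError:
        return None
    return status
```

In this sketch a base status-object whose path is "/sub" answers a check
for /sub/x with one request, while a check for /other/page triggers the
second request to /.well-known/dnt/other/page.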

> Some more questions:
> 1. Can there be multiple status-objects at one well-known URI?

No, that is not allowed by the ABNF.

> 2. We should attempt at finding a way to minimize the number of
> requests to the well-known URI.

We already have.  In almost all real cases, there will be exactly one
request per site per 24 hours (or longer if the site has declared a TTL
for this response), and only then for user agents actively verifying
the tracking status.  In all other cases, it is two requests,
one for the base "/.well-known/dnt" and a second for a specific path.
If a site wants to minimize secondary requests, it can do so by
providing no more than one common path on their site per applicable
policy, which is how URI delegation works naturally.
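
As a rough sketch of the client side of that, assuming a user agent keeps
each tracking status response for 24 hours by default: the StatusCache
class, its fetch and clock parameters, and the ttl argument are all
hypothetical, and how a site would declare a longer TTL is not covered
here.

```python
import time

DEFAULT_TTL = 24 * 60 * 60  # 24 hours, in seconds


class StatusCache:
    """Remember tracking status responses so each URL is fetched at
    most once per TTL window (illustrative only)."""

    def __init__(self, fetch, clock=time.time):
        self._fetch = fetch      # hypothetical callable: url -> body text
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}       # url -> (expires_at, body)

    def get(self, url, ttl=DEFAULT_TTL):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh, no request made
        body = self._fetch(url)
        self._entries[url] = (now + ttl, body)
        return body
```

With a cache like this, repeated checks against the same site within the
window cost one request for "/.well-known/dnt", plus at most one more
for each distinct path-specific resource that has to be consulted.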


Cheers,

Roy T. Fielding                     <http://roy.gbiv.com/>
Principal Scientist, Adobe Systems  <http://adobe.com/enterprise>
Received on Wednesday, 29 February 2012 19:54:25 UTC