RE: Initial feedback on the well-known URI Proposal from Amy Colando (LCA) on 2012-03-06 (public-tracking@w3.org from March 2012)

From: Amy Colando (LCA) <acolando@microsoft.com>
Date: Tue, 6 Mar 2012 00:05:46 +0000
To: Matthias Schunter <mts@zurich.ibm.com>, John Simpson <john@consumerwatchdog.org>
CC: "public-tracking@w3.org" <public-tracking@w3.org>
Message-ID: <81152EDFE766CB4692EA39AECD2AA5B6130C288D@TK5EX14MBXC295.redmond.corp.microsoft.>
Can I put in a plug for spending some significant time (1-2 hours) on this during F2F?  I'd really appreciate a walk-through from a user's POV of each option, as well as the technical pros and cons. Thanks.

-----Original Message-----
From: Matthias Schunter [mailto:mts@zurich.ibm.com] 
Sent: Monday, March 05, 2012 5:42 AM
To: John Simpson
Cc: public-tracking@w3.org
Subject: Re: Initial feedback on the well-known URI Proposal

Hi John,

thanks for the suggestion.

For simplicity, I would prefer to choose either the URI or headers, not both.
Otherwise, we have to define semantics for their interplay (does a header override the info at the well-known URI, or vice versa?).

However, if we see that both have complementary benefits then we might allow both.

Regards,
matthias


On 3/1/2012 9:53 PM, John Simpson wrote:
> If there are differing advantages to the response header vs. the 
> well-known URI depending on the size and simplicity of the website, 
> would that imply that the spec should offer the option of responding 
> with either and leave the choice to the implementer?  I'm not 
> advocating, I'm asking...
> 
> John
> 
> On Mar 1, 2012, at 12:22 AM, Matthias Schunter wrote:
> 
>> Hi!
>>
>> I started a comparison here:
>> http://www.w3.org/wiki/DntResponseHeaderOrURI
>> Feel free to edit (I assume everyone can).
>>
>> I currently see three advantages for headers:
>> 1. They are much simpler than the current proposal by Roy.
>> 2. Their scoping is easier: while a URI resource needs to explain its
>>    scope, the header is just attached.
>> 3. Manageability in a large enterprise (say ibm.com <http://ibm.com>)
>>    may be easier: the 'owners' of resources can just attach headers,
>>    while maintaining well-known URIs in a single place for all
>>    resources requires synchronisation.
>>
>> An advantage for URIs is that for simple sites, they are easier; just 
>> put a minimal tracking-status at the well-known URI and you are done.
>>
>> Feedback welcome (in particular for enhancing the wiki page).
>>
>>
>> Regards,
>> matthias
>>
>>
>>
>> On 2/29/2012 10:14 PM, Kevin Smith wrote:
>>>> From reading Roy's description, it sounds to me like there is at 
>>>> least one piece of functionality available when using a URI vs a 
>>>> header - that you can request the policy before actually hitting 
>>>> the page.  This does not seem like a huge advantage to me, but it's 
>>>> nice to know the options.
>>>
>>> The question I have is, what can a header do that a URI cannot?
>>>  If, other than the above mentioned minor discrepancy, they are 
>>> functionally equivalent, which I suspect is true, then this simply 
>>> becomes a question of cost analysis.  If the benefits are 
>>> equivalent, pick the method that is easier to implement, and cheaper 
>>> to maintain and use.
>>>
>>> -----Original Message-----
>>> From: Roy T. Fielding [mailto:fielding@gbiv.com]
>>> Sent: Wednesday, February 29, 2012 12:54 PM
>>> To: Matthias Schunter
>>> Cc: public-tracking@w3.org <mailto:public-tracking@w3.org>
>>> Subject: Re: Initial feedback on the well-known URI Proposal
>>>
>>> On Feb 29, 2012, at 2:11 AM, Matthias Schunter wrote:
>>>
>>>> I now had a closer look at your proposal to transmit tracking 
>>>> status via well-known URI.
>>>>
>>>> I believe that both proposals, headers and URIs have benefits. I 
>>>> need to continue trying to understand their pros and cons.
>>>
>>> My goal was to capture all of the WG's use cases, including those 
>>> that would be prohibitively expensive to include in a header.  I am 
>>> hoping that reviewers will consider all of the possible things they 
>>> need from a response, for whatever reasons they might need them, and 
>>> make sure that the tracking status resource satisfies those cases.  
>>> If not, it is relatively easy to add those cases when working with a 
>>> separate resource.
>>>
>>>> Here is some initial feedback on the proposal:
>>>>
>>>> 1. I like the URI proposal and I believe it has its merit. We need 
>>>> to understand
>>>>   whether URI/header or both are the avenue to go forward
>>>
>>> I will be sad if we can't agree to have just the resource, unless we 
>>> have a use case that cannot be satisfied by the separate resource 
>>> space.
>>>
>>>> 2. A main goal of DNT (my perspective) is simplicity and ease of 
>>>> use/understanding. I believe that the overall scheme should be 
>>>> minimalistic to keep it as simple as possible. We spent time in 
>>>> Brussels slimming the headers to the minimal info that is essential.
>>>> I'd like to do a similar exercise for your proposal.
>>>
>>> My proposal has many more details because it satisfies several more 
>>> use cases than the header proposal.  For example, the echo DNT use 
>>> case, the ability to distinguish specific exceptions, providing a 
>>> list of domains to be considered first-party, extensibility, etc.
>>> It is a complete proposal and, IMO, vastly superior to sending a 
>>> header field on every response because it satisfies all of the use 
>>> cases without impacting existing implementations or caching.
>>>
>>> It has benefited from all of the prior discussion we have had on 
>>> header fields.  It is just a different way to address the same 
>>> problem (a more Web-centric, RESTful way, if I may add, though I bet 
>>> somebody will eventually complain that application/json isn't a 
>>> hypertext type).
>>>
>>>> This means that I would omit all fields that are not essential to 
>>>> make the proposal slim and similar to the headers.
>>>> - Fields I would remove are
>>>> same-site, edits, partners, received (we agreed that it is not 
>>>> needed; it no longer exists in the headers either)
>>>
>>> That would eliminate the use cases for identifying the scope of 
>>> first-party, providing individual control over the data that has 
>>> been collected, providing fair warning (before the real resource
>>> request) of what third-party trackers are used by the site, and 
>>> echoing the DNT field back to the client to detect evil 
>>> intermediaries.  The only reason we don't have those cases handled 
>>> by the header field is because it would be prohibitively expensive 
>>> to do so in headers, either because of the size or because of the 
>>> effect on the cacheability of normal responses.  Hence, your 
>>> suggested deletions would remove most of the reasons why the 
>>> resource fulfills the needs of the privacy and regulator folks 
>>> better than the header field proposal.
>>>
>>> I am not wedded to the member names -- same-site just seemed more 
>>> natural than first-party-scope.  I am not sure if we need the use 
>>> case for partners (identifying third-parties before one goes to the 
>>> site), since that may be too hard to manage, but it should at least 
>>> be considered by the WG.
>>>
>>>> - I am not sure about the options as a separate field since the 
>>>> policy may link to it, too.
>>>
>>> Specific links to enable individual control are a requirement of the 
>>> regulators.  They should not be buried in a policy doc.
>>>
>>>> - I also would focus on fields that are usually static (e.g.,  not 
>>>> having a 'received' field)
>>>
>>> Why?  The main reason I wasn't able to convince folks at the start 
>>> of the header field discussion that a well-known resource would 
>>> satisfy their concerns is the preconception that such a resource is 
>>> always just a file -- that it couldn't be dynamic enough.  This 
>>> proposal demonstrates how dynamic it can be.
>>>
>>>> 3. I would fold 'tracking' and 'response' into a single field that 
>>>> has the same values as the headers (no-tracking, first-party, 
>>>> service-provider, tracking)
>>>
>>> I have no interest in that change, for efficiency reasons.  Most 
>>> sites do no tracking of any kind, and having that declared by a 
>>> boolean up front allows for the use case of sites that don't want to 
>>> be associated with the tracking-but-limited-to-exemptions sites.
>>>
>>>> 4. A new comment: While I understand the idea of the path field 
>>>> (scoping of status objects), I do not understand its semantics enough.
>>>> E.g., I would not know what status object to apply if there are two 
>>>> objects Well-known URIPath in Object /sub/ //sub
>>>
>>> The spec describes a specific algorithm for deciding it in 5.1.2:
>>>
>>>    A user agent may check the tracking status for a given resource URI
>>>    by making a retrieval request for the well-known address
>>>      /.well-known/dnt
>>>    relative to that URI.
>>>    ...
>>>
>>>    Once the tracking status representation is obtained, parse the
>>>    representation as JSON to extract the Javascript status-object.
>>>    If parsing results in a syntax error, the user agent should
>>>    consider the site to be non-conformant with this protocol.
>>>
>>>    If the status-object does not have a member named path or if the
>>>    value of path is not "/" and not a prefix of the path component for
>>>    the URI being checked, then find the service-specific tracking
>>>    status resource by taking the template
>>>       /.well-known/dnt{+pathinfo}
>>>    and replacing {+pathinfo} with the path component of the URI being
>>>    checked. Perform a retrieval request on the service-specific
>>>    tracking status resource and process the result as described above
>>>    to obtain the specific tracking status.
>>>
>>> Note that the second status-object retrieved is not examined to see 
>>> if its path component is consistent -- it applies regardless.
>>>
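The lookup quoted above from section 5.1.2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not text from the proposal: the names tracking_status, NonConformantSite and the fetch callable are hypothetical, the "tracking" member in the usage example is illustrative, and the network retrieval is passed in as a function so that the decision logic stands on its own.

```python
import json
from urllib.parse import urlsplit

WELL_KNOWN = "/.well-known/dnt"  # well-known address named in the quoted text


class NonConformantSite(Exception):
    """Hypothetical error: the status representation could not be parsed."""


def _load_status(fetch, origin, resource_path):
    # Parse the representation as JSON; a syntax error means the site is
    # treated as non-conformant, as the quoted text says.
    try:
        status = json.loads(fetch(origin + resource_path))
    except json.JSONDecodeError as exc:
        raise NonConformantSite(resource_path) from exc
    if not isinstance(status, dict):
        # Assumption: a status-object must be a JSON object.
        raise NonConformantSite(resource_path)
    return status


def tracking_status(uri, fetch):
    """Return the status-object that applies to `uri`.

    `fetch` is an assumed callable taking a URL and returning the body.
    """
    parts = urlsplit(uri)
    origin = f"{parts.scheme}://{parts.netloc}"
    path = parts.path or "/"

    status = _load_status(fetch, origin, WELL_KNOWN)
    scope = status.get("path")

    # The base object applies only if it has a path member that is "/"
    # or a prefix of the path component of the URI being checked.
    if isinstance(scope, str) and (scope == "/" or path.startswith(scope)):
        return status

    # Otherwise expand /.well-known/dnt{+pathinfo} with the URI's path.
    # The second object's own path member is not examined.
    return _load_status(fetch, origin, WELL_KNOWN + path)
```

With a stub fetch that serves a base object scoped to "/shop/", a check of https://example.com/shop/cart stops after one retrieval, while a check of https://example.com/blog/post goes on to request /.well-known/dnt/blog/post.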
>>>> Some more questions:
>>>> 1. Can there be multiple status-objects at one well-known URI?
>>>
>>> No, that is not allowed by the ABNF.
>>>
>>>> 2. We should attempt to find a way to minimize the number of 
>>>> requests to the well-known URI.
>>>
>>> We already have.  In almost all real cases, there will be exactly 
>>> one per site per 24 hours (or longer if the site has declared a TTL 
>>> for this response), and only then for user agents actively verifying 
>>> the tracking status.  In all other cases, it is two requests, one 
>>> for the base "/.well-known/dnt" and a second for a specific path.
>>> If a site wants to minimize secondary requests, it can do so by 
>>> providing no more than one common path on their site per applicable 
>>> policy, which is how URI delegation works naturally.
>>>
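The "one request per site per 24 hours" behaviour described above amounts to a client-side cache. A minimal sketch follows, with several assumptions: StatusCache, declared_ttl and the fetch callable are hypothetical names, and how a site actually declares its TTL is not specified in this thread, so the lifetime is simply passed in by the caller.

```python
import time

DAY_SECONDS = 24 * 60 * 60  # default lifetime mentioned in the reply above


class StatusCache:
    """Remembers fetched tracking status bodies until they expire."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch    # assumed callable: URL -> response body
        self._clock = clock    # injectable so expiry can be tested
        self._entries = {}     # URL -> (expiry time, body)
        self.requests = 0      # number of real retrievals made

    def get(self, url, declared_ttl=None):
        now = self._clock()
        cached = self._entries.get(url)
        if cached is not None and now < cached[0]:
            return cached[1]   # still fresh, no request is sent
        body = self._fetch(url)
        self.requests += 1
        # Assumption: use the site's declared TTL when one is known,
        # otherwise fall back to 24 hours.
        lifetime = DAY_SECONDS if declared_ttl is None else declared_ttl
        self._entries[url] = (now + lifetime, body)
        return body
```

A user agent that routes both the base "/.well-known/dnt" lookup and any path-specific lookup through such a cache would repeat neither request until the stored entry expires.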
>>>
>>> Cheers,
>>>
>>> Roy T. Fielding                     <http://roy.gbiv.com/>
>>> Principal Scientist, Adobe Systems  <http://adobe.com/enterprise>
>>>
>>>
>>>
>>>
>>>
>>>
>>
> 
> ----------
> John M. Simpson
> Consumer Advocate
> Consumer Watchdog
> 1750 Ocean Park Blvd. ,Suite 200
> Santa Monica, CA,90405
> Tel: 310-392-7041
> Cell: 310-292-1902
> www.ConsumerWatchdog.org <http://www.ConsumerWatchdog.org> 
> john@consumerwatchdog.org <mailto:john@consumerwatchdog.org>
>
Received on Tuesday, 6 March 2012 00:06:36 UTC