Re: Initial feedback on the well-known URI Proposal

On Mar 7, 2012, at 12:50 AM, Rigo Wenning wrote:

> On Tuesday 06 March 2012 14:34:43 Roy T. Fielding wrote:
>>> As a consequence, a site where each URL may have a different response
>>> should live easier with headers; for retrieving the same info from a
>>> well-known URI, the whole site needs to be 'mirrored' under the
>>> well-known URI and the number of requests would double (Roy: Correct
>>> me if I am wrong!).
>> 
>> Actually, you are wrong, though for reasons that very few people would
>> anticipate.  First, the 'mirror' is not of the site but of the resource
>> namespace, and it ends at the first ancestor that has the same tracking
>> policy as all of its descendants.  Descendants would redirect up.
>> Second, there are no sites where every URL has its own tracking policy.
>> Finally, in the worst case, the site can simply pick the union of all
>> tracking behavior for the site and present that at the single
>> /.well-known/dnt --- we do not penalize sites for saying they track more
>> than they actually do for a given URI.
> 
> Roy, 
> 
> you're contradicting the entire P3P WG here:
> http://www.w3.org/TR/P3P11/#ref_file

I must be doing something right. ;-)

> I've set up the ref_files myself and failed for a site as complex as W3C's. 

Well, yes, but that's because P3P covers everything and cannot presume
to redesign the site.  Tracking is quite another story.  As I mentioned
before, it is extremely rare for a tracking site to have more than one
policy per domain.  If there is more than one policy, it would either be
a delegated control model (meaning hierarchical) or a type-based control
model (meaning a URI pattern, like a file extension).  Both cases are
far easier to handle with a virtual URI space than a single file with
a bunch of complicated rules.

All of the major web servers have support for URI rewriting based on
prefix and regular expressions, and they are fully capable of rewriting

    /.well-known/dnt/my/path
to
    /my/path,tracking-status

if a site really wants to have a separate policy per resource delegated
directly to the resource owner's space.
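
As a rough illustration of that rewrite, here is a minimal Python sketch of the mapping itself. The function name and constants are hypothetical, and a real deployment would more likely express the same rule in the web server's own rewrite configuration rather than in application code.

```python
import re

# Assumed names for illustration only; the mapping follows the example above.
WELL_KNOWN_PREFIX = "/.well-known/dnt"
STATUS_SUFFIX = ",tracking-status"

# Match the well-known prefix followed by a non-empty resource path.
_PATTERN = re.compile(re.escape(WELL_KNOWN_PREFIX) + r"(/.+)")


def rewrite_tracking_status(path):
    """Map /.well-known/dnt/<rest> to /<rest>,tracking-status.

    Returns None when the path is not under the well-known prefix,
    so the caller can fall through to normal request handling.
    """
    match = _PATTERN.fullmatch(path)
    if match is None:
        return None
    return match.group(1) + STATUS_SUFFIX
```

With this sketch, rewrite_tracking_status("/.well-known/dnt/my/path") yields "/my/path,tracking-status", while a request for the bare /.well-known/dnt is left alone so it can carry the site-wide policy.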

> While I can imagine that a simple site sets the response headers in a file to 
> be downloaded, complex sites will have to define the scope of the response 
> header, and with the above you're only beginning to open a complex 
> can of worms. Defining is easy, implementing is very hard in this field: 
> 
> Definition: some regex will do:
>  DNT reference files make statements about what DNT feedback value applies to 
> a given URI. DNT reference files support a simple wildcard character to allow 
> making statements about regions of URI-space. The character asterisk ('*') is 
> used to represent a sequence of 0 or more of any character. No other special 
> characters (such as those found in regular expressions) are supported.
> 
> Implementation: 
> 
> In W3C datespace, files with different levels of access are sitting in the 
> same folder. And different levels of access (logged-in) mean different 
> tracking status. As a consequence, the file at the well-known location has to 
> contain a list of all files under DNT policy. For W3C those are some 100k 
> files or more. How heavy is that file at the well-known location now? How hard 
> to generate?
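
For reference, the wildcard rule quoted in the definition above (an asterisk stands for zero or more of any character, and nothing else is special) could be sketched in Python as follows. The function names are hypothetical, and this is only one reading of the quoted text, not an implementation taken from any specification.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a pattern where '*' is the only special character.

    Every other character, including '?', '[' and '.', is matched
    literally, which is why fnmatch-style globbing is not used here.
    """
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts), re.DOTALL)


def uri_matches(pattern, uri):
    """Return True when the whole URI matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(uri) is not None
```

Under this reading, a pattern such as "/Member/*" covers every URI below /Member/, but files with different status in the same folder still need one entry each, which is the size problem raised above.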

Being able to log in or not log in is not tracking.  It is what the site
does with the user-id and activity trace that matters.  I am willing
to bet that the W3C does the exact same thing with its logs and vcs
records no matter where in the site those files happen to be.

We might want to limit the scope of tracking status resources
to non-authenticated requests, regardless, since I see no point in
warning the client about tracking after it has already decided to
send login credentials.  Likewise, can we assume that successful
state-changing requests will be tracked?

....Roy

Received on Wednesday, 7 March 2012 10:35:30 UTC