- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Wed, 7 Mar 2012 02:34:58 -0800
- To: Rigo Wenning <rigo@w3.org>
- Cc: public-tracking@w3.org, Matthias Schunter <mts@zurich.ibm.com>
On Mar 7, 2012, at 12:50 AM, Rigo Wenning wrote:

> On Tuesday 06 March 2012 14:34:43 Roy T. Fielding wrote:
>>> As a consequence, a site where each URL may have a different response
>>> should live easier with headers; for retrieving the same info from a
>>> well-known URI, the whole site needs to be 'mirrored' under the
>>> well-known URI and the number of requests would double (Roy: Correct
>>> me if I am wrong!).
>>
>> Actually, you are wrong, though for reasons that very few people would
>> anticipate. First, the 'mirror' is not of the site but of the resource
>> namespace, and it ends at the first ancestor that has the same tracking
>> policy as all of its descendants. Descendants would redirect up.
>> Second, there are no sites where every URL has its own tracking policy.
>> Finally, in the worst case, the site can simply pick the union of all
>> tracking behavior for the site and present that at the single
>> /.well-known/dnt --- we do not penalize sites for saying they track more
>> than they actually do for a given URI.
>
> Roy,
>
> you're contradicting the entire P3P WG here:
> http://www.w3.org/TR/P3P11/#ref_file

I must be doing something right. ;-)

> I've set up the ref_files myself and failed for a site as complex as W3C's.

Well, yes, but that's because P3P covers everything and cannot presume
to redesign the site. Tracking is quite another story. As I mentioned
before, it is extremely rare for a tracking site to have more than one
policy per domain. If there is more than one policy, it would either
be a delegated control model (meaning hierarchical) or a type-based
control model (meaning a URI pattern, like a file extension). Both
cases are far easier to handle with a virtual URI space than a single
file with a bunch of complicated rules.
All of the major web servers have support for URI rewriting based on
prefix and regular expressions, and they are fully capable of rewriting
/.well-known/dnt/my/path to /my/path,tracking-status if a site really
wants to have a separate policy per resource delegated directly to the
resource owner's space.

> While I can imagine that a simple site sets the response headers in a file to
> be downloaded, complex sites will have to define the scope of the response
> header and with the above you're only scratching the opening door to a complex
> can of worms. Defining is easy, implementing is very hard in this field:
>
> Definition: some regex will do:
> DNT reference files make statements about what DNT feedback value applies to
> a given URI. DNT reference files support a simple wildcard character to allow
> making statements about regions of URI-space. The character asterisk ('*') is
> used to represent a sequence of 0 or more of any character. No other special
> characters (such as those found in regular expressions) are supported.
>
> Implementation:
>
> In W3C datespace, files with different levels of access are sitting in the
> same folder. And different levels of access (logged-in) mean different
> tracking status. As a consequence, the file at the well-known location has to
> contain a list of all files under DNT policy. For W3C those are some 100k
> files or more. How heavy is that file at the well-known location now? How hard
> to generate?

Being able to log in or not log in is not tracking. It is what the
site does with the user-id and activity trace that matters. I am
willing to bet that the W3C does the exact same thing with its logs
and vcs records no matter where in the site those files happen to be.
We might want to limit the scope of tracking status resources to
non-authenticated requests, regardless, since I see no point in
warning the client about tracking after it has already decided to
send login credentials.
Likewise, can we assume that successful state-changing requests
will be tracked?

....Roy
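
For illustration only, the rewrite mentioned in the message (from
/.well-known/dnt/my/path to /my/path,tracking-status) can be sketched
in Python. The function name and the regular expression are
hypothetical, and the ",tracking-status" suffix is just the example
naming used in the message; a real site would more likely express this
as a rewrite rule in its web server configuration.

```python
import re

# Example suffix taken from the message; not a standardized convention.
STATUS_SUFFIX = ",tracking-status"

# Matches /.well-known/dnt/<something> and captures "/<something>".
_WELL_KNOWN_DNT = re.compile(r"^/\.well-known/dnt(/.+)$")


def rewrite_tracking_status_path(request_path):
    """Map a path in the virtual /.well-known/dnt space to the
    resource owner's space, or return None if it is not in that space.

    Hypothetical helper: it only shows the path mapping, not how a
    server would then serve or redirect the rewritten request.
    """
    match = _WELL_KNOWN_DNT.match(request_path)
    if match is None:
        return None
    return match.group(1) + STATUS_SUFFIX


if __name__ == "__main__":
    print(rewrite_tracking_status_path("/.well-known/dnt/my/path"))
```

Requests for the bare /.well-known/dnt resource are deliberately left
alone here, since the message treats that as the site-wide policy.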
Received on Wednesday, 7 March 2012 10:35:30 UTC