Re: Well known URIs and large sites from Matthias Schunter on 2012-05-09 (public-tracking@w3.org from May 2012)

From: Matthias Schunter <mts-std@schunter.org>
Date: Wed, 09 May 2012 18:05:05 +0200
To: ifette@google.com
CC: "public-tracking@w3.org Group WG" <public-tracking@w3.org>
Message-ID: <4FAA95B1.6080909@schunter.org>
Hi Ian,


Thanks for this feedback. I understand that it may be difficult.

Quick question:
- Would response headers be simple/OK for you to handle?

If this is the case, I see two alternatives:
- one may consider posting a default via a single/fixed well-known URI
(not the whole space) and then provide the specifics via the header.
- one may consider permitting stand-alone headers, without info at the
well-known URI.
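
As a rough illustration of the first alternative (not taken from any
draft): a client could read the site-wide default once from the single
fixed well-known resource and let a per-response header override it.
The function name and the idea that both values are plain strings are
assumptions made only for this sketch.

```python
from typing import Optional


def effective_status(default_status: str, header_value: Optional[str]) -> str:
    """Pick the tracking status that applies to one response.

    default_status: value read from the single fixed well-known URI (assumed).
    header_value:   value of the per-response header, or None if it was absent.
    """
    # A non-empty header carries the request-specific value and wins.
    if header_value is not None and header_value.strip():
        return header_value.strip()
    # Otherwise fall back to the site-wide default.
    return default_status
```

Under the second alternative the default would simply be absent, and
only the header branch would apply.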

Opinions/Feedback?


matthias


On 09/05/2012 17:31, Ian Fette (イアンフェッティ) wrote:
> This email is intended to satisfy ACTION-193
>
> The current proposal requires duplicating the entire website's
> namespace under /.well-known/dnt/ -- that is to say, if you
> request https://apis.google.com/_/apps-static/_/js/gapi/googleapis_client,plusone/rt=j/ver=OjdQ3MbDCro.en./sv=1/am=!uchpBK-CNFmZrNLZSw/d=1
> <https://apis.google.com/_/apps-static/_/js/gapi/googleapis_client,plusone/rt=j/ver=OjdQ3MbDCro.en./sv=1/am=%21uchpBK-CNFmZrNLZSw/d=1>
> I have to have a policy file
> under https://apis.google.com/.well-known/dnt/_/apps-static/_/js/gapi/googleapis_client,plusone/rt=j/ver=OjdQ3MbDCro.en./sv=1/am=!uchpBK-CNFmZrNLZSw/d=1
> <https://apis.google.com/.well-known/dnt/_/apps-static/_/js/gapi/googleapis_client,plusone/rt=j/ver=OjdQ3MbDCro.en./sv=1/am=%21uchpBK-CNFmZrNLZSw/d=1>
>
> This is difficult for large sites for a number of reasons. 
>
> 1. Parts of the URL might be used as transitive data, i.e. not
> representing an actual file but rather arguments to be passed
> to the server. This essentially means that I need to query whatever
> frontend service handled the original request, and the parameters
> specified as part of the URL may or may not still have meaning at that
> time.
>
> 2. The policy might depend on query parameters which in the current
> draft are not sent, e.g.
> both https://www.google.com/search?source=ig&hl=en&rlz=&q=microsoft&btnG=Google+Search
> <https://www.google.com/search?source=ig&hl=en&rlz=&q=microsoft&btnG=Google+Search>
> and https://www.google.com/search?sugexp=chrome,mod=12&sourceid=chrome&ie=UTF-8&q=microsoft
> <https://www.google.com/search?sugexp=chrome,mod=12&sourceid=chrome&ie=UTF-8&q=microsoft>
> represent searches on Google for "microsoft" but come from different
> sources and therefore may have different logging policies (one came
> from iGoogle, the other from the Chrome omnibox). We may potentially
> need query parameters in this case to figure that out. 
>
> 3. Creating this duplicate namespace now means I've got additional
> mappings/rules for my load balancers / frontends. Depending on how
> much flexibility you have, this may be a small overhead or it may be
> quite large.
>
> 4. A URL that is used in both first and third party contexts has no
> way of knowing whether it was used in a first or third party context
> under the current proposal. (Whether a site can reliably know at all
> if it is 1st/3rd party is, AFAIK, still an open issue in the current
> draft.)
>
> What I had proposed in earlier discussions, and what I still maintain
> would be more workable for some large sites, is to instead have the
> request return (perhaps as an alternative to the current well-known
> location proposal) a "policy identifier". That is, the response could
> include something like 'Tk:3,maps' and then if the client cared it
> could fetch /.well-known/dnt/maps to get the policy identified by the
> token "maps". This avoids problems 1-4 listed above because, at the
> time of serving the request, I believe a site has better information
> about what policy applies to the request than when it is asked at a
> random later point in time at a different address.
>
> -Ian
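
To make the difference between the two lookups concrete, here is a
minimal sketch. It assumes the header value has the form
"status,token" as in Ian's 'Tk:3,maps' example; the function names and
the treatment of a missing token are hypothetical and not specified in
the thread.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN_PREFIX = "/.well-known/dnt"


def mirrored_policy_url(request_url: str) -> str:
    """Current proposal: mirror the request path under /.well-known/dnt/.

    The query string is dropped, which is the information lost in point 2.
    """
    parts = urlsplit(request_url)
    return urlunsplit(
        (parts.scheme, parts.netloc, WELL_KNOWN_PREFIX + parts.path, "", "")
    )


def parse_tk(value: str) -> Tuple[str, Optional[str]]:
    """Split an assumed 'status,token' header value, e.g. '3,maps'."""
    status, _, token = value.partition(",")
    token = token.strip()
    return status.strip(), token or None


def token_policy_url(request_url: str, tk_value: str) -> Optional[str]:
    """Policy-identifier proposal: fetch /.well-known/dnt/<token> on the same host."""
    _, token = parse_tk(tk_value)
    if token is None:
        return None  # no identifier sent, so there is nothing to fetch
    parts = urlsplit(request_url)
    return urlunsplit(
        (parts.scheme, parts.netloc, f"{WELL_KNOWN_PREFIX}/{token}", "", "")
    )
```

With the mirrored lookup, both search URLs from point 2 collapse to the
same policy address; with the token lookup, the server picks the
identifier while it still has the full request in hand.
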
Received on Wednesday, 9 May 2012 16:05:37 UTC