- From: Devin Mullins <twifkak@google.com>
- Date: Thu, 28 Mar 2019 11:45:00 -0700
- To: Mark Nottingham <mnot@mnot.net>, ietf-http-wg@w3.org
- Message-ID: <CANjwSik+HRKRQUkhnKeoR9+MrQGxbOTgcvBSvu+HygUHNB6rjQ@mail.gmail.com>
Thanks for the responses, Mark.

On Wed, Mar 13, 2019 at 9:38 PM Mark Nottingham <mnot@mnot.net> wrote:

> > 1. AFAIK, googlebot currently requests with Accept headers that match
> > what browsers commonly send (
> > https://www.searchdatalogy.com/blog/googlebots-http-headers/). With the
> > addition of application/signed-exchange to Chromium, this remains true (
> > https://cs.chromium.org/chromium/src/content/browser/loader/navigation_url_loader_impl.cc?l=237&rcl=7d726373f502fd5e20cede7e22da049b16f77377).
> > However, using q-values means googlebot would need to send a significantly
> > different Accept header. I fear that this would cause unintentional
> > breakage with servers that implement an improper content negotiation
> > algorithm.
>
> You're suggesting a new, duplicative content negotiation mechanism be
> created to avoid this fear, but haven't presented any data to support it.
>
> It should be possible to quantify how often origin servers mis-implement
> content negotiation, both by probing the Internet (and Google is in a
> uniquely good place to gather that data), and examining common
> implementations.

Sure, happy to explore some common implementations. According to the latest
Netcraft survey
<https://news.netcraft.com/archives/2019/02/28/february-2019-web-server-survey.html>,
the top servers on a couple different axes are Apache, nginx, and IIS. (There
are other ways of ranking servers, but this one was readily available from a
quick search. If you think this is a source of significant bias, I'd be
interested to hear.)

Apache:
- Has a documented conneg algorithm
  <https://httpd.apache.org/docs/2.4/content-negotiation.html#methods>
- Supports q-values
- It also supports server-side preference among multiple variants when the
  client expresses indifference. (see "quality-of-source" or "qs" on that page
  or this one
  <https://httpd.apache.org/docs/2.4/mod/mod_negotiation.html#typemaps>).
Potential downsides:
- It appears this makes use of a "type map" format that is specific to Apache.
- This seems incompatible with a mod_proxy setup, and that seems like an
  important use-case. (As one data point, AMP's canonical implementation
  <https://github.com/ampproject/amppackager> of an exchange signer is meant
  to be used behind a revproxy. We could just suggest that frontends forward
  all requests to amppkg regardless of Accept, but that vastly limits its
  utility, as amppkg has not been optimized to serve user-facing traffic.
  Other implementations may experience similar problems; for instance, they
  may wish for an edge node to deliver cached content if possible, and may not
  wish to host AMP transformation on the edge.)
  - For instance, it is unclear if backends can return type maps, or they
    need to be retrieved as files by Apache.
  - Also, it would need to pass the negotiated media type on to the backend.
    For instance, say the client requests
    `Accept: application/signed-exchange;v=b3;amp="google;v=\"3\"";q=0.8, application/signed-exchange;v=b3;amp="google;v=\"1..2\""`.
    Then the backend needs to know that version 1..2 is being requested, not
    3. It could recompute the negotiated media type, but that risks mismatch
    with what the frontend computed.

nginx:
- Doesn't support conneg natively.
- Example configurations
  <https://www.google.com/search?q=nginx+content+negotiation> supporting it
  make use of regexes and scripts that parse Accept only approximately.
  - I couldn't find any that support negotiation on qvalues.
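
To make the Accept example from the Apache discussion concrete, here is a minimal Python sketch of q-value-aware parsing that also respects quoted-string parameters. The function name `parse_accept` and both regexes are hypothetical and written only for illustration; this is a simplified reading of the Accept grammar, not Apache's algorithm or a complete parser (it does no wildcard matching or specificity ordering).

```python
import re

# Media type at the start of a media range (hypothetical helper pattern).
_TYPE = re.compile(r'\s*([^;,\s]+)')
# One parameter: name=token or name="quoted string" with backslash escapes.
_PARAM = re.compile(r'\s*;\s*([^=;,\s]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^;,\s]*)')


def parse_accept(header):
    """Return (media_type, params, q) tuples, highest q first."""
    ranges = []
    pos = 0
    while pos < len(header):
        m = _TYPE.match(header, pos)
        if not m:
            break
        media_type = m.group(1).lower()
        pos = m.end()
        params, q = {}, 1.0  # q defaults to 1 when absent
        while True:
            p = _PARAM.match(header, pos)
            if not p:
                break
            name, value = p.group(1).lower(), p.group(2)
            if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
                # Strip the surrounding quotes and undo backslash escapes.
                value = re.sub(r'\\(.)', r'\1', value[1:-1])
            if name == 'q':
                q = float(value)
            else:
                params[name] = value
            pos = p.end()
        ranges.append((media_type, params, q))
        # Quoted strings were consumed above, so the next comma is a separator.
        comma = header.find(',', pos)
        if comma == -1:
            break
        pos = comma + 1
    # sorted() is stable, so equal-q ranges keep their header order.
    return sorted(ranges, key=lambda r: -r[2])


header = (r'application/signed-exchange;v=b3;amp="google;v=\"3\"";q=0.8, '
          r'application/signed-exchange;v=b3;amp="google;v=\"1..2\""')
best = parse_accept(header)[0]
print(best)
```

Under these assumptions the preferred range is the one carrying `amp="google;v=\"1..2\""`, since it has the implicit q of 1. A frontend and a backend that each run their own variant of such a parser could disagree on edge cases, which is the mismatch risk described above.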
IIS:
- Supports conneg through a CLR API
  <https://docs.microsoft.com/en-us/aspnet/web-api/overview/formats-and-model-binding/content-negotiation>
- The default content negotiator:
  - Supports q-values
  - Appears to support server-side preference through the MediaTypeMapping
    <https://docs.microsoft.com/en-us/previous-versions/aspnet/hh834723(v%3Dvs.118)>
    class's TryMatchMediaType
    <https://docs.microsoft.com/en-us/previous-versions/visualstudio/hh835829(v%3Dvs.118)>
    abstract method.

As far as probing the internet, that's not something I'm inclined to do unless
the benefit well outweighs the (internal and external) costs. Can you walk me
through what you had in mind? Lacking widespread support for Variants, I'm not
sure how I would create an unbiased sample of URLs that have multiple known
variants. (I could look at a biased sample by, e.g. looking at responses with
`Vary: Accept` and then trying a few known-related media types to discover the
variants, before then probing with trickier Accept headers. But even ignoring
the bias, I suspect the success rate on finding variants would be pretty low.)

> Even if some were found, I wonder why you can't work with them -- since
> they're already adopting the format you're defining -- to fix the problem,
> rather than introduce a new mechanism. After all, such a change should only
> affect sites that actually offer the format you're defining, correct?

The modified Accept header will be sent on most Googlebot requests, so in the
extreme, it could negatively affect servers that e.g. don't parse Accept
properly. (For a contrived example:
`Accept: application/signed-exchange;v=b3;amp="amp*webp*lace;v=\"1\""` may
trip the webp detector in this example
<https://github.com/cdowdy/Nginx-Content-Negotiation/blob/master/nginx.conf>,
even if the server doesn't support AMP or signed exchanges.) But excepting
that, it's true that the effect is limited to sites wishing to adopt the new
format.
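
The contrived header above can be tried against two hypothetical detectors in Python. `naive_accepts_webp` assumes the detector is a case-insensitive substring match on the whole Accept value; that is an assumption about how such a check might be written, not a quote from the linked config. `accepts_webp` is a slightly more careful sketch that blanks out quoted-string parameter values before comparing media types; it still ignores q=0 and wildcard ranges.

```python
import re

# The contrived Accept value from the message, as a raw string.
CONTRIVED = r'application/signed-exchange;v=b3;amp="amp*webp*lace;v=\"1\""'

# Matches one quoted string, including backslash-escaped characters.
_QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')


def naive_accepts_webp(accept):
    # Assumed detector: substring match anywhere in the header value.
    return re.search(r'webp', accept, re.IGNORECASE) is not None


def accepts_webp(accept):
    # Blank out quoted parameter values so their contents cannot match,
    # then compare each media range's type exactly.
    stripped = _QUOTED.sub('""', accept)
    for media_range in stripped.split(','):
        media_type = media_range.split(';', 1)[0].strip().lower()
        if media_type == 'image/webp':
            return True
    return False


print(naive_accepts_webp(CONTRIVED), accepts_webp(CONTRIVED))
```

The substring check fires on the `amp` parameter value even though no image type is listed, while the type-by-type comparison does not.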
Still, our goal is to minimize the barrier to entry for web developers wishing
to adopt it, within the technical constraints imposed by this space.

> It's also a one-off solution. I'm concerned that if every new format
> defines such a mechanism, we're going to be overcome by lots of slightly
> different ways of doing things. From a reverse proxy / CDN standpoint, it's
> better if we can reuse existing mechanisms that don't require special
> handling (some do not allow factoring additional request headers into the
> cache key easily).

I'd be interested in some detail on this. Off-hand, I see that Varnish
supports customizing the cache key
<https://varnish-cache.org/docs/trunk/users-guide/vcl-hashing.html>, though
that's where my knowledge on this ends. (I'm unable to find recent
documentation on Squid's Vary support, after a few minutes' searching.)

<snip things I have no update on>

> If you do end up needing to specify a new content negotiation mechanism
> here, you really should have a look at Client Hints -- which is turning
> into a framework for doing so. See: <
> https://httpwg.org/http-extensions/draft-ietf-httpbis-client-hints.html>.
> Note that that should be updated to include Variants eventually <
> https://httpwg.org/http-extensions/draft-ietf-httpbis-variants.html>;
> you'll need to specify a variant algorithm as well if you define a new
> conneg mechanism.

Thanks! For now, I included a "future work" note about Variants on the
AMP-Cache-Transform doc, and will take a closer look at how Client Hints could
be applicable.

Devin
Received on Thursday, 28 March 2019 18:45:52 UTC