Fwd: [Ietf-message-headers] Requesting provisional registration for AMP-Cache-Transform header

Thanks for the responses, Mark.

On Wed, Mar 13, 2019 at 9:38 PM Mark Nottingham <mnot@mnot.net> wrote:

> > 1. AFAIK, googlebot currently requests with Accept headers that match
> > what browsers commonly send (
> > https://www.searchdatalogy.com/blog/googlebots-http-headers/). With the
> > addition of application/signed-exchange to Chromium, this remains true (
> > https://cs.chromium.org/chromium/src/content/browser/loader/navigation_url_loader_impl.cc?l=237&rcl=7d726373f502fd5e20cede7e22da049b16f77377).
> > However, using q-values means googlebot would need to send a significantly
> > different Accept header. I fear that this would cause unintentional
> > breakage with servers that implement an improper content negotiation
> > algorithm.
>
> You're suggesting a new, duplicative content negotiation mechanism be
> created to avoid this fear, but haven't presented any data to support it.
>
> It should be possible to quantify how often origin servers mis-implement
> content negotiation, both by probing the Internet (and Google is in a
> uniquely good place to gather that data), and examining common
> implementations.
>

Sure, happy to explore some common implementations. According to the latest
Netcraft survey
<https://news.netcraft.com/archives/2019/02/28/february-2019-web-server-survey.html>,
the top servers on a couple of different axes are Apache, nginx, and IIS.
(There are other ways of ranking servers, but this one was readily
available from a quick search. If you think this is a source of significant
bias, I'd be interested to hear.)

Apache:

   - Has a documented conneg algorithm
   <https://httpd.apache.org/docs/2.4/content-negotiation.html#methods>
   - Supports q-values
   - It also supports server-side preference among multiple variants when
   the client expresses indifference. (See "quality-of-source" or "qs" on that
   page or this one
   <https://httpd.apache.org/docs/2.4/mod/mod_negotiation.html#typemaps>.)
   Potential downsides:
      - It appears this makes use of a "type map" format that is specific
      to Apache.
      - This seems incompatible with a mod_proxy setup, and that seems like
      an important use case. (As one data point, AMP's canonical
      implementation <https://github.com/ampproject/amppackager> of an
      exchange signer is meant to be used behind a reverse proxy. We could
      just suggest that frontends forward all requests to amppkg regardless
      of Accept, but that vastly limits its utility, as amppkg has not been
      optimized to serve user-facing traffic. Other implementations may
      experience similar problems; for instance, they may wish for an edge
      node to deliver cached content if possible, and may not wish to host
      AMP transformation on the edge.)
         - For instance, it is unclear whether backends can return type
         maps, or whether type maps must be retrieved as files by Apache.
         - Also, Apache would need to pass the negotiated media type on to
         the backend. For instance, say the client sends `Accept:
         application/signed-exchange;v=b3;amp="google;v=\"3\"";q=0.8,
         application/signed-exchange;v=b3;amp="google;v=\"1..2\""`.
         Then the backend needs to know that version 1..2 is being
         requested (implicit q=1), not 3 (q=0.8). It could recompute the
         negotiated media type, but that risks a mismatch with what the
         frontend computed.
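To make the frontend/backend mismatch concrete, here is a minimal Python sketch (not Apache's code) of q-value negotiation with an Apache-style server-side qs factor: each variant's score is the client's q, taken from the most specific matching Accept range, multiplied by the server's qs, and the highest score wins. The function names and the simplified parameter matching are my own assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MediaRange:
    """One comma-separated element of an Accept header (or a variant type)."""
    mime: str                                   # e.g. "application/signed-exchange"
    params: dict = field(default_factory=dict)  # unquoted parameter values
    q: float = 1.0                              # client weight; defaults to 1


def split_outside_quotes(text: str, sep: str) -> list:
    """Split on sep, ignoring separators inside quoted strings."""
    parts, buf, in_quotes, escaped = [], [], False, False
    for ch in text:
        if escaped:
            escaped = False
        elif ch == "\\" and in_quotes:
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            parts.append("".join(buf))
            buf = []
            continue
        buf.append(ch)
    parts.append("".join(buf))
    return [p.strip() for p in parts if p.strip()]


def unquote(value: str) -> str:
    """Remove the surrounding quotes and backslash escapes of a quoted-string."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        out, chars = [], iter(value[1:-1])
        for ch in chars:
            out.append(next(chars, "") if ch == "\\" else ch)
        return "".join(out)
    return value


def parse_media(element: str) -> MediaRange:
    mime, *raw_params = split_outside_quotes(element, ";")
    params, q = {}, 1.0
    for raw in raw_params:
        name, _, value = raw.partition("=")
        name = name.strip().lower()
        if name == "q":
            q = float(value)
        else:
            params[name] = unquote(value.strip())
    return MediaRange(mime.lower(), params, q)


def matches(r: MediaRange, v: MediaRange) -> bool:
    """Does the Accept range r cover the variant v? (exact parameter match)"""
    type_ok = r.mime in ("*/*", v.mime) or (
        r.mime.endswith("/*") and v.mime.startswith(r.mime[:-1]))
    return type_ok and all(v.params.get(k) == val for k, val in r.params.items())


def specificity(r: MediaRange) -> tuple:
    level = 0 if r.mime == "*/*" else 1 if r.mime.endswith("/*") else 2
    return (level, len(r.params))


def negotiate(accept: str, variants: dict):
    """Pick the variant maximizing q * qs; None if nothing is acceptable.

    variants maps a variant media type to its server-side qs value."""
    ranges = [parse_media(e) for e in split_outside_quotes(accept, ",")]
    best, best_score = None, 0.0
    for variant, qs in variants.items():
        v = parse_media(variant)
        candidates = [r for r in ranges if matches(r, v)]
        if not candidates:
            continue
        score = max(candidates, key=specificity).q * qs
        if score > best_score:
            best, best_score = variant, score
    return best
```

Applied to the Accept header above with equal qs values, `negotiate` picks the "1..2" variant. With a qs of 1.0 for version 3 and 0.7 for 1..2, it instead picks version 3 (0.8 * 1.0 > 1.0 * 0.7), which is exactly the kind of result a backend would have to be told rather than recompute on its own.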

nginx:

   - Doesn't support conneg natively.
   - Example configurations
   <https://www.google.com/search?q=nginx+content+negotiation> supporting
   it make use of regexes and scripts that parse Accept only approximately.
   - I couldn't find any that support negotiation on q-values.
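As an illustration of "only approximately", a hypothetical Python comparison: a substring test in the spirit of those regexes treats `image/webp;q=0` as support, even though q=0 means "not acceptable" (RFC 7231, section 5.3.1). The regex and helper names are assumptions, not taken from any particular config.

```python
import re


def substring_says_webp(accept: str) -> bool:
    """Approximate check: does "image/webp" appear anywhere in Accept?"""
    return re.search(r"image/webp", accept, re.IGNORECASE) is not None


def explicit_q(accept: str, wanted: str):
    """Return the q-value the client explicitly gives `wanted`, or None.

    Simplified on purpose: wildcards and quoted parameters are not handled."""
    for element in accept.split(","):
        mime, *params = [p.strip() for p in element.split(";")]
        if mime.lower() != wanted:
            continue
        q = 1.0  # default when no q parameter is present
        for p in params:
            name, _, value = p.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        return q
    return None
```

For `Accept: text/html, image/webp;q=0, */*;q=0.8`, the substring test says yes while `explicit_q` returns 0.0, i.e. the client has explicitly refused WebP.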

IIS:

   - Supports conneg through a CLR API
   <https://docs.microsoft.com/en-us/aspnet/web-api/overview/formats-and-model-binding/content-negotiation>
   - The default content negotiator:
      - Supports q-values
      - Appears to support server-side preference through the
      MediaTypeMapping
      <https://docs.microsoft.com/en-us/previous-versions/aspnet/hh834723(v%3Dvs.118)>
      class's TryMatchMediaType
      <https://docs.microsoft.com/en-us/previous-versions/visualstudio/hh835829(v%3Dvs.118)>
      abstract method.

As for probing the Internet, that's not something I'm inclined to do
unless the benefit clearly outweighs the (internal and external) costs. Can
you walk me through what you had in mind? Lacking widespread support for
Variants, I'm not sure how I would create an unbiased sample of URLs that
have multiple known variants. (I could look at a biased sample by, e.g.,
looking at responses with `Vary: Accept` and then trying a few
known-related media types to discover the variants, before probing
with trickier Accept headers. But even ignoring the bias, I suspect the
success rate for finding variants would be pretty low.)
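For concreteness, the biased approach described above might be sketched like this in Python. Fetching is deliberately left out: the caller is assumed to have already issued the requests and collected (Content-Type, body) pairs, and both function names are hypothetical.

```python
import hashlib


def varies_on_accept(vary: str) -> bool:
    """True if a Vary header value lists Accept (or is "*").

    Field names are compared whole and case-insensitively, so
    "Accept-Encoding" alone does not count."""
    names = {name.strip().lower() for name in vary.split(",")}
    return "accept" in names or "*" in names


def distinct_variants(observations: dict) -> set:
    """Group probe results into apparent variants.

    observations maps the Accept header sent to the (Content-Type, body)
    received; responses with the same media type and body digest are
    treated as the same variant."""
    found = set()
    for content_type, body in observations.values():
        media_type = content_type.split(";", 1)[0].strip().lower()
        found.add((media_type, hashlib.sha256(body).hexdigest()))
    return found
```

One caveat that would compound the low success rate: dynamic pages (timestamps, ads, CSRF tokens) can return different bodies for the same variant, so a digest comparison like this would overcount.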

> Even if some were found, I wonder why you can't work with them -- since
> they're already adopting the format you're defining -- to fix the problem,
> rather than introduce a new mechanism. After all, such a change should only
> affect sites that actually offer the format you're defining, correct?
>

The modified Accept header will be sent on most Googlebot requests, so in
the extreme, it could negatively affect servers that, e.g., don't parse
Accept properly. (For a contrived example: `Accept:
application/signed-exchange;v=b3;amp="ampwebplace;v=\"1\""` contains the
substring "webp", so it may trip the webp detector in this example
<https://github.com/cdowdy/Nginx-Content-Negotiation/blob/master/nginx.conf>,
even if the server doesn't support AMP or signed exchanges.)
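A hypothetical Python sketch of that failure mode: `NAIVE_WEBP` stands in for a substring/regex match on the raw Accept value (it is not the linked config's exact rule), while `lists_webp` compares only each element's type/subtype after a quote-aware split, so the "webp" inside the amp parameter is never considered.

```python
import re

# Hypothetical stand-in for a substring-style detector on the raw Accept value.
NAIVE_WEBP = re.compile(r"webp", re.IGNORECASE)


def naive_accepts_webp(accept: str) -> bool:
    return NAIVE_WEBP.search(accept) is not None


def split_elements(accept: str) -> list:
    """Split Accept on commas that are not inside quoted strings."""
    elements, buf, in_quotes, escaped = [], "", False, False
    for ch in accept:
        if escaped:
            escaped = False
        elif ch == "\\" and in_quotes:
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            elements.append(buf)
            buf = ""
            continue
        buf += ch
    elements.append(buf)
    return [e.strip() for e in elements if e.strip()]


def lists_webp(accept: str) -> bool:
    """Compare only the type/subtype of each element, never its parameters.

    Ignores q-values for brevity."""
    return any(e.split(";", 1)[0].strip().lower() == "image/webp"
               for e in split_elements(accept))
```

With the contrived header above, `naive_accepts_webp` returns True and `lists_webp` returns False.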

That aside, it's true that the effect is limited to sites wishing
to adopt the new format. Still, our goal is to minimize the barrier to
entry for web developers wishing to adopt it, within the technical
constraints imposed by this space.

> It's also a one-off solution. I'm concerned that if every new format
> defines such a mechanism, we're going to be overcome by lots of slightly
> different ways of doing things. From a reverse proxy / CDN standpoint, it's
> better if we can reuse existing mechanisms that don't require special
> handling (some do not allow factoring additional request headers into the
> cache key easily).


I'd be interested in some detail on this. Offhand, I see that Varnish
supports customizing the cache key
<https://varnish-cache.org/docs/trunk/users-guide/vcl-hashing.html>, though
that's where my knowledge on this ends. (After a few minutes' searching, I
was unable to find recent documentation on Squid's Vary support.)
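For reference, here is my understanding of what "factoring additional request headers into the cache key" amounts to, as a small Python model. In Varnish this would live in the `vcl_hash` subroutine via `hash_data()`; the function below and its header normalization are assumptions for illustration, not any CDN's actual scheme.

```python
import hashlib


def cache_key(method: str, url: str, request_headers: dict, vary: str) -> str:
    """Primary key (method + URL) extended with the request headers named in
    the response's Vary field. Hypothetical model, not any CDN's real scheme."""
    headers = {k.lower(): v.strip() for k, v in request_headers.items()}
    parts = [method.upper(), url]
    # Sort the Vary field names so "A, B" and "B, A" yield the same key.
    for name in sorted({f.strip().lower() for f in vary.split(",") if f.strip()}):
        parts.append(f"{name}: {headers.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

Note that varying on the raw Accept value tends to fragment a cache much more than varying on a header with a small set of values, since browsers send many slightly different Accept strings.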

<snip things I have no update on>

> If you do end up needing to specify a new content negotiation mechanism
> here, you really should have a look at Client Hints -- which is turning
> into a framework for doing so. See: <
> https://httpwg.org/http-extensions/draft-ietf-httpbis-client-hints.html>.
> Note that that should be updated to include Variants eventually <
> https://httpwg.org/http-extensions/draft-ietf-httpbis-variants.html>;
> you'll need to specify a variant algorithm as well if you define a new
> conneg mechanism.
>

Thanks! For now, I've added a "future work" note about Variants to the
AMP-Cache-Transform doc, and will take a closer look at how Client Hints
could apply.

Devin

Received on Thursday, 28 March 2019 18:45:52 UTC