
Re: [Ietf-message-headers] Requesting provisional registration for AMP-Cache-Transform header

From: Mark Nottingham <mnot@mnot.net>
Date: Thu, 14 Mar 2019 15:38:26 +1100
Cc: ietf-http-wg@w3.org
Message-Id: <C8D062B5-C0DB-4578-BB0B-45FEA17F439C@mnot.net>
To: Devin Mullins <twifkak@google.com>
Hi Devin,

Sorry for the delay. Responses below.

> On 12 Feb 2019, at 11:00 am, Devin Mullins <twifkak@google.com> wrote:
> 
> -ietf-message-headers, +ietf-http-wg
> 
> Hi Mark, you're absolutely right; q-values could be used to accomplish that bit. I'd given it a bit of thought in the past, but I'm not sure I really gave it enough thought until now. Here are a couple of things that make me hesitant to use q-values:
> 
> 1. AFAIK, googlebot currently requests with Accept headers that match what browsers commonly send (https://www.searchdatalogy.com/blog/googlebots-http-headers/). With the addition of application/signed-exchange to Chromium, this remains true (https://cs.chromium.org/chromium/src/content/browser/loader/navigation_url_loader_impl.cc?l=237&rcl=7d726373f502fd5e20cede7e22da049b16f77377). However, using q-values means googlebot would need to send a significantly different Accept header. I fear that this would cause unintentional breakage with servers that implement an improper content negotiation algorithm.

You're suggesting a new, duplicative content negotiation mechanism be created to avoid this fear, but haven't presented any data to support it. 

It should be possible to quantify how often origin servers mis-implement content negotiation, both by probing the Internet (and Google is in a uniquely good place to gather that data), and examining common implementations. 

Even if some were found, I wonder why you can't work with them -- since they're already adopting the format you're defining -- to fix the problem, rather than introduce a new mechanism. After all, such a change should only affect sites that actually offer the format you're defining, correct?


> 2. Chromium's Accept header does not place application/signed-exchange;v=b3 at a lower q-value than other formats. Thus, if publishers wish to serve SXGs to crawlers by looking only at the Accept header, they may inadvertently serve them to browsers, too. Of course, publishers are free to use an algorithm that selects signed-exchange only if its weight is strictly greater than the alternatives. But such a rule isn't implemented today, and wouldn't make sense for most media types; once we are suggesting writing new code for a specific case, the simplest necessary change seems preferable.
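A minimal Python sketch of the "strictly greater" rule described in point 2. It follows the usual Accept precedence (the most specific matching media range supplies the q-value, as in RFC 7231 §5.3.2), but it ignores media type parameters such as `v=b3` when matching and does not handle quoted commas. The function names, the `text/html` fallback, and the browser-style header string are illustrative assumptions, not Chromium's or AMP's actual code.

```python
from typing import List, Tuple


def parse_q_values(accept: str) -> List[Tuple[str, float]]:
    """Return (media_range, q) pairs; q defaults to 1.0 when absent."""
    ranges = []
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue  # skip empty list elements
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((parts[0].lower(), q))
    return ranges


def q_for(media_type: str, accept: str) -> float:
    """q-value from the most specific matching range, or 0.0 if none matches.

    Simplification: parameters (e.g. v=b3) are ignored when matching.
    """
    main_type = media_type.split("/", 1)[0]
    best_rank, best_q = -1, 0.0
    for media_range, q in parse_q_values(accept):
        if media_range == media_type:
            rank = 2  # exact type/subtype
        elif media_range == main_type + "/*":
            rank = 1  # type wildcard
        elif media_range == "*/*":
            rank = 0  # full wildcard
        else:
            continue
        if rank > best_rank:
            best_rank, best_q = rank, q
    return best_q


def prefers_sxg(accept: str, fallback: str = "text/html") -> bool:
    """Serve an SXG only if it strictly outranks the fallback format."""
    return q_for("application/signed-exchange", accept) > q_for(fallback, accept)


# A browser-style header (illustrative, not Chromium's exact string):
browser = "text/html,application/xhtml+xml,application/signed-exchange;v=b3,*/*;q=0.8"
# A crawler header of the kind suggested later in this thread:
crawler = "application/signed-exchange;q=1.0, */*;q=0.05"
print(prefers_sxg(browser), prefers_sxg(crawler))  # prints: False True
```

With equal q-values the rule declines to serve the SXG, which is exactly the case Devin describes for browsers; the cost is that this comparison is specific to one media type rather than part of a generic selection algorithm.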

Again, data please. Why do you think that origin servers don't implement response selection by qvalue?


> For example, "just test for the presence of an AMP-Cache-Transform header" is doable in most reverse-proxy config languages, and possibly sufficient.

It's also a one-off solution. I'm concerned that if every new format defines such a mechanism, we're going to be overcome by lots of slightly different ways of doing things. From a reverse proxy / CDN standpoint, it's better if we can reuse existing mechanisms that don't require special handling (some do not allow factoring additional request headers into the cache key easily).
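To make the cache-key concern concrete: a shared cache that honours `Vary` has to fold every listed request header into its key, so a per-format header such as `AMP-Cache-Transform` becomes one more dimension the cache must support. The following is a minimal Python sketch, assuming a simple key built from the URL plus normalised header values; the function names, the normalisation (lower-cased names, collapsed whitespace, an absent header kept distinct from any value), and the `"placeholder"` header value are illustrative assumptions, not any particular CDN's behaviour or the header's real syntax.

```python
from typing import Dict, Iterable, Optional, Tuple


def _header(headers: Dict[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup; None when the header is absent."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return " ".join(value.split())  # collapse internal whitespace
    return None


def cache_key(url: str, request_headers: Dict[str, str],
              vary: Iterable[str]) -> Tuple:
    """Secondary cache key: the URL plus the value of each Vary'd request header."""
    names = sorted({n.strip().lower() for n in vary if n.strip()})
    return (url,) + tuple((n, _header(request_headers, n)) for n in names)


vary = ["Accept", "AMP-Cache-Transform"]
plain = cache_key("https://example.com/a", {"Accept": "text/html"}, vary)
# "placeholder" stands in for whatever value the header would actually carry.
amp = cache_key("https://example.com/a",
                {"Accept": "text/html", "AMP-Cache-Transform": "placeholder"}, vary)
print(plain == amp)  # prints: False (the extra header splits the cache entry)
```

A cache that can only key on a fixed set of headers (or only on Accept) cannot express the second dimension at all, which is the reuse argument for sticking with existing mechanisms.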


>  Alternatively, Chromium could lower the q-value in its own Accept header; I haven't explored this, and may bring it up with the Chrome team to gauge their opinion. But I could see reasons they might prefer not to do that, for use cases other than the "privacy-preserving prefetch" that is a priority to AMP.

I'd be interested to hear the reaction. 


> 3. There's some other data not captured by existing request headers:
> 
> 3a. For privacy-preserving prefetch, googlebot will request a particular variant of the resource with rewritten subresource URLs (https://github.com/ampproject/amphtml/blob/master/spec/amp-cache-transform.md#target-specific-constraints), to match what the AMP Cache does today. So the target of those rewrites (e.g. cdn.ampproject.org) needs to be specified on the HTTP request. (This is, hopefully, only an interim requirement, to be replaced with signed subresource substitution [https://github.com/WICG/webpackage/issues/347] or with bundled exchanges.) I had considered using Referer or Origin to indicate this desired variant, but it seems to me like that's not what these header fields are intended for, and thus there may be unexpected consequences of using them here. Perhaps I'm wrong?

I think it depends on whether Origin (for example) is available and the right value to use in all cases. If it is, that would probably be sufficient -- although I'd bounce the idea off people like Anne van Kesteren and Mike West.


> 3b. Similar to 3a, googlebot will request a particular "version" of the AMP transforms. (These are transformations to be run by the publisher, meant to replicate those currently run by AMP Caches, and address a couple of needs, including speed, privacy, and security.) The version is required to be able to make breaking changes to these transforms. (I suspect that, as long as there are changes to the AMP component library, there may be corresponding changes to the transforms. But maybe time will show that breaking changes aren't needed, and the requirement can be removed.)

That sounds an awful lot like a media type parameter (or a number of them).
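As a rough illustration of carrying the transform version as a media type parameter, the parameters on the `application/signed-exchange` entry of an Accept header can be read out like any other. The parameter name `amp-transform` and its value are hypothetical (nothing in this thread defines them), and the parser is a simplified sketch that does not handle quoted strings containing `,` or `;`.

```python
from typing import Dict, Optional


def media_params(accept: str, media_type: str) -> Optional[Dict[str, str]]:
    """Parameters (excluding q) on the first Accept entry for media_type."""
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != media_type:
            continue
        params = {}
        for param in parts[1:]:
            name, sep, value = param.partition("=")
            name = name.strip().lower()
            if not sep or name == "q":
                continue  # skip bare tokens and the q-value itself
            params[name] = value.strip().strip('"')
        return params
    return None  # media_type is not listed in the header


# "amp-transform" is a hypothetical parameter name used only for illustration.
accept = 'text/html, application/signed-exchange;v=b3;amp-transform="2";q=0.9'
print(media_params(accept, "application/signed-exchange"))
# prints: {'v': 'b3', 'amp-transform': '2'}
```

Expressing the version this way keeps it inside the existing Accept machinery, so a cache that already varies on Accept needs no new key dimension.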


> In both 3a and 3b, we could indicate this on the URL, but my understanding is that doing so would create difficulties for search indexing implementations. It's possible this data could be in the `AMP-Cache-Transform` header while the "prefer SXG" bit is still in Accept, but I'm not sure of the benefits of doing so. (As opposed to the benefits of not introducing a new header at all, which seem a little more clear to me... if the costs can be mitigated.)
> 
> Please let me know if you see a different solution to all these twisty little constraints. Especially as this moves from an AMP-specific criterion to a more general one (https://amphtml.wordpress.com/2018/03/08/standardizing-lessons-learned-from-amp/), it would be nice to minimize the requirements on top of HTTP/HTML that are necessary.

I do agree with the motivation to minimise additional requirements. I'm not certain that using Accept is going to be the best solution for you; it might be a partial solution, though.

If you do end up needing to specify a new content negotiation mechanism here, you really should have a look at Client Hints -- which is turning into a framework for doing so. See: <https://httpwg.org/http-extensions/draft-ietf-httpbis-client-hints.html>. Note that the Client Hints draft should eventually be updated to include Variants <https://httpwg.org/http-extensions/draft-ietf-httpbis-variants.html>; if you define a new conneg mechanism, you'll need to specify a variant algorithm for it as well.

Cheers,



> 
> Thanks,
> Devin
> 
> On Mon, Feb 4, 2019 at 7:45 PM Mark Nottingham <mnot@mnot.net> wrote:
> Hi Devin,
> 
> Sorry for the delay; I saw this a while ago but didn't look into it deeply. For future reference, you're much more likely to get reviews if you send this to the HTTP WG mailing list (we're in the process of separating HTTP headers out into their own registry, in part to make that more clear).
> 
> Without commenting on the entire document you link to, it says:
> 
> > Therefore, the need arises for the origin to distinguish requests from users and requests from SXG intermediaries. That is, there is a difference between "I can understand the SXG format" and "I prefer an SXG if available". Accept: application/signed-exchange indicates the former. No currently-defined header indicates the latter.
> 
> However, Accept *can* express that; e.g.,
> 
> Accept: application/signed-exchange;q=1.0, */*;q=0.05
> 
> Did you consider using qvalues in the accept header to achieve this?
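For comparison, ordinary server-driven selection by qvalue already distinguishes the two cases: the server picks, among the formats it has available, the one with the highest q-value, breaking ties by its own order of preference. A minimal Python sketch under the usual simplifications (parameters ignored when matching, no quoted-string handling); `select_format` and the `available` list are illustrative names, not an existing server's API.

```python
from typing import List, Optional


def q_for(media_type: str, accept: str) -> float:
    """q-value from the most specific matching range (0.0 if none matches)."""
    main_type = media_type.split("/", 1)[0]
    ranks = {media_type: 2, main_type + "/*": 1, "*/*": 0}
    best_rank, best_q = -1, 0.0
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if media_range not in ranks:
            continue  # this range does not cover media_type
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if ranks[media_range] > best_rank:
            best_rank, best_q = ranks[media_range], q
    return best_q


def select_format(accept: str, available: List[str]) -> Optional[str]:
    """Highest-q available format; ties go to the earlier entry in `available`."""
    best, best_q = None, 0.0
    for media_type in available:
        q = q_for(media_type, accept)
        if q > best_q:  # strict '>' keeps the server's order on ties
            best, best_q = media_type, q
    return best  # None means nothing acceptable (a 406 candidate)


available = ["text/html", "application/signed-exchange"]
print(select_format("application/signed-exchange;q=1.0, */*;q=0.05", available))
# prints: application/signed-exchange
print(select_format("text/html,*/*;q=0.8", available))
# prints: text/html
```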
> 
> Cheers,
> 
> P.S. It may be good to follow up on ietf-http-wg@w3.org, and then report the result back here. I haven't CC:ed them to avoid cross-posting.
> 
> 
> 
> > On 7 Sep 2018, at 10:33 am, Devin Mullins <twifkak=40google.com@dmarc.ietf.org> wrote:
> > 
> > Header field name: AMP-Cache-Transform
> > Applicable protocol: http
> > Status: provisional
> > Author: Devin Mullins <twifkak@google.com>; Google; https://www.ampproject.org/
> > Specification: https://github.com/ampproject/amphtml/blob/master/spec/amp-cache-transform.md
> > 
> > Let me know what other information I can provide.
> > 
> > Thanks!
> > Devin
> > _______________________________________________
> > Ietf-message-headers mailing list
> > Ietf-message-headers@ietf.org
> > https://www.ietf.org/mailman/listinfo/ietf-message-headers
> 
> --
> Mark Nottingham   https://www.mnot.net/
> 

--
Mark Nottingham   https://www.mnot.net/
Received on Thursday, 14 March 2019 04:38:57 UTC
