Re: [Ietf-message-headers] Requesting provisional registration for AMP-Cache-Transform header

From: Devin Mullins <twifkak@google.com>
Date: Tue, 12 Feb 2019 00:03:09 +0000
Cc: ietf-http-wg@w3.org
Message-Id: <CANjwSinahx_ytO8XO1hZ-ECuwRP22U0rMP9M=0VnQYUnQcJx1Q@mail.gmail.com>
To: Mark Nottingham <mnot@mnot.net>
-ietf-message-headers, +ietf-http-wg

Hi Mark, you're absolutely right; q-values could be used to accomplish that bit. I'd given it a bit of thought in the past, but I'm not sure I really gave it enough thought until now. Here are a couple of things that make me hesitant to use q-values:

1. AFAIK, googlebot currently requests with Accept headers that match what browsers commonly send (https://www.searchdatalogy.com/blog/googlebots-http-headers/). With the addition of application/signed-exchange to Chromium, this remains true (https://cs.chromium.org/chromium/src/content/browser/loader/navigation_url_loader_impl.cc?l=237&rcl=7d726373f502fd5e20cede7e22da049b16f77377). However, using q-values means googlebot would need to send a significantly different Accept header. I fear that this would cause unintentional breakage with servers that implement an improper content negotiation algorithm.

2. Chromium's Accept header does not place application/signed-exchange;v=b3 on a lower q-value than other formats. Thus, if publishers wish to serve SXGs to crawlers by looking only at the Accept header, they may inadvertently serve them to browsers, too. Of course, publishers are free to use an algorithm that only selects signed-exchange if its weight is strictly greater than the alternatives. But such a rule isn't implemented today, and wouldn't make sense for most media types; once we are suggesting writing new code for a specific case, the simplest change necessary seems preferable. For example, "just test for the presence of an AMP-Cache-Transform header" is doable in most reverse-proxy config languages, and possibly sufficient. Alternatively, Chromium could lower the q-value in its own Accept header; I haven't explored this, and may bring it up with the Chrome team to gauge their opinion. But I could see reasons they might prefer not to do that, for use cases other than the "privacy-preserving prefetch" that is a priority to AMP.

3. There's some other data not captured by existing request headers:

3a. For privacy-preserving prefetch, googlebot will request a particular variant of the resource with rewritten subresource URLs (https://github.com/ampproject/amphtml/blob/master/spec/amp-cache-transform.md#target-specific-constraints), to match what the AMP Cache does today. So the target of those rewrites (e.g. cdn.ampproject.org) needs to be specified on the HTTP request. (This is, hopefully, only an interim requirement, to be replaced with signed subresource substitution [https://github.com/WICG/webpackage/issues/347] or with bundled exchanges.) I had considered using Referer or Origin to indicate this desired variant, but it seems to me that that's not what these header fields are intended for, and thus there may be unexpected consequences of using them here. Perhaps I'm wrong?

3b. Similar to 3a, googlebot will request a particular "version" of the AMP transforms. (These are transformations to be run by the publisher, meant to replicate those currently run by AMP Caches, and address a couple of needs, including speed, privacy, and security.) The version is required to be able to make breaking changes to these transforms. (I suspect that, as long as there are changes to the AMP component library, there may be corresponding changes to the transforms. But maybe time will show that breaking changes aren't needed, and the requirement can be removed.)

In both 3a and 3b, we could indicate this on the URL, but my understanding is that doing so would create difficulties for search indexing implementations. It's possible this data could be in the `AMP-Cache-Transform` header while the "prefer SXG" bit is still in Accept, but I'm not sure of the benefits of doing so. (As opposed to the benefits of not introducing a new header at all, which seem a little more clear to me... if the costs can be mitigated.)
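
To make the two options in point 2 concrete, here is a rough Python sketch of what a publisher-side check might look like. It is only an illustration under my own assumptions: the function names are hypothetical, the Accept parsing is deliberately simplified (it ignores quoted strings and media-type parameters other than q), and the header value used in the usage note is a placeholder rather than a value taken from the spec.

```python
SXG = "application/signed-exchange"


def parse_accept(value):
    """Split an Accept header value into (media_range, q) pairs.

    Simplified on purpose: parameters other than q (such as v=b3) are
    dropped, and a missing q defaults to 1.0.
    """
    ranges = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # treat an unparseable weight as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def sxg_strictly_preferred(accept):
    """Option A: select SXG only if its weight beats every alternative."""
    ranges = parse_accept(accept)
    sxg_q = max((q for r, q in ranges if r == SXG), default=0.0)
    other_q = max((q for r, q in ranges if r != SXG), default=0.0)
    return sxg_q > 0 and sxg_q > other_q


def has_amp_cache_transform(headers):
    """Option B: just test for the presence of AMP-Cache-Transform."""
    return any(name.lower() == "amp-cache-transform" for name in headers)
```

With this sketch, `sxg_strictly_preferred("application/signed-exchange;q=1.0, */*;q=0.05")` selects the SXG, while a browser-like value that lists application/signed-exchange;v=b3 at the default weight alongside text/html does not, because the weights tie. `has_amp_cache_transform({"AMP-Cache-Transform": "<value>"})` needs no parsing at all, which is the simplicity argument above.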

Please let me know if you see a different solution to all these twisty little constraints. Especially as this moves from an AMP-specific criterion to a more general one (https://amphtml.wordpress.com/2018/03/08/standardizing-lessons-learned-from-amp/), it would be nice to minimize the requirements on top of HTTP/HTML that are necessary.

Thanks,
Devin

On Mon, Feb 4, 2019 at 7:45 PM Mark Nottingham <mnot@mnot.net> wrote:
Hi Devin,

Sorry for the delay; I saw this a while ago but didn't look into it deeply. For future reference, you're much more likely to get reviews if you send this to the HTTP WG mailing list (we're in the process of separating HTTP headers out into their own registry, in part to make that more clear).

Without commenting on the entire document you link to, it says:

> Therefore, the need arises for the origin to distinguish requests from users and requests from SXG intermediaries. That is, there is a difference between "I can understand the SXG format" and "I prefer an SXG if available". Accept: application/signed-exchange indicates the former. No currently-defined header indicates the latter.

However, Accept *can* express that; e.g.,

Accept: application/signed-exchange;q=1.0, */*;q=0.05

Did you consider using qvalues in the accept header to achieve this?

Cheers,

P.S. It may be good to follow up on ietf-http-wg@w3.org, and then report the result back here. I haven't CC:ed them to avoid cross-posting.



> On 7 Sep 2018, at 10:33 am, Devin Mullins <twifkak=40google.com@dmarc.ietf.org> wrote:
> 
> Header field name: AMP-Cache-Transform
> Applicable protocol: http
> Status: provisional
> Author: Devin Mullins <twifkak@google.com>; Google; https://www.ampproject.org/
> Specification: https://github.com/ampproject/amphtml/blob/master/spec/amp-cache-transform.md
> 
> Let me know what other information I can provide.
> 
> Thanks!
> Devin
> _______________________________________________
> Ietf-message-headers mailing list
> Ietf-message-headers@ietf.org
> https://www.ietf.org/mailman/listinfo/ietf-message-headers

--
Mark Nottingham   https://www.mnot.net/
Received on Sunday, 24 February 2019 21:30:25 UTC