Re: RFC 9113 and :authority header field from Willy Tarreau on 2022-06-29 (ietf-http-wg@w3.org from April to June 2022)

From: Willy Tarreau <w@1wt.eu>
Date: Wed, 29 Jun 2022 07:52:54 +0200
To: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
Cc: HTTP <ietf-http-wg@w3.org>
Message-ID: <20220629055254.GA18881@1wt.eu>

Hi Tatsuhiro,

On Wed, Jun 29, 2022 at 08:58:47AM +0900, Tatsuhiro Tsujikawa wrote:
> RFC 7540 even says that an intermediary MUST omit :authority "when translating
> from an HTTP/1.1 request that has a request target in
> origin or asterisk form (see [RFC7230], Section 5.3)."
> 
> Now RFC 9113 has this text:
> 
>       An intermediary that forwards a request over HTTP/2 MUST construct
>       an ":authority" pseudo-header field using the authority
>       information from the control data of the original request, unless
>       the original request's target URI does not contain authority
>       information (in which case it MUST NOT generate ":authority").
>       Note that the Host header field is not the sole source of this
>       information; see Section 7.2 of [HTTP].
> 
> This means :authority must be included if the host header field exists in
> an HTTP/1.1 request.

My understanding is that Host doesn't necessarily count as "control data"
here, and that the goal was to accurately represent an HTTP/1.x request
targeting an HTTP/1.0 server after it has been transported over HTTP/2. For
example, let's say that a client passes this to a proxy:

     GET http://example.com/ HTTP/1.0
     Proxy-connection: keep-alive

and nothing more. If it is instead sent via a gateway that transports
it over H2, it could make sense to consider that the scheme is "http",
that the authority is "example.com", and that there is no Host field,
hence the request would be passed as:

     :method: GET
     :scheme: http
     :authority: example.com
     :path: /

and that's all. Conversely, let's consider the same HTTP/1.0 request sent
directly to the origin server:

     GET / HTTP/1.0

There is then neither an authority nor a Host field, so a gateway
receiving this cannot invent one, unless it uses its own configured name,
corresponding to its own address, which it assumes the client used to
construct the request.
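A minimal Python sketch of the H1-to-H2 translation described above, under the reading that only the request target (not Host) carries authority "control data". The function name `h1_to_h2` and its `default_scheme` parameter are hypothetical; the scheme of an origin-form request is assumed to come from the incoming connection. The set of dropped fields follows the connection-specific fields that RFC 9113 (Section 8.2.2) forbids in HTTP/2.

```python
from urllib.parse import urlsplit

# Connection-specific fields that HTTP/2 forbids (RFC 9113, Section 8.2.2).
CONNECTION_SPECIFIC = {"connection", "proxy-connection", "keep-alive",
                       "transfer-encoding", "upgrade"}


def h1_to_h2(request_line, headers, default_scheme="http"):
    """Translate an HTTP/1.x request head into HTTP/2 fields (sketch).

    Returns (pseudo, regular, had_authority). ":authority" is emitted only
    when the request target was in absolute form; a Host header is kept as
    a regular field and is never promoted to ":authority".
    """
    method, target, _version = request_line.split(" ", 2)
    pseudo = [(":method", method)]

    if target.lower().startswith(("http://", "https://")):
        # Absolute form: the authority is part of the control data.
        parts = urlsplit(target)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        pseudo += [(":scheme", parts.scheme), (":authority", parts.netloc),
                   (":path", path)]
        had_authority = True
    else:
        # Origin form ("/x") or asterisk form ("*"): no authority to convey.
        # The scheme comes from the connection (assumed via default_scheme).
        pseudo += [(":scheme", default_scheme), (":path", target)]
        had_authority = False

    # A complete translator would also drop any fields named in Connection.
    regular = [(name.lower(), value) for name, value in headers
               if name.lower() not in CONNECTION_SPECIFIC]
    return pseudo, regular, had_authority
```

Fed the proxy request above (`GET http://example.com/ HTTP/1.0` plus `Proxy-connection`), this yields `:method`, `:scheme`, `:authority` and `:path` with no regular fields; fed `GET / HTTP/1.1` with `Host: example.com`, it yields no `:authority` and keeps `host` as a regular field.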

With HTTP/1.1 there are fewer ambiguities since Host is mandatory, but
the distinction between "proxy requests" and origin requests is still
relevant, especially when you don't know whether or not the origin
server supports HTTP/1.1 or only 1.0 (and may be confused by the
presence of an authority in the request line). For example, if a
client sends:

     GET / HTTP/1.1
     Host: example.com

to an HTTP/1.0 server that parses Host, it will work. If it sends

     GET http://example.com/ HTTP/1.1
     Host: example.com

to an HTTP/1.1 server, it will work as well, but it may fail with an
HTTP/1.0 server (or worse, loop on itself if it supports proxying requests
and resolves example.com to its own address).

If the first request is transported over H2, thus converted from H1 to
H2 then back from H2 to H1, adding an authority that was not initially
present would introduce exactly this problem. By not adding it and using
Host only, the request representation is preserved, and the origin server
receives the same request that the client took care to encode, without
being confused. That's why I'm saying that this case clearly shows that
Host isn't part of the "control data" and must not be turned into an
authority that was not initially present.
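The H2-to-H1 leg of that round trip can be sketched the same way, as a minimal Python illustration under the same reading (the name `h2_to_h1` is hypothetical). The request line goes back to absolute form only when `:authority` was present; otherwise origin form is used and any Host field passes through unchanged instead of being derived from `:authority`.

```python
def h2_to_h1(pseudo, regular, version="HTTP/1.1"):
    """Rebuild an HTTP/1.x request head from HTTP/2 fields (sketch).

    The request target is put back in absolute form only when ":authority"
    was present; otherwise the ":path" value is used as-is, and any Host
    field is passed through unchanged rather than derived from ":authority".
    """
    fields = dict(pseudo)
    path = fields[":path"]
    if ":authority" in fields and path != "*":
        target = f"{fields[':scheme']}://{fields[':authority']}{path}"
    else:
        # No authority (origin or asterisk form); an asterisk-form path is
        # also kept as "*" here, which is a simplification.
        target = path
    lines = [f"{fields[':method']} {target} {version}"]
    lines += [f"{name}: {value}" for name, value in regular]
    return "\r\n".join(lines) + "\r\n\r\n"
```

One caveat: HTTP/1.1 (RFC 9112) requires a Host field in every request, so a real gateway emitting HTTP/1.1 would still have to add one, taken from `:authority`, when the incoming request carried none. The sketch leaves that out.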

I know it's a bit complicated but we have to deal with history. In
haproxy, Host and :authority are used interchangeably after having been
checked for proper matching, and are modified together when needed. We
also keep a flag indicating whether an authority was present in the
incoming request, to know whether we have to produce one on output. In
the end, that is what seems to preserve the most accurate representation
along a chain of mixed HTTP versions. This allows us to emit a Host field
only if one was present, and an authority only if one was present,
regardless of the HTTP version. I don't think that RFC 9113 brings any
changes here; it might only be a matter of what constitutes "control
data".
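To make the bookkeeping concrete, here is a hypothetical Python model of the approach just described. It is not haproxy's code or data structure; the class `RequestAuthority` and its methods are invented for illustration. One value backs both fields, a match check runs when both arrive, and two flags record which representations the incoming request actually carried.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RequestAuthority:
    """Hypothetical model of the approach described above, not haproxy code.

    Host and :authority are stored as a single value, but we remember
    which of the two the incoming request actually carried.
    """
    value: Optional[str]
    had_authority: bool
    had_host: bool

    @classmethod
    def from_fields(cls, authority: Optional[str],
                    host: Optional[str]) -> "RequestAuthority":
        # When both are present they must match before being merged.
        # (Simplified: case-insensitive compare, no port normalization.)
        if authority is not None and host is not None:
            if authority.lower() != host.lower():
                raise ValueError("Host does not match :authority")
        value = authority if authority is not None else host
        return cls(value, authority is not None, host is not None)

    def rewrite(self, new_value: str) -> None:
        # A single value backs both fields, so they change together.
        self.value = new_value

    def emit(self) -> Tuple[Optional[str], Optional[str]]:
        # (authority, host) to produce on output, whatever the HTTP version:
        # each is emitted only if it was present on input.
        return (self.value if self.had_authority else None,
                self.value if self.had_host else None)
```

For an origin-form request carrying only `Host: example.com`, `emit()` returns no authority and a Host, even after a rewrite, so no authority is invented on the way back to HTTP/1.x.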

Hoping this helps,
Willy
Received on Wednesday, 29 June 2022 05:53:07 UTC