- From: Willy Tarreau <w@1wt.eu>
- Date: Wed, 29 Jun 2022 07:52:54 +0200
- To: Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>
- Cc: HTTP <ietf-http-wg@w3.org>
Hi Tatsuhiro,

On Wed, Jun 29, 2022 at 08:58:47AM +0900, Tatsuhiro Tsujikawa wrote:
> RFC 7540 even says that an intermediary MUST omit :authority "when translating
> from an HTTP/1.1 request that has a request target in
> origin or asterisk form (see [RFC7230], Section 5.3)."
>
> Now RFC 9113 has this text:
>
>    An intermediary that forwards a request over HTTP/2 MUST construct
>    an ":authority" pseudo-header field using the authority
>    information from the control data of the original request, unless
>    the original request's target URI does not contain authority
>    information (in which case it MUST NOT generate ":authority").
>    Note that the Host header field is not the sole source of this
>    information; see Section 7.2 of [HTTP].
>
> This means :authority must be included if the host header field exists in
> an HTTP/1.1 request.

My understanding is that Host doesn't necessarily count as "control data" here, and that the goal was to accurately represent an HTTP/1.x request targeting an HTTP/1.0 server after it has been transported over HTTP/2.

For example, let's say that a client passes this to a proxy:

    GET http://example.com/ HTTP/1.0
    Proxy-connection: keep-alive

and nothing more. If instead it gets sent via a gateway that transports it over H2, it could make sense to consider that the scheme is "http", the authority is "example.com", and that there's no Host, hence the request would be passed as:

    :method: GET
    :scheme: http
    :authority: example.com
    :path: /

and that's all.

Conversely, let's see the same HTTP/1.0 request sent directly to the origin server:

    GET / HTTP/1.0

There's neither an authority nor a Host anymore, so a gateway receiving that cannot invent one, unless it uses its own configured name corresponding to its own address, which it expects the client used to construct the request.
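The origin-form versus absolute-form distinction above can be sketched in Python. This is only a minimal illustration of the interpretation argued in this message (Host is not treated as control data), not haproxy's code or a normative reading of RFC 9113: the function h1_to_h2 and its default_scheme parameter are hypothetical, authority-form (CONNECT) is not handled, and Host is simply forwarded as a regular field while connection-specific fields such as Proxy-Connection are dropped, since HTTP/2 forbids them.

```python
from typing import List, Tuple
from urllib.parse import urlsplit

# Connection-specific fields must not appear in HTTP/2 messages
# (RFC 9113, Section 8.2.2), so the gateway drops them.
CONNECTION_SPECIFIC = {"connection", "proxy-connection", "keep-alive",
                       "transfer-encoding", "upgrade"}


def h1_to_h2(method: str, target: str, headers: List[Tuple[str, str]],
             default_scheme: str = "http") -> List[Tuple[str, str]]:
    """Hypothetical helper: map an HTTP/1.x request line and its header
    fields to an HTTP/2 field list. Authority-form (CONNECT) is not handled."""
    fields = [(":method", method)]
    if target == "*" or target.startswith("/"):
        # origin-form or asterisk-form: the request line carries no
        # authority, so no :authority is generated. The scheme is not in
        # the request either; default_scheme stands in for the gateway's
        # own configuration (an assumption of this sketch).
        fields += [(":scheme", default_scheme), (":path", target)]
    else:
        # absolute-form: scheme and authority come from the request target.
        # Userinfo is not stripped in this sketch.
        parts = urlsplit(target)
        if not parts.scheme or not parts.netloc:
            raise ValueError(f"unsupported request-target: {target!r}")
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        fields += [(":scheme", parts.scheme), (":authority", parts.netloc),
                   (":path", path)]
    for name, value in headers:
        # Host, if any, is forwarded as a regular field; names are
        # lowercased as HTTP/2 requires.
        if name.lower() not in CONNECTION_SPECIFIC:
            fields.append((name.lower(), value))
    return fields
```

With the proxy request from the example, h1_to_h2("GET", "http://example.com/", [("Proxy-connection", "keep-alive")]) yields only :method, :scheme, :authority and :path, while the origin-form request h1_to_h2("GET", "/", []) yields no :authority at all.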
With HTTP/1.1 there are fewer ambiguities since Host is mandatory, but the distinction between "proxy requests" and origin requests is still relevant, especially when you don't know whether the origin server supports HTTP/1.1 or only 1.0 (and may be confused by the presence of an authority in the request line). For example, if a client sends:

    GET / HTTP/1.1
    Host: example.com

to an HTTP/1.0 server that parses Host, it will work. If it sends:

    GET http://example.com/ HTTP/1.1
    Host: example.com

to an HTTP/1.1 server, it will work as well, but it may fail with an HTTP/1.0 server (or worse, the server may loop over itself if it supports proxying requests and resolves itself as example.com).

If the first request is transported over H2, thus converted from H1 to H2 then back from H2 to H1, adding an authority that was not initially present would introduce exactly this problem. By not adding it and using Host only, the request representation is preserved: the origin server receives the same request that the client took care to encode, and is not confused. That's why I'm saying that in this case it's clearly visible that Host isn't part of the "control data" and must not appear in an authority that was not initially encoded.

I know it's a bit complicated but we have to deal with history.

What we're doing in haproxy is that both Host and :authority are used interchangeably after having been checked for proper matching, and are modified at the same time if needed, and we have a flag indicating whether an authority was present in the incoming request, so that we know whether or not we have to produce one on output. In the end, that's what seems to preserve the most accurate representation along a chain of multiple versions. This allows us to emit a Host field only if one was present, and an authority only if one was present, regardless of the HTTP version.

I don't think that RFC 9113 brings any changes regarding this; it might only be a matter of what constitutes "control data".

Hoping this helps,
Willy
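The flag-based approach described for haproxy might look roughly like the following. This is a hypothetical sketch of the idea as described in this message, not haproxy's actual data structures or code: the Request class, its had_authority and had_host flags, and the parse_request, set_authority, to_h2 and to_h1 helpers are all invented for illustration, and Host/authority matching is reduced to a case-insensitive string comparison. One deliberate assumption: since Host is mandatory in HTTP/1.1, to_h1 generates it for HTTP/1.1 output whenever an authority value exists, and only preserves its presence or absence for HTTP/1.0.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Request:
    """Hypothetical version-neutral request representation (not haproxy's)."""
    method: str
    scheme: str
    path: str
    authority: Optional[str]  # shared value of Host and/or the authority
    had_authority: bool       # authority present on input (absolute-form or :authority)
    had_host: bool            # Host field present on input


def parse_request(method: str, scheme: str, path: str,
                  authority: Optional[str] = None,
                  host: Optional[str] = None) -> Request:
    """Merge authority and Host into one value after checking they match.
    Simplified: only a case-insensitive string comparison is done."""
    if authority is not None and host is not None and authority.lower() != host.lower():
        raise ValueError("Host does not match authority")
    value = authority if authority is not None else host
    return Request(method, scheme, path, value,
                   had_authority=authority is not None,
                   had_host=host is not None)


def set_authority(req: Request, value: str) -> None:
    # Host and authority share one value, so rewriting it updates both outputs.
    req.authority = value


def to_h2(req: Request) -> List[Tuple[str, str]]:
    fields = [(":method", req.method), (":scheme", req.scheme)]
    if req.had_authority:  # only emit :authority if one came in
        fields.append((":authority", req.authority))
    fields.append((":path", req.path))
    if req.had_host:       # only emit Host if one came in
        fields.append(("host", req.authority))
    return fields


def to_h1(req: Request, version: str = "HTTP/1.1") -> List[str]:
    if req.had_authority:  # keep absolute-form only if it was used on input
        target = f"{req.scheme}://{req.authority}{req.path}"
    else:                  # otherwise origin-form, as received
        target = req.path
    lines = [f"{req.method} {target} {version}"]
    # Assumption: Host is generated for HTTP/1.1 whenever an authority value
    # exists; for HTTP/1.0 it is only emitted if it was present on input.
    if req.authority is not None and (req.had_host or version == "HTTP/1.1"):
        lines.append(f"Host: {req.authority}")
    return lines
```

For the origin-form request from the example, parse_request("GET", "http", "/", host="example.com") survives an H1 to H2 to H1 round trip as "GET / HTTP/1.1" plus "Host: example.com", with no authority added along the way; the HTTP/1.0 proxy request keeps its absolute-form and gains no Host.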
Received on Wednesday, 29 June 2022 05:53:07 UTC