Re: RFC 9113 and :authority header field from Stefan Eissing on 2022-06-29 (ietf-http-wg@w3.org from April to June 2022)

From: Stefan Eissing <stefan@eissing.org>
Date: Wed, 29 Jun 2022 11:18:19 +0200
To: Willy Tarreau <w@1wt.eu>
Cc: "tatsuhiro.t@gmail.com" <tatsuhiro.t@gmail.com>, HTTP <ietf-http-wg@w3.org>
Message-Id: <697C5255-A33F-4DEE-AA7A-236DC7481EFA@eissing.org>
> On 29.06.2022 at 07:52, Willy Tarreau <w@1wt.eu> wrote:
> 
> Hi Tatsuhiro,
> 
> On Wed, Jun 29, 2022 at 08:58:47AM +0900, Tatsuhiro Tsujikawa wrote:
>> RFC 7540 even says that an intermediary MUST omit :authority "when translating
>> from an HTTP/1.1 request that has a request target in
>> origin or asterisk form (see [RFC7230], Section 5.3)."
>> 
>> Now RFC 9113 has this text:
>> 
>>      An intermediary that forwards a request over HTTP/2 MUST construct
>>      an ":authority" pseudo-header field using the authority
>>      information from the control data of the original request, unless
>>      the original request's target URI does not contain authority
>>      information (in which case it MUST NOT generate ":authority").
>>      Note that the Host header field is not the sole source of this
>>      information; see Section 7.2 of [HTTP].
>> 
>> This means :authority must be included if the host header field exists in
>> an HTTP/1.1 request.
> 
> My understanding is that Host doesn't necessarily count as "control data"
> here, and that the goal was to accurately represent an HTTP/1.x request
> targeting an HTTP/1.0 server after being transported over HTTP/2. For
> example, let's say that a client passes this to a proxy:
> 
>     GET http://example.com/ HTTP/1.0
>     Proxy-connection: keep-alive
> 
> and nothing more. If instead it gets sent via a gateway that transports
> it over H2, it could make sense to consider that the scheme is "http",
> the authority is "example.com", that there's no host, hence the request
> would be passed as:
> 
>     :method: GET
>     :scheme: http
>     :authority: example.com
> 
> and that's all. Conversely, let's see the same HTTP/1.0 request sent
> directly to the origin server:
> 
>     GET / HTTP/1.0
> 
> There's no longer an authority or a host, so a gateway receiving that cannot
> invent one, unless it uses its own configured name corresponding to its
> own address, that it expects the client used to construct the request.
> 
> With HTTP/1.1 there are fewer ambiguities since Host is mandatory, but
> the distinction between "proxy requests" and origin requests is still
> relevant, especially when you don't know whether or not the origin
> server supports HTTP/1.1 or only 1.0 (and may be confused by the
> presence of an authority in the request line). For example, if a
> client sends:
> 
>  GET / HTTP/1.1
>  Host: example.com
> 
> to an HTTP/1.0 server that parses Host, it will work. If it sends
> 
>  GET http://example.com/ HTTP/1.1
>  Host: example.com
> 
> to an HTTP/1.1 server, it will work as well, but it may fail against an
> HTTP/1.0 server (or worse, loop back to itself if it supports proxying
> requests and resolves example.com to itself).
> 
> If the first request is transported over H2, thus converted from H1 to
> H2 then back from H2 to H1, adding an authority that was not initially
> present would introduce exactly this problem. By not adding it and using
> Host only, the request representation is preserved, and the origin server
> can receive the same request that the client took care to encode, and not
> be confused. That's why I'm saying that in this case it's clearly visible
> that Host isn't part of the "control data" and must not appear in an
> authority that was not initially encoded.
> 
> I know it's a bit complicated but we have to deal with history. What we're
> doing in haproxy is that both Host and :authority are used interchangeably
> after having been checked for proper matching, and are modified at the
> same time if needed, and we have a flag indicating if an authority was
> present in the incoming request to know if we have to produce one on
> output or not. That's in the end what seems to preserve the most accurate
> representation along a chain of multiple versions. This allows us to emit
> a Host field only if one was present, and an authority only if one was
> present, regardless of the HTTP version. I don't think that RFC9113 brings
> any changes regarding this, it might only be a matter of what constitutes
> "control data".
> 
> Hoping this helps,
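
To make the flag-based approach described above concrete, here is a minimal sketch of it. It is not haproxy's code: the names Request, parse_h1, emit_h2 and default_scheme are made up for illustration, only absolute-form and origin-form targets are handled, and the Host/authority matching check is left out.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Request:
    method: str
    scheme: Optional[str]
    authority: Optional[str]  # one shared value for Host and :authority
    path: str
    had_authority: bool       # an authority was present in the request line
    had_host: bool            # a Host header field was present


def parse_h1(request_line: str, host: Optional[str] = None) -> Request:
    """Parse an HTTP/1.x request line and remember what the client sent."""
    method, target, _version = request_line.split()
    if "://" in target:
        # absolute-form: scheme and authority come from the request line
        parts = urlsplit(target)
        return Request(method, parts.scheme, parts.netloc,
                       parts.path or "/", True, host is not None)
    # origin-form: the only naming information is the Host field, if any
    return Request(method, None, host, target, False, host is not None)


def emit_h2(req: Request, default_scheme: str = "http") -> dict:
    """Emit :authority only if one was received, and Host only if one was received."""
    out = {":method": req.method,
           ":scheme": req.scheme or default_scheme,  # assumed to come from context
           ":path": req.path}
    if req.had_authority:
        out[":authority"] = req.authority
    if req.had_host:
        out["host"] = req.authority
    return out
```

With this, "GET http://example.com/ HTTP/1.0" without Host yields an :authority and no host field, while "GET / HTTP/1.1" with "Host: example.com" yields a host field and no :authority.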

Thanks. Trying to put all this in examples:

MUST WORK:
H2 :scheme: https, :path: /, Host: example.com
H2 :authority: example.com, :scheme: https, :path: /, Host: example.com:443

SHOULD FAIL:
H2 :authority: example.com, :scheme: https, :path: /, Host: badexample.com
  if not rejected, it becomes H1 GET / HTTP/1.1, Host: example.com

MUST FAIL:
H2 :scheme: https, :path: /
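
The three groups above could be checked roughly like this. It is only a sketch of my reading of the cases: classify, normalize and DEFAULT_PORTS are hypothetical names, header names are assumed to be lower-cased already, and IPv6 literals are not handled.

```python
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(authority, scheme):
    """Lower-case the host and make the default port explicit."""
    host, sep, port = authority.partition(":")
    return host.lower(), int(port) if sep and port else DEFAULT_PORTS[scheme]


def classify(headers):
    """Return 'ok', 'mismatch' or 'no-authority' for an H2 request header dict."""
    scheme = headers[":scheme"]
    authority = headers.get(":authority")
    host = headers.get("host")
    if authority is None and host is None:
        return "no-authority"  # the MUST FAIL case
    if authority is not None and host is not None:
        if normalize(authority, scheme) != normalize(host, scheme):
            return "mismatch"  # the SHOULD FAIL case
    return "ok"                # the MUST WORK cases
```

Treating "example.com" and "example.com:443" as equal for https is the assumption behind the second MUST WORK line.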


Conversions:

H1 GET / HTTP/1.1, Host: example.com
-> H2 :authority: example.com, :scheme: <context>
  -> H0 GET http://example.com/ HTTP/1.0
  -> H1 GET / HTTP/1.1, Host: example.com

H1 GET http://example.com/ HTTP/1.1, Host: example.com
-> H2 :authority: example.com, :scheme: http, Host: example.com
  -> H0 GET http://example.com/ HTTP/1.0
  -> H1 GET / HTTP/1.1, Host: example.com

H1 GET http://example.com/ HTTP/1.0
-> H2 :authority: example.com, :scheme: http
  -> H0 GET http://example.com/ HTTP/1.0
  -> H1 GET / HTTP/1.1, Host: example.com

H1 GET / HTTP/1.0
-> H2 :authority: <context>, :scheme: <context>

H1 GET urn:ietf:std:97 HTTP/1.1, Host:
-> H2 :authority: ???, :scheme: urn, :path: ???
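
For the H2 -> H0/H1 direction of the first three conversions, a minimal sketch could look like the following. The function name h2_to_h1 is hypothetical, :method and :path default to GET and / because the lines above omit them, and the <context> and ??? cases are deliberately not covered.

```python
def h2_to_h1(headers, version):
    """Rebuild HTTP/1.x request lines from H2 pseudo-header fields.

    version is "1.0" or "1.1"; headers is a dict with lower-case names.
    """
    method = headers.get(":method", "GET")
    path = headers.get(":path", "/")
    # Fall back to Host when no :authority was sent
    authority = headers.get(":authority") or headers.get("host")
    if authority is None:
        raise ValueError("neither :authority nor host present")
    if version == "1.0":
        # HTTP/1.0 has no mandatory Host, so the target is sent in absolute form
        return [f"{method} {headers[':scheme']}://{authority}{path} HTTP/1.0"]
    # HTTP/1.1: origin-form target plus a Host header field
    return [f"{method} {path} HTTP/1.1", f"Host: {authority}"]
```

Called with {":authority": "example.com", ":scheme": "http"}, this gives the H0 and H1 lines listed in the conversions above.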


Kind Regards,
Stefan
Received on Wednesday, 29 June 2022 09:18:36 UTC