- From: David Schinazi <dschinazi.ietf@gmail.com>
- Date: Wed, 29 Jun 2022 12:42:30 -0700
- To: "Roy T. Fielding" <fielding@gbiv.com>
- Cc: Willy Tarreau <w@1wt.eu>, Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com>, HTTP <ietf-http-wg@w3.org>
- Message-ID: <CAPDSy+41Cu8We=FTZr3PnMq+S1Vw75-YZo4LM3OLZEfnXCE5aQ@mail.gmail.com>
I might be misunderstanding something, but from my reading of RFC 9113,
sending an empty :authority pseudo-header with a non-empty Host header is
invalid for HTTP requests. That's why google.com rejects these with 400.

<<Clients that generate HTTP/2 requests directly MUST use the ":authority"
pseudo-header field to convey authority information, unless there is no
authority information to convey>>

<<An intermediary that forwards a request over HTTP/2 MUST construct an
":authority" pseudo-header field using the authority information from the
control data of the original request, unless the original request's target
URI does not contain authority information>>

Sending a non-empty Host header means that the URI contains authority
information, so it triggers the h2 requirement to send :authority.

Am I misunderstanding something?

David

On Wed, Jun 29, 2022 at 10:54 AM Roy T. Fielding <fielding@gbiv.com> wrote:

>
> On Jun 28, 2022, at 10:52 PM, Willy Tarreau <w@1wt.eu> wrote:
> >
> > Hi Tatsuhiro,
> >
> > On Wed, Jun 29, 2022 at 08:58:47AM +0900, Tatsuhiro Tsujikawa wrote:
> >> RFC 7540 even says that :intermediary MUST omit :authority "when translating
> >> from an HTTP/1.1 request that has a request target in
> >> origin or asterisk form (see [RFC7230], Section 5.3)."
> >>
> >> Now RFC 9113 has this text:
> >>
> >>     An intermediary that forwards a request over HTTP/2 MUST construct
> >>     an ":authority" pseudo-header field using the authority
> >>     information from the control data of the original request, unless
> >>     the original request's target URI does not contain authority
> >>     information (in which case it MUST NOT generate ":authority").
> >>     Note that the Host header field is not the sole source of this
> >>     information; see Section 7.2 of [HTTP].
> >>
> >> This means :authority must be included if the host header field exists in
> >> an HTTP/1.1 request.
> >
> > My understanding is that Host doesn't necessarily count as "control data"
> > here, and that the goal was to accurately represent an HTTP/1.x request
> > targetting an HTTP/1.0 server after being transported over HTTP/2. For
> > example, let's say that a client passes this to a proxy:
> >
> >    GET http://example.com/ HTTP/1.0
> >    Proxy-connection: keep-alive
> >
> > and nothing more. If instead it gets sent via a gateway that transports
> > it over H2, it could make sense to consider that the scheme is "http",
> > the authority is "example.com", that there's no host, hence the request
> > would be passed as:
> >
> >    :method: GET
> >    :scheme: http
> >    :authority: example.com
> >
> > and that's all. Conversely, let's see the same HTTP/1.0 request sent
> > directly to the origin server:
> >
> >    GET / HTTP/1.0
> >
> > There's no more authority nor host, so a gateway receiving that cannot
> > invent one, unless it uses its own configured name corresponding to its
> > own address, that it expects the client used to construct the request.
> >
> > With HTTP/1.1 there are less ambiguities since Host is mandatory, but
> > the distinction between "proxy requests" and origin requests is still
> > relevant, especially when you don't know whether or not the origin
> > server supports HTTP/1.1 or only 1.0 (and may be confused by the
> > presence of an authority in the request line). For example, if a
> > client sends:
> >
> >    GET / HTTP/1.1
> >    Host: example.com
> >
> > to an HTTP/1.0 server that parses Host, it will work. If it sends
> >
> >    GET http://example.com/ HTTP/1.1
> >    Host: example.com
> >
> > To an HTTP/1.1 server, it will work as well, but it may fail to an HTTP/1.0
> > server (or worse, loop over itself if it supports proxing requests and
> > resolves itself as example.com).
>
> Well, this ship has sailed, but I must have missed that original
> discussion.
>
> The premise is incorrect in all respects, since all of those HTTP/1.1
> requests are also valid HTTP/1.0 requests (even with an absolute URI)
> and so is the presence of Host in those requests.
>
> Host is an HTTP/1.x field that was used in HTTP/1.0 requests (in 1995)
> as soon as we reached consensus on the field name. That was long before
> 1.1 was finished and 1.0 obsoleted. Host is a required part of HTTP/1.0 now
> just by virtue of the Internet as deployed, regardless of the
> informational RFC.
>
> [The idea was originally proposed in 1994 by John Franks
>
>    https://lists.w3.org/Archives/Public/ietf-http-wg-old/1994SepDec/0019.html
>
> but it took a long time to converge on a single syntax
>
>    https://lists.w3.org/Archives/Public/ietf-http-wg-old/1995JanApr/0067.html
>    https://lists.w3.org/Archives/Public/ietf-http-wg-old/1995JanApr/0084.html
>    https://lists.w3.org/Archives/Public/ietf-http-wg-old/1995JanApr/0130.html
>    https://lists.w3.org/Archives/Public/ietf-http-wg-old/1995SepDec/0291.html
>
> and while we still talk about it as an important addition of HTTP/1.1 (because
> that's where we chose to document it), the feature is required for 1.0 to
> work with deployed servers.]
>
> So, an HTTP proxy recipient that receives any form of authority/host
> information must forward that information in either Host or :authority,
> no matter what version it is using. Failure to do so introduces a
> security bypass because L7 routers act on that information whether
> or not the client/server pair is aware of their presence.
>
> Hence, an HTTP/1.0 proxy that receives your first example should forward
> that as
>
>    GET / HTTP/1.0
>    Host: example.com
>    Proxy-connection: keep-alive
>
> because the routing doesn't work otherwise due to name-based hosts
> being deployed before HTTP/1.1.
>
> And, no, there is absolutely no reason to concern ourselves with proxies
> that loop over their own hostnames, since that is a self-correcting error
> whenever a full URI is received as the request target.
>
> > If the first request is transported over H2, thus converted from H1 to
> > H2 then back from H2 to H1, adding an authority that was not initially
> > present would introduce exactly this problem. By not adding it and using
> > Host only, the request representation is preserved, and the origin server
> > can receive the same request that the client took care to encode, and not
> > be confused. That's why I'm saying that in this case it's clearly visible
> > that Host isn't part of the "control data" and must not appear in an
> > authority that was not initially encoded.
> >
> > I know it's a bit complicated but we have to deal with history. What we're
> > doing in haproxy is that both Host and :authority are used interchangeably
> > after having been checked for proper matching, and are modified at the
> > same time if needed, and we have a flag indicating if an authority was
> > present in the incoming request to know if we have to produce one on
> > output or not. That's in the end what seems to preserve the most accurate
> > representation along a chain of multiple versions. This allows us to emit
> > a Host field only if one was present, and an authority only if one was
> > present, regardless of the HTTP version. I don't think that RFC9113 brings
> > any changes regarding this, it might only be a matter of what constitutes
> > "control data".
>
> Sorry, that is a broken implementation. You need to send Host regardless
> of the original request version.
>
> ....Roy
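
For illustration only, the reading of RFC 9113 given at the top of this message could be sketched roughly as follows. This is not taken from any implementation discussed in the thread: the function names `h2_pseudo_headers` and `is_valid_h2_request` are hypothetical, header names are assumed to be lowercased already, and defaulting the scheme to "http" for an origin-form target is an assumption made to keep the sketch short.

```python
from urllib.parse import urlsplit


def h2_pseudo_headers(method, request_target, headers):
    """Build HTTP/2 pseudo-headers from an HTTP/1.x request (hypothetical helper).

    Authority information is taken from an absolute-form request target
    if there is one, otherwise from the Host header. ":authority" is only
    omitted when neither source carries any authority information.
    """
    scheme = "http"  # assumption: origin-form requests arrived over plain HTTP
    path = request_target
    if "://" in request_target:
        # Absolute-form target, e.g. "GET http://example.com/ HTTP/1.0".
        parts = urlsplit(request_target)
        scheme = parts.scheme
        authority = parts.netloc
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
    else:
        # Origin-form target: Host is the remaining source of authority.
        authority = headers.get("host", "")

    pseudo = [(":method", method), (":scheme", scheme)]
    if authority:
        pseudo.append((":authority", authority))
    pseudo.append((":path", path))
    return pseudo


def is_valid_h2_request(pseudo, headers):
    """Reject a missing or empty :authority combined with a non-empty Host.

    This mirrors the reading described above, under which a server may
    answer such a request with 400.
    """
    authority = dict(pseudo).get(":authority", "")
    host = headers.get("host", "")
    return not (host and not authority)
```

Under this sketch, "GET / HTTP/1.1" with "Host: example.com" produces an ":authority" of "example.com", while "GET / HTTP/1.0" with no Host produces no ":authority" at all. The haproxy behaviour described in the quoted text (emit an authority only if one was present on input) would differ in the first case.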
Received on Wednesday, 29 June 2022 19:42:55 UTC