- From: Willy Tarreau <w@1wt.eu>
- Date: Fri, 3 Sep 2021 07:55:18 +0200
- To: "Roy T. Fielding" <fielding@gbiv.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>, Martin Thomson <mt@lowentropy.net>
Hi Roy,

On Thu, Sep 02, 2021 at 10:47:58AM -0700, Roy T. Fielding wrote:
> On Sep 1, 2021, at 6:05 AM, Willy Tarreau <w@1wt.eu> wrote:
> >
> > On Wed, Sep 01, 2021 at 08:35:49AM +0200, Julian Reschke wrote:
> >> To call "Host"
> >> mandatory is just confusing. Maybe the whole statement should be
> >> simplified somewhat.
> >
> > It's possible, nothing comes to my mind right now. The reality is
> > that Host has become a second place for authority long ago but is
> > weaker than authority... In an ideal world we'd say that an authority
> > is mandatory and must be provided as :authority or Host (or both),
> > preferably :authority. But with the split inherited from H1 and the
> > need to pass H1 semantics to H2 it's not that easy.
>
> Well, it could have been that easy, but that's water under the bridge.
>
> I think the rationale was to be able to send a request across versions
>
>      h1   h2   h1
>    A -> B -> C -> D
>
> and have both h1 requests look the same by preserving Host instead
> of :authority. Is that right? Maybe the spec should explain it.

I'm fine with clarifying something for this. We already support various
combinations of host/authority in haproxy in order to preserve as much
as possible the origin vs absolute form for H1, with the exception that
H2->H1 always switches to origin form due to the ambiguity in H2 that
doesn't provide the distinction between the two forms. But it could be
stated that:

  - origin forms have either authority or host
  - absolute forms have both

And then this removes one ambiguity. However I'm pretty sure I've
already seen quite a bunch of H2 requests coming with both, probably
because that was easy to do for the client and because no distinction
was ever proposed between them.

> Personally, I would have preferred that Host be deleted in passing
> and :authority be used consistently to generate Host only when
> forwarding to HTTP/1.1.

That's what we currently do, indeed, since we do not have a reliable
signal to say "this H2 request was in absolute form, please do that
for H1 as well".

> HTTP/1.1 (sec 3.2.2) requires that a message to a proxy be in absolute
> form:

Yep.

> and that a received Host be replaced with the :authority
>
>    A client MUST send a Host header field in an HTTP/1.1 request even if
>    the request-target is in the absolute-form, since this allows the
>    Host information to be forwarded through ancient HTTP/1.0 proxies
>    that might not have implemented Host.

I'm seeing a concerning change here from 7230 that I did not notice
previously, regarding the requirement that the Host value MUST be
identical to the authority:

   If the target URI includes an authority component, then a client
   MUST send a field-value for Host that is identical to that authority
   component.

This MUST, combined with the last paragraph:

   A server MUST respond with a 400 (Bad Request) status code to any
   HTTP/1.1 request message that lacks a Host header field and to any
   request message that contains more than one Host header field or a
   Host header field with an invalid field-value.

made it possible to reject requests with a mismatching Host vs
authority, which is the basis of the security issues. And now I no
longer see it stated that clearly.

>    When a proxy receives a request with an absolute-form of request-
>    target, the proxy MUST ignore the received Host header field (if any)
>    and instead replace it with the host information of the request-
>    target. A proxy that forwards such a request MUST generate a new
>    Host field value based on the received request-target rather than
>    forward the received Host field value.
>
>    When an origin server receives a request with an absolute-form of
>    request-target, the origin server MUST ignore the received Host
>    header field (if any) and instead use the host information of the
>    request-target.
>    Note that if the request-target does not have an
>    authority component, an empty Host header field will be sent in this
>    case.
>
> Note that it does not place those requirements on intermediaries,
> in general, because gateways/CDNs are allowed/expected to modify
> the request target when it is being forwarded.

Sure, but a number of intermediaries can be arbitrarily placed in front
of either proxies or servers. And a number of proxies nowadays also
support working as reverse-proxies since they can find all the info
they need in the request.

I suspect that by virtue of staying compatible with HTTP/1.0 we've
enumerated more cases than really exist from the combinations of
authority vs host. Let me try to enumerate the types of requests that
come to my mind (and let's see which ones are forgotten, I'm purposely
leaving CONNECT aside):

  - HTTP/1.0 requests in origin form without Host. These ones are for
    a server and they provide no authority either. The server or
    intermediary that receives them may only rely on the target IP:port
    to figure out what they were for.

  - HTTP/1.0 requests in absolute form without Host. These ones are
    explicitly for a proxy and do provide an authority. Any server or
    intermediary can figure out the authority from the URI, so an
    intermediary may transform them to HTTP/1.1 on the other side and
    provide a matching Host header field if needed (e.g. a proxy may
    coalesce them on a reused idle HTTP/1.1 connection to the same
    host).

  - HTTP/1.0 requests in origin form with Host. Host information was
    provided and the recipient is expected to use that as the only
    source of authority if needed in order to determine a virtual host.
    These ones are equivalent to HTTP/1.1 ones it seems.

  - HTTP/1.0 requests in absolute form with Host.
    These ones are equivalent to HTTP/1.1 ones it seems, with the
    possible exception that since Host was not mandatory in 1.0, Host
    is not guaranteed to convey valid information, thus it makes sense
    to prefer the authority part from the URI and ignore Host.

  - HTTP/1.1 requests in origin form with empty Host. Host is present
    but has no value, similar to HTTP/1.0 above. The recipient will use
    connection-level info if desired.

  - HTTP/1.1 requests in origin form with non-empty Host. The recipient
    must use the Host field value as the sole source of authority info.

  - HTTP/1.1 requests in absolute form with empty Host. These are
    similar to HTTP/1.0 absolute requests. The authority is extracted
    from the URI.

  - HTTP/1.1 requests in absolute form with non-empty Host. This one
    was produced by an HTTP/1.1 agent which complies with 7230, so Host
    MUST match the authority in the URI. Either of them may be used as
    the source of authority, and either of them may be dropped along
    the path without losing information.

  - HTTP/2 requests with no :authority and no Host. These are not
    valid.

  - HTTP/2 requests with no :authority and empty Host. These are
    similar to HTTP/1.0 origin form without Host above (no authority
    is provided).

  - HTTP/2 requests with no :authority and non-empty Host. These are
    similar to HTTP/1.1 origin + Host above. Authority comes from Host.

  - HTTP/2 requests with :authority and no Host. They're similar to
    HTTP/1.0 in absolute form (i.e. authority comes from :authority).

  - HTTP/2 requests with :authority and empty Host. Same as if there
    is no Host.

  - HTTP/2 requests with :authority and non-empty Host. These are
    similar to HTTP/1.1 absolute + Host above. The sender had to
    respect the "MUST be identical" rule, so either can be used and the
    other one dropped. Note that they're pretty common in the field
    from what I've seen (no numbers), which is a difference compared to
    HTTP/1.1.
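
To make the enumeration above easier to check, here is a minimal sketch
of how a recipient might resolve a single authority from these cases.
It is only an illustration, not what haproxy or any spec does: the
Request type, its field names, the effective_authority() function and
the AuthorityMismatch error are all hypothetical, rejecting a mismatch
is an assumption, and the comparison is a plain string equality (a real
implementation would have to deal with case and default ports).

```python
from dataclasses import dataclass
from typing import Optional


class AuthorityMismatch(ValueError):
    """Hypothetical error: a non-empty Host disagrees with the authority."""


@dataclass
class Request:
    # All names here are illustrative assumptions, not a real API.
    version: str                            # "1.0", "1.1" or "2"
    target_authority: Optional[str] = None  # absolute-form URI or :authority
    host: Optional[str] = None              # None if absent, "" if empty


def effective_authority(req: Request) -> Optional[str]:
    """Return the single authority of a request, or None if unknown."""
    # HTTP/2 with neither :authority nor Host is not valid.
    if req.version == "2" and req.target_authority is None and req.host is None:
        raise ValueError("HTTP/2 request with no :authority and no Host")

    # An empty Host carries no information, same as an absent one.
    host = req.host or None

    if req.target_authority is not None:
        # Absolute form or :authority present: it wins over Host.
        # With HTTP/1.1 and HTTP/2 a non-empty Host must match it;
        # with HTTP/1.0 Host is simply ignored.
        if req.version != "1.0" and host is not None \
                and host != req.target_authority:
            raise AuthorityMismatch("Host does not match the authority")
        return req.target_authority

    # Origin form (or no :authority): Host is the only source. None
    # means "decide from connection-level info".
    return host
```

For instance effective_authority(Request("1.1", host="example.com"))
gives "example.com", while Request("1.0") alone gives None and leaves
the decision to connection-level information.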

As such it seems to me that these are the situations we are trying to
cover:

  - HTTP/1.0 in origin form and no (or empty) Host, HTTP/1.1 in origin
    form and empty Host, HTTP/2 with no :authority and empty Host
    => authority unknown, decide based on the connection-level info

  - HTTP/{1.0,1.1} in origin form and valid Host, HTTP/2 with no
    :authority but valid Host
    => authority comes from Host

  - HTTP/1.0 in absolute form with or without Host
    => authority comes from URI (Host ignored)

  - HTTP/1.1 in absolute form with empty or non-empty Host
    => authority comes from URI (Host ignored)

  - HTTP/2 with both :authority and Host
    => authority comes from :authority (Host ignored)

And in any case, with HTTP/1.1 and above, a non-empty Host must always
match any provided authority.

Once we have these rules in place, an intermediary can safely convert
any version into any other one and emit the outgoing request according
to the target protocol's preferred conventions. It's up to the
intermediary to decide whether to emit an HTTP/1.x request as an origin
request or an absolute one depending on configuration or hints (in
haproxy we try to mimic what we've observed on the other side, but I
would also find it normal to have it by configuration since the notion
of proxy on the next hop is very connection-centric).

> http2bis 8.3.1 says:
>
>    Clients that generate HTTP/2 requests directly SHOULD use the
>    :authority pseudo-header field instead of the Host header field.
>
>    An intermediary that forwards a request to HTTP/2 MUST construct
>    an :authority pseudo-header field using the authority information
>    from the control data in the original request. If control data
>    does not contain authority, an intermediary MUST NOT add an
>    :authority pseudo-header field. Note that while the Host header
>    field can determine a request target, it is not control data
>    for this purpose; see Section 7.2 of [HTTP].
>
> Martin, I find that last sentence confusing, as is the requirement
> here on "intermediary" (which would include both proxy and gateway).
> Why would a gateway be prevented from changing :authority? Or are you
> making a distinction here between "forwarding" a request unchanged
> versus satisfying a request by accessing an internal resource?

I participated in this one. The case where the intermediary changes the
contents was not considered there; the focus was on where to pick the
original information. But this is another example of how bad it is to
discuss protocol translations, which are the job of the component
itself. We ought to be very careful to only explain where to find the
required information in the protocol and where to place it. The choice
of transforming that information should be left to the implementation;
it's not our business.

That's why I really think that we need to solve this ambiguity around
the notion of "authority". In HTTP/1.1 Host definitely conveys such an
authority. Once we can make it clear that there is AT MOST one
authority in a request (possibly from multiple sources which must
match), then it's each protocol's business to indicate where to emit
the authority, regardless of where it was found or how it was
transformed.

>    The value of the Host header field MUST be ignored if control
>    data contains authority (that is, the :authority pseudo-header
>    field is present).
>
> When an HTTP/1.1 request is received in absolute form (i.e.,
> with the equivalent of :authority supplied), we require above
> that Host be ignored and replaced with the absolute authority.
> The last paragraph above requires to ignore (without saying who
> is being required), but it doesn't require that Host be replaced
> when forwarded by a proxy.

This is the exact same problem as above: we should not discuss
transforming, only receiving and sending.

> I would have written that as
>
>    A recipient MUST ignore the Host header field in a request
>    that contains an :authority pseudo-header field. If an
>    intermediary forwards such a request via HTTP/1.1 without
>    changing the request target, the intermediary MUST send
>    the :authority pseudo-header field value as the Host field
>    in the forwarded request (replacing any existing Host field)
>    to avoid potential vulnerabilities in HTTP routing.
>
> Is that something we should add now?

I'd rather proceed differently then: we explain how to extract that
authority info for the protocol itself without precluding anything
about other protocols, then we provide non-normative examples of what
an intermediary would do, e.g. when forwarding such a request to H2, to
an H1 server or to an H1 proxy. This will illustrate the purpose of the
rules and draw attention to certain delicate corner cases, and it will
be more useful, because it's impossible to enumerate all combinations
of other protocols in the spec itself.

Regards,
Willy
Received on Friday, 3 September 2021 05:55:46 UTC