- From: Willy Tarreau <w@1wt.eu>
- Date: Wed, 18 Aug 2021 08:43:44 +0200
- To: HTTP Working Group <ietf-http-wg@w3.org>
Hi,

I really love the stricter wording of the new H2 draft making it clear that :authority prevails over Host, especially after having been caught dealing with mismatches in haproxy despite extreme care in this area (while still complying with RFC7540), and we could enforce the rules according to the new draft.

However I still find the paragraph on this (8.3.1) a bit complex to deal with. It leaves some parts that don't sound totally consistent, possibly leaves some holes open, and casts doubt on possible interpretations. Indeed, we have this:

  1) An intermediary that translates a request to HTTP/2 from another
     HTTP version MUST retain any Host header field, even if an
     authority is part of control data.

  2) The value of the Host header field MUST be ignored if control data
     contains authority (that is, the :authority pseudo-header field is
     present).

So we're saying that an intermediary must produce a Host header field that is supposed to be ignored by the consumer. This doesn't sound very logical from a developer's perspective and I'm sure this will often be skipped. There probably is some interoperability reason behind this (e.g. trying to be robust against some H2 implementations) but then I think we should mention why it has to be done this way so that this is not skipped.

There's also some redundancy here which looks a bit confusing:

     An intermediary that translates a request to HTTP/2 from another
     HTTP version MUST translate any authority information from the
     request into an :authority pseudo-header field.

and:

     If the control data in the original request contains authority
     information, an intermediary MUST include a :authority
     pseudo-header field.

In addition, this paragraph mentions "from another HTTP version" in several places, but as soon as you have to deal with multiple versions on each side of an intermediary, you don't deal with versions. In fact you're using an internal "HTTP" representation which relies on semantics, and in this case it becomes strange to make an exception for the case where the other side is using exactly the same version. Given that we now have a version-agnostic spec for the semantics, I would suggest that we avoid speaking about versions in the H2 spec and instead strictly rely on semantics. This is even more important when the text mentions what to do to convert towards other versions, as this job is usually in fact to be done on the other side (from the semantic layer to the other version), and is highly likely to be missed. For example, it's mentioned:

     For reference, an HTTP/1.1 request target (Section 3.2 of
     [HTTP11]) in authority-form always includes authority, a request
     target in absolute-form includes authority if the target URI
     includes authority, and request targets in origin- or
     asterisk-form do not include authority.

Just having this starts to prescribe rules on how to parse an HTTP/1 request that dangerously overlap with [messaging].

I could suggest simplifying this part like this (don't take it word for word, I'm trying to illustrate):

     An intermediary that forwards a request to HTTP/2 MUST translate
     any authority information from the request into an :authority
     pseudo-header field. If the original request does not contain
     authority information, the intermediary MUST NOT add an :authority
     pseudo-header field. Please note that the presence of a Host
     header field does not necessarily imply presence of an authority;
     refer to [semantics] for details.

Finally I'm bothered by this point:

     An intermediary that translates a request to HTTP/2 from another
     HTTP version MUST retain any Host header field, even if an
     authority is part of control data.

This still does not forbid Host header field(s) differing from :authority, so sending this request to an HTTP/1->HTTP/2 intermediary:

     GET http://example1.org/ HTTP/1.1
     Host: example2.org

could easily result in the intermediary only considering "example1.org" as the site name from the authority and ignoring the Host header field, then passing the two components to the next hop over H2, which, if implemented according to 7540, would use the Host header field that is present, and would see "example2.org". We really have a problem here in using either one field or the other, and never insisting on them being equal.

And similarly to the above, what if the "another HTTP version" is already H2 and had its Host dropped on input? There is a temptation here to only "ignore" Host but pass it along, which is exactly the problem we've been facing.

Couldn't we arrange all this either like this:

  - H2 servers and intermediaries must always drop any Host field from
    a request if :authority is present and use :authority instead

  - H2 servers and intermediaries must always reject requests
    containing multiple Host header fields if :authority is missing

  - H2 intermediaries must always emit a :authority in outgoing
    requests if an authority was present in the initial request, and
    drop Host, otherwise use the Host header field

Or this:

  - H2 servers and intermediaries MUST reject as malformed a request
    having both Host and :authority which do not match (though that
    one could be in semantics, but given that we're suggesting a number
    of hints it would have its place there).
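
To make the two alternatives above a bit more concrete, here is a rough Python sketch. It is only an illustration and not taken from haproxy or any other implementation: the request model (an optional :authority value plus a list of header fields), the function names `drop_host_policy` and `strict_match_policy`, and the `MalformedRequest` exception are all made up for this example, and the comparison is a plain case-insensitive string match rather than a full URI comparison.

```python
from typing import List, Optional, Tuple

# A request is modelled as an optional :authority value plus a list of
# (name, value) header fields. This model is an assumption of the sketch.
Headers = List[Tuple[str, str]]


class MalformedRequest(Exception):
    """Hypothetical error: the request must be rejected as malformed."""


def host_values(headers: Headers) -> List[str]:
    """Return the values of all Host header fields, in order."""
    return [value for name, value in headers if name.lower() == "host"]


def drop_host_policy(
    authority: Optional[str], headers: Headers
) -> Tuple[Optional[str], Headers]:
    """First alternative: :authority wins and Host is not forwarded with it."""
    if authority is not None:
        # Drop every Host field and rely on :authority only.
        return authority, [(n, v) for n, v in headers if n.lower() != "host"]
    if len(host_values(headers)) > 1:
        raise MalformedRequest("multiple Host header fields and no :authority")
    # No authority in the request: Host (if any) is passed along unchanged.
    return None, headers


def strict_match_policy(
    authority: Optional[str], headers: Headers
) -> Tuple[Optional[str], Headers]:
    """Second alternative: reject a request whose Host and :authority differ."""
    if authority is not None:
        for host in host_values(headers):
            # Simplification: case-insensitive comparison, no port handling.
            if host.lower() != authority.lower():
                raise MalformedRequest("Host does not match :authority")
    return authority, headers


# The mismatching request from the example above, as seen after translation.
request_headers = [("host", "example2.org"), ("accept", "*/*")]

print(drop_host_policy("example1.org", request_headers))
try:
    strict_match_policy("example1.org", request_headers)
except MalformedRequest as exc:
    print("rejected:", exc)
```

With the first function the next hop only ever sees "example1.org"; with the second the request does not get forwarded at all. In both cases the next hop can no longer pick "example2.org".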

The latter would be easier, more secure, and make more sense IMHO, because if we find a single valid use case for mismatching Host and authority, most of what is written about them in the specs flies into pieces :-/

For example we could imagine putting stricter wording in [semantics] regarding the necessity for Host and authority to match when both are present and the preference for authority, and deferring the implementation details to each protocol spec. It would possibly give something like this:

     HTTP/1.0 did not initially use Host information and would only
     convey absolute-URIs when talking to proxies [RFC1945]. HTTP/1.1
     standardized on the use of Host to indicate the requested host
     name, with a requirement that servers accept absolute-URI. HTTP/2
     generalized the use of an absolute-URI with Host being optional.
     To prevent any risk of host name mismatch between intermediaries,
     any server or intermediary receiving a request carrying both an
     authority and a Host header field must verify that they match
     (cf RFC3986#6.2.3) or reject the request as invalid.

Then we can go on simply indicating that authority is preferred etc.

I'd really like us to strengthen all this, because in haproxy over the years we've literally spent weeks on this alone, trying hard to meticulously carry all that information from end to end, and to tag the presence or absence of an authority in order to enforce a lot of specific rules that relate to version combinations, and seeing that this is still not sufficient worries me quite a bit. For me this is an indication that these rules remain too difficult to enforce and probably insufficient at the same time.

Thanks,
Willy

Received on Wednesday, 18 August 2021 06:43:58 UTC