Re: H2 draft: could we further refine Host vs :authority ?

Hi Willy,

This was a topic that was discussed extensively at the time we developed HTTP/2.  What we have in the specification is the result of that discussion.  My understanding is that some intermediaries did - at the time - pass authority form HTTP/1.1 requests that included a Host header field.  I would have to speculate about the real reasons for this, because most of the things I can imagine would naturally result in some very interesting security problems.

I've always been a little uncomfortable with that part of the specification, and we've also seen considerable improvements in the core semantics.  For instance, the -semantics draft is much clearer as far as this goes.  As far as I'm concerned, it would be better to pick either of the options you describe.  This is something that a server can (and probably should) do anyway.
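
For what it's worth, the match-or-reject check could look roughly like the Python sketch below. The function names are made up for illustration and are not taken from any implementation; the comparison (case-insensitive host, default port filled in per scheme) is only my reading of what scheme-based normalization would mean here, and it assumes the port, if present, is numeric.

```python
def split_authority(value, default_port):
    """Split an authority or Host value into (lowercased host, port).

    Hypothetical helper: fills in default_port when no port is given.
    Raises ValueError if the port is not numeric.
    """
    value = value.rsplit("@", 1)[-1]      # ignore any userinfo component
    if value.startswith("["):             # IPv6 literal such as [::1]:8080
        host, _, rest = value.partition("]")
        host += "]"
        port = rest[1:] if rest.startswith(":") else ""
    else:
        host, _, port = value.partition(":")
    return host.lower(), int(port) if port else default_port


def authority_matches_host(authority, host, scheme="https"):
    """Return True when :authority and Host name the same host and port."""
    default_port = {"http": 80, "https": 443}.get(scheme)
    return (split_authority(authority, default_port)
            == split_authority(host, default_port))
```

A receiver taking this route would treat the request as malformed whenever both are present and authority_matches_host() returns False.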

If we change anything, the option where Host is dropped is probably better for compatibility.  In particular, this would be tolerant of old h2 implementations that deliberately add both as RFC 7540 requires.
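
To make the drop-Host option concrete, here is a rough Python sketch of what a receiver might do. Everything in it is hypothetical (the names, the list-of-tuples representation of the field section, the exception type); it only illustrates the three rules Willy lists, and the duplicate :authority check is an assumption I added.

```python
class MalformedRequest(Exception):
    """Hypothetical error raised for requests that should be rejected."""


def normalize_request_fields(fields):
    """Apply the drop-Host rules to a received HTTP/2 request.

    fields: list of (name, value) tuples with lowercase names.
    Returns (target_host, fields_to_forward); target_host may be None.
    """
    authorities = [v for n, v in fields if n == ":authority"]
    hosts = [v for n, v in fields if n == "host"]

    if len(authorities) > 1:
        # Assumption: a repeated pseudo-header field is treated as malformed.
        raise MalformedRequest("multiple :authority pseudo-header fields")

    if authorities:
        # :authority wins; Host is dropped rather than merely ignored,
        # so a mismatching value cannot reach the next hop.
        kept = [(n, v) for n, v in fields if n != "host"]
        return authorities[0], kept

    if len(hosts) > 1:
        raise MalformedRequest("multiple Host header fields without :authority")

    # No :authority: fall back to the single Host field, if there is one.
    return (hosts[0] if hosts else None), list(fields)
```

With Willy's example, a request carrying :authority "example1.org" and Host "example2.org" comes out with "example1.org" as the target and no Host field left to confuse a receiver that follows RFC 7540.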

Note that this would be a change to the scope of the work that we undertook.  The editors have limited the changes in this area to clarifications rather than fixes.

Cheers,
Martin

p.s., For those following along, Willy opened https://github.com/httpwg/http2-spec/issues/905 for this.

On Wed, Aug 18, 2021, at 16:43, Willy Tarreau wrote:
> Hi,
> 
> I really love the stricter wording of the new H2 draft making it
> clear that :authority prevails over Host, especially after having
> been caught dealing with mismatches in haproxy despite extreme
> care on this area (while still complying with RFC7540), and we
> could enforce the rules according to the new draft.
> 
> However I still find the paragraph on this (8.3.1) a bit complex
> to deal with; it leaves some parts that don't sound totally
> consistent, possibly leaves some holes open, or casts doubt
> on possible interpretations.
> 
> Indeed, we have this:
> 
>   1) An intermediary that translates a request to HTTP/2 from another HTTP
>      version MUST retain any Host header field, even if an authority is
>      part of control data.
> 
>   2) The value of the Host header field MUST be ignored if control data
>      contains authority (that is, the :authority pseudo-header field is
>      present).
> 
> So we're saying that an intermediary must produce a Host header field
> that is supposed to be ignored by the consumer. This doesn't sound very
> logical from a developer's perspective and I'm sure this will often be
> skipped. There probably is some interoperability reason behind this
> (e.g. trying to be robust against some H2 implementations) but then I
> think we should mention why it has to be done this way so that this is
> not skipped.
> 
> There's also some redundancy here which looks a bit confusing:
> 
>   An intermediary that translates a request to HTTP/2 from another
>   HTTP version MUST translate any authority information from the
>   request into an :authority pseudo-header field.
> 
> and:
> 
>   If the control data in the original request contains authority
>   information, an intermediary MUST include a :authority pseudo-header
>   field.
> 
> In addition in this paragraph at several places it's mentioned "from
> another HTTP version", but as soon as you have to deal with multiple
> versions on each side of an intermediary, you don't deal with versions,
> in fact you're using an "HTTP" internal representation which relies on
> semantics and in this case it becomes strange to make an exception for
> the case where the other side is using exactly the same version. Given
> that we're now having a version-agnostic spec for the semantics, I
> would suggest that we avoid speaking about versions in the H2 spec and
> instead strictly rely on semantics. This is even more important when
> the text mentions what to do to convert towards other versions, as this
> job usually is in fact to be done on the other side (from the semantic
> layer to the other version), and is highly likely to be missed.
> 
> For example, it's mentioned:
> 
>   For reference, an HTTP/1.1 request target (Section 3.2 of [HTTP11]) in
>   authority-form always includes authority, a request target in absolute-form
>   includes authority if the target URI includes authority, and request
>   targets in origin- or asterisk-form do not include authority.
> 
> Just having this starts to preclude rules on how to parse an HTTP/1
> request that dangerously overlap with [messaging].
> 
> I could suggest to simplify this part like this (don't take it word
> for word, I'm trying to illustrate):
> 
>   An intermediary that forwards a request to HTTP/2 MUST translate
>   any authority information from the request into an :authority
>   pseudo-header field. If the original request does not contain
>   authority information, the intermediary MUST NOT add an :authority
>   pseudo-header field. Please note that the presence of a Host header
>   field does not necessarily imply presence of an authority; refer
>   to [semantics] for details.
> 
> Finally I'm bothered by this point:
> 
>   An intermediary that translates a request to HTTP/2 from another HTTP version
>   MUST retain any Host header field, even if an authority is part of control
>   data.
> 
> This still does not forbid Host header field(s) differing from :authority,
> so sending this request to an HTTP/1->HTTP/2 intermediary:
> 
>     GET http://example1.org/ HTTP/1.1
>     Host: example2.org
> 
> could easily result in the intermediary only considering "example1.org"
> as the site name from the authority and ignoring the Host header field,
> then passing the two components to the next hop over H2, which, if
> implemented according to 7540, would use the Host header field that
> is present, and would see "example2.org".
> 
> We really have a problem here in using either one field or the other,
> and never insisting on them to be equal.
> 
> And similarly to the above, what if the "another HTTP version" is already
> H2 and had its Host dropped on input ? There is a temptation here to
> only "ignore" Host but pass it along, which is exactly the problem
> we've been facing.
> 
> Couldn't we arrange all this either like this:
> 
>     - H2 servers and intermediaries must always drop any Host field
>       from a request if :authority is present and use :authority instead
> 
>     - H2 servers and intermediaries must always reject requests containing
>       multiple Host header fields if :authority is missing
> 
>     - H2 intermediaries must always emit a :authority in outgoing
>       requests if an authority was present in the initial request, and
>       drop Host, otherwise use the Host header field
> 
> Or this:
> 
>     - H2 servers and intermediaries MUST reject a request having both
>       Host and :authority which do not match as malformed
> 
> (though that one could be in semantics, but given that we're suggesting
> a number of hints it would have its place there). The latter would be
> easier, more secure, and make more sense IMHO, because if we find a
> single valid use case for mismatching Host and authority, most of what
> is written about them in the specs flies into pieces :-/
> 
> For example we could imagine putting stricter wording in [semantics]
> regarding the necessity for Host and authority to match when both
> present, for the preference of authority, and deferring to each protocol
> spec the implementation details. It would possibly give something like
> this:
> 
>   HTTP/1.0 did not initially use Host information and would only convey
>   absolute-URIs when talking to proxies [RFC1945]. HTTP/1.1 standardized
>   on the use of Host to indicate the requested host name, with a
>   requirement that servers accept absolute-URI. HTTP/2 generalized the
>   use of an absolute-URI with Host being optional. To prevent any risk
>   of host name mismatch between intermediaries, any server or intermediary
>   receiving a request carrying both an authority and a Host header field
>   must verify that they match (cf RFC3986#6.2.3) or reject the request as
>   invalid.
> 
> Then we can go on simply indicating that authority is preferred etc.
> 
> I'd really like that we can strengthen all this, because in haproxy
> over the years we've literally spent weeks only on this, trying hard
> to meticulously carry all that information from end to end, and tag
> the presence or absence of an authority to enforce a lot of specific
> rules that relate to version combinations, and seeing that this is
> still not sufficient worries me quite a bit. For me this is an
> indication that these rules remain too difficult to enforce and probably
> insufficient at the same time.
> 
> Thanks,
> Willy
> 
> 

Received on Monday, 23 August 2021 04:24:37 UTC