H2 draft: could we further refine Host vs :authority?

Hi,

I really love the stricter wording of the new H2 draft making it
clear that :authority prevails over Host, especially after having
been caught dealing with mismatches in haproxy despite extreme
care in this area (while still complying with RFC7540), and we
could now enforce the rules as written in the new draft.

However, I still find the paragraph on this (8.3.1) a bit complex
to deal with: some parts don't sound entirely consistent, some may
leave holes open, and others cast doubt on how they should be
interpreted.

Indeed, we have this:

  1) An intermediary that translates a request to HTTP/2 from another HTTP
     version MUST retain any Host header field, even if an authority is
     part of control data.

  2) The value of the Host header field MUST be ignored if control data
     contains authority (that is, the :authority pseudo-header field is
     present).
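To make the tension concrete, here's a minimal Python sketch of the two
quoted rules. It assumes headers are modelled as (lowercase name, value)
pairs, and the function names are hypothetical. The translator dutifully
carries Host along, yet a receiver following rule 2 never reads it:

```python
from typing import List, Optional, Tuple

# Hypothetical model: a header field is a (lowercase name, value) pair.
Header = Tuple[str, str]


def translate_to_h2(authority: Optional[str], headers: List[Header]) -> List[Header]:
    """Rule 1: add :authority when the source request carried authority,
    and retain every Host field unchanged."""
    fields: List[Header] = []
    if authority is not None:
        fields.append((":authority", authority))
    fields.extend(headers)  # Host is kept, whatever its value
    return fields


def target_authority(fields: List[Header]) -> Optional[str]:
    """Rule 2: the Host value is ignored when :authority is present.
    Falling back to Host otherwise is an assumption of this sketch."""
    for name, value in fields:
        if name == ":authority":
            return value
    for name, value in fields:
        if name == "host":
            return value
    return None


# The retained Host is carried along but never consulted:
fields = translate_to_h2("a.example", [("host", "b.example")])
print(target_authority(fields))  # a.example
```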

So we're saying that an intermediary must produce a Host header field
that the consumer is supposed to ignore. This doesn't sound very
logical from a developer's perspective, and I'm sure this step will
often be skipped. There is probably some interoperability reason behind
it (e.g. being robust against some H2 implementations), but then I
think we should state why it has to be done this way so that it is
not skipped.

There's also some redundancy here which looks a bit confusing:

  An intermediary that translates a request to HTTP/2 from another
  HTTP version MUST translate any authority information from the
  request into an :authority pseudo-header field.

and:

  If the control data in the original request contains authority
  information, an intermediary MUST include a :authority pseudo-header
  field.

In addition, this paragraph mentions "from another HTTP version" in
several places, but as soon as an intermediary has to deal with
multiple versions on each side, it no longer deals with versions: it
in fact uses an internal "HTTP" representation that relies on
semantics, and in this case it becomes strange to make an exception
for the case where the other side uses exactly the same version. Given
that we now have a version-agnostic spec for the semantics, I would
suggest that we avoid speaking about versions in the H2 spec and
instead rely strictly on semantics. This is even more important when
the text says what to do when converting to other versions, as that
job is usually done on the other side (from the semantic layer to the
other version), where such a requirement is highly likely to be missed.

For example, it's mentioned:

  For reference, an HTTP/1.1 Section 3.2 of request target [HTTP11] in
  authority-form always includes authority, a request target in absolute-form
  includes authority if the target URI includes authority, and request
  targets in origin- or asterisk-form do not include authority.

Just having this starts to preclude rules on how to parse an HTTP/1
request that dangerously overlap with [messaging].
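Purely to illustrate how much HTTP/1 parsing that one sentence pulls into
the H2 spec, here's a simplified Python sketch of the quoted
classification. `classify_target` is a hypothetical helper, it performs
no syntax validation, and it relies on `urllib.parse.urlsplit` to find
the authority part of an absolute-form target:

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def classify_target(method: str, target: str) -> Tuple[str, Optional[str]]:
    """Return (form, authority) for an HTTP/1.1 request-target.

    Simplified sketch: no syntax validation is performed.
    """
    if target == "*":
        return "asterisk", None          # e.g. OPTIONS * HTTP/1.1
    if target.startswith("/"):
        return "origin", None            # e.g. GET /index.html HTTP/1.1
    if method == "CONNECT":
        return "authority", target       # e.g. CONNECT example.org:443
    # absolute-form: authority is present only if the URI has a
    # "//" authority component (urlsplit reports it as netloc).
    parts = urlsplit(target)
    return "absolute", (parts.netloc or None)
```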

I would suggest simplifying this part like this (don't take it word
for word, I'm only trying to illustrate):

  An intermediary that forwards a request to HTTP/2 MUST translate
  any authority information from the request into an :authority
  pseudo-header field. If the original request does not contain
  authority information, the intermediary MUST NOT add an :authority
  pseudo-header field. Please note that the presence of a Host header
  field does not necessarily imply presence of an authority; refer
  to [semantics] for details.
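Expressed against a version-agnostic internal representation, the
suggested rule depends only on whether the semantic request carries an
authority, never on which version it arrived with. A minimal Python
sketch, where `SemanticRequest` and `h2_pseudo_headers` are hypothetical
names and Host handling is deliberately left out:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SemanticRequest:
    """Hypothetical version-agnostic request, as an intermediary's
    internal representation might hold it."""
    method: str
    scheme: str
    authority: Optional[str]          # None when no authority was conveyed
    path: str
    headers: List[Tuple[str, str]] = field(default_factory=list)


def h2_pseudo_headers(req: SemanticRequest) -> List[Tuple[str, str]]:
    """Emit :authority if and only if the request carries authority.
    A Host field in req.headers is deliberately NOT used to synthesize
    :authority, since Host does not necessarily imply an authority."""
    fields = [(":method", req.method), (":scheme", req.scheme)]
    if req.authority is not None:
        fields.append((":authority", req.authority))
    fields.append((":path", req.path))
    return fields
```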

Finally, I'm bothered by this point:

  An intermediary that translates a request to HTTP/2 from another HTTP version
  MUST retain any Host header field, even if an authority is part of control
  data.

This still does not forbid Host header field(s) differing from :authority,
so sending this request to an HTTP/1->HTTP/2 intermediary:

    GET http://example1.org/ HTTP/1.1
    Host: example2.org

could easily result in the intermediary considering only "example1.org"
as the site name from the authority and ignoring the Host header field,
then passing both components to the next hop over H2, which, if
implemented according to 7540, would use the Host header field that
is present and would see "example2.org".

The real problem here is that we use either one field or the other
without ever insisting that they be equal.
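The scenario can be reproduced with two hypothetical next-hop policies in
Python: one preferring :authority as the new draft does, and one using a
present Host field as described above for a 7540-based implementation.
Both receive exactly the same forwarded fields:

```python
from typing import List, Optional, Tuple

Header = Tuple[str, str]  # (lowercase name, value)


def pick_prefer_authority(fields: List[Header]) -> Optional[str]:
    """Next hop following the new draft: :authority wins over Host."""
    for name, value in fields:
        if name == ":authority":
            return value
    return next((v for n, v in fields if n == "host"), None)


def pick_prefer_host(fields: List[Header]) -> Optional[str]:
    """Next hop behaving as described for a 7540-based
    implementation: a present Host field is what gets used."""
    for name, value in fields:
        if name == "host":
            return value
    return next((v for n, v in fields if n == ":authority"), None)


# What the HTTP/1 -> HTTP/2 intermediary forwards for
#   GET http://example1.org/ HTTP/1.1
#   Host: example2.org
# when it retains Host as required and never compares the two.
forwarded: List[Header] = [
    (":method", "GET"),
    (":scheme", "http"),
    (":authority", "example1.org"),
    (":path", "/"),
    ("host", "example2.org"),
]

print(pick_prefer_authority(forwarded))  # example1.org
print(pick_prefer_host(forwarded))       # example2.org
```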

Similarly, what if the "other HTTP version" is already H2 and had
its Host dropped on input? There is a temptation here to merely
"ignore" Host but still pass it along, which is exactly the problem
we've been facing.

Couldn't we arrange all this either like this:

    - H2 servers and intermediaries must always drop any Host field
      from a request if :authority is present and use :authority instead

    - H2 servers and intermediaries must always reject requests containing
      multiple Host header fields if :authority is missing

    - H2 intermediaries must always emit a :authority in outgoing
      requests if an authority was present in the initial request, and
      drop Host; otherwise they must use the Host header field
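The first arrangement fits in a few lines. Below is a minimal Python
sketch where `inbound`, `outbound` and `MalformedRequest` are
hypothetical names; it assumes (lowercase name, value) header pairs and
a single :authority, since H2 pseudo-header fields cannot repeat:

```python
from typing import List, Optional, Tuple

Header = Tuple[str, str]  # (lowercase name, value)


class MalformedRequest(Exception):
    """Raised for requests the rules say must be rejected."""


def inbound(fields: List[Header]) -> Tuple[Optional[str], List[Header]]:
    """Bullets 1 and 2, applied on receipt by a server or intermediary.

    Returns (authority, remaining fields). authority is None when the
    request carried no :authority, in which case Host is left in place.
    """
    authorities = [v for n, v in fields if n == ":authority"]
    if authorities:
        # Bullet 1: use :authority and drop every Host field.
        remaining = [(n, v) for n, v in fields if n not in (":authority", "host")]
        return authorities[0], remaining
    if sum(1 for n, _ in fields if n == "host") > 1:
        # Bullet 2: several Host fields and no :authority -> reject.
        raise MalformedRequest("multiple Host fields without :authority")
    return None, list(fields)


def outbound(authority: Optional[str], headers: List[Header]) -> List[Header]:
    """Bullet 3, applied by an H2 intermediary when forwarding."""
    if authority is not None:
        # Emit :authority and never forward a (possibly stale) Host.
        return [(":authority", authority)] + [(n, v) for n, v in headers if n != "host"]
    # No authority in the initial request: Host remains the only source.
    return list(headers)
```

Chaining `outbound(*inbound(fields))` then guarantees that a forwarded
request never carries both fields.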

Or this:

    - H2 servers and intermediaries MUST reject as malformed a request
      having both Host and :authority which do not match

(though that one could arguably live in [semantics], given that we're
already suggesting a number of hints here, it would have its place in
the H2 spec too). The latter would be easier, more secure, and make
more sense IMHO, because if we find a single valid use case for
mismatching Host and authority, most of what is written about them in
the specs falls apart :-/
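The second option is even simpler to enforce. A minimal Python sketch,
with the hypothetical helper `check_host_matches_authority`; matching is
a plain case-insensitive comparison here, whereas a complete check would
also apply URI normalization such as default-port removal:

```python
from typing import List, Tuple

Header = Tuple[str, str]  # (lowercase name, value)


class MalformedRequest(Exception):
    """Raised when Host and :authority disagree."""


def check_host_matches_authority(fields: List[Header]) -> None:
    """When both :authority and Host are present, every Host value must
    match :authority, otherwise the request is rejected as malformed."""
    authority = [v for n, v in fields if n == ":authority"]
    hosts = [v for n, v in fields if n == "host"]
    if not authority or not hosts:
        return  # nothing to compare
    expected = authority[0].lower()
    for host in hosts:
        if host.lower() != expected:
            raise MalformedRequest(
                f"Host {host!r} does not match :authority {authority[0]!r}"
            )
```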

For example, we could imagine putting stricter wording in [semantics]
requiring Host and authority to match when both are present and
stating that authority is preferred, while deferring the
implementation details to each protocol spec. It could look something
like this:

  HTTP/1.0 did not initially use Host information and would only convey
  absolute-URIs when talking to proxies [RFC1945]. HTTP/1.1 standardized
  the use of Host to indicate the requested host name, with a
  requirement that servers accept absolute-URIs. HTTP/2 generalized the
  use of absolute-URIs, making Host optional. To prevent any risk
  of host name mismatch between intermediaries, any server or intermediary
  receiving a request carrying both an authority and a Host header field
  must verify that they match (cf RFC3986#6.2.3) or reject the request as
  invalid.
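For the "match" part, here's a minimal Python sketch of a comparison
along the lines of RFC 3986: the host is compared case-insensitively
(section 6.2.2.1) and an empty or default port is omitted (section
6.2.3). The helper names are hypothetical, only http and https default
ports are handled, and userinfo is assumed absent, as it is in Host:

```python
from typing import Tuple

# Default ports for the schemes handled by this sketch (assumption).
DEFAULT_PORTS = {"http": "80", "https": "443"}


def split_authority(authority: str) -> Tuple[str, str]:
    """Split 'host[:port]' into (host, port); port is '' when absent or empty."""
    if authority.startswith("["):
        # IP-literal such as "[::1]:8080": the port follows the "]".
        end = authority.find("]")
        if end == -1:
            raise ValueError(f"unterminated IP-literal: {authority!r}")
        host, rest = authority[: end + 1], authority[end + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError(f"garbage after IP-literal: {authority!r}")
        port = rest[1:]
    else:
        host, sep, port = authority.rpartition(":")
        if not sep:
            host, port = authority, ""
    if port and not (port.isascii() and port.isdigit()):
        raise ValueError(f"invalid port: {authority!r}")
    return host, port


def normalize_authority(authority: str, scheme: str) -> str:
    host, port = split_authority(authority)
    host = host.lower()  # host is case-insensitive (RFC 3986, 6.2.2.1)
    if port == "" or port == DEFAULT_PORTS.get(scheme.lower()):
        return host  # empty or default port is omitted (RFC 3986, 6.2.3)
    return f"{host}:{port}"


def authorities_match(a: str, b: str, scheme: str) -> bool:
    return normalize_authority(a, scheme) == normalize_authority(b, scheme)
```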

From there, we can simply go on to indicate that authority is preferred, etc.

I'd really like us to strengthen all this, because in haproxy
over the years we've literally spent weeks on this alone, trying hard
to meticulously carry all that information from end to end and tag
the presence or absence of an authority to enforce a lot of specific
rules that relate to version combinations, and seeing that this is
still not sufficient worries me quite a bit. To me this is an
indication that these rules remain too difficult to enforce and probably
insufficient at the same time.

Thanks,
Willy

Received on Wednesday, 18 August 2021 06:43:58 UTC