Interoperability concern around :authority in RFC9113

Hello!

Over the last 3 months, we've received two different reports of H2
interoperability issues between haproxy and an origin server, the first
one being Jetty and the second apparently being Apache Traffic Server
(both reported in the same issue:
https://github.com/haproxy/haproxy/issues/2592).

The concern was receiving H2 requests in origin form ("lack of an
:authority pseudo header field in requests") sent by haproxy to an
origin server, in a setup where haproxy is an edge gateway between
the internet and the origin server and receives HTTP/1 requests.
The scenario was the following one:

          HTTP/1.1            HTTP/2
  client ----------> haproxy ---------> origin

When we wrote our H2 implementation, we tried to follow the spec as
closely as possible. At the time that was RFC7540, which explicitly
states in section 8.1.2.3:

  | The ":authority" pseudo-header field includes the authority
  | portion of the target URI ([[RFC3986], Section 3.2]). The authority
  | MUST NOT include the deprecated "userinfo" subcomponent for "http"
  | or "https" schemed URIs.

  | To ensure that the HTTP/1.1 request line can be reproduced
  | accurately, this pseudo-header field MUST be omitted when
  | translating from an HTTP/1.1 request that has a request target in
  | origin or asterisk form (see [[RFC7230], Section 5.3]). Clients
  | that generate HTTP/2 requests directly SHOULD use the ":authority"
  | pseudo-header field instead of the Host header field. An
  | intermediary that converts an HTTP/2 request to HTTP/1.1 MUST
  | create a Host header field if one is not present in a request by
  | copying the value of the ":authority" pseudo-header field.

Thus it's clearly forbidden to forge an :authority from Host, for
example, since pseudo headers are meant to reflect the components of the
request line (or the status line for responses). Doing so would turn
requests in origin form into absolute form.
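
To make the rule above concrete, here is a minimal Python sketch of an
HTTP/1.1 to HTTP/2 request-line translation that follows the RFC7540
wording. It is only an illustration, not haproxy's code: the function
name and the "scheme" parameter (the scheme the gateway assumes when the
request target carries none) are assumptions.

```python
from urllib.parse import urlsplit


def h1_to_h2_pseudo_headers(method, target, scheme="https"):
    """Map an HTTP/1.1 request line to HTTP/2 pseudo-header fields.

    Hypothetical helper. `scheme` is an assumption: origin and asterisk
    forms do not carry a scheme, so the gateway has to supply one.
    """
    if method == "CONNECT":
        # Authority form: only :method and :authority are sent.
        return [(":method", method), (":authority", target)]
    if target == "*" or target.startswith("/"):
        # Asterisk or origin form: :authority is omitted so that the
        # original request line can be reproduced accurately.
        return [(":method", method), (":scheme", scheme), (":path", target)]
    # Absolute form: the authority comes from the request target itself.
    parts = urlsplit(target)
    # The deprecated userinfo subcomponent must not appear in :authority.
    authority = parts.netloc.rpartition("@")[2]
    # An OPTIONS request without a path component uses ":path: *".
    path = parts.path or ("*" if method == "OPTIONS" else "/")
    if parts.query:
        path += "?" + parts.query
    return [
        (":method", method),
        (":scheme", parts.scheme),
        (":authority", authority),
        (":path", path),
    ]
```

In both cases the Host header field, if any, would be forwarded
untouched among the regular header fields.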

As I read it, the newer version of the specs (RFC 911x) still contains
this rule, but it is less obvious now that it is split over several
documents. I suspect that the lack of an explicit "MUST be omitted"
statement like the one above might be a reason why we're only receiving
such reports now:

  - RFC9113 #8.3.1 says:

    | The ":authority" pseudo-header field conveys the authority
    | portion (Section 3.2 of [RFC3986]) of the target URI
    | (Section 7.1 of [HTTP]). The recipient of an HTTP/2 request
    | MUST NOT use the Host header field to determine the target URI
    | if ":authority" is present.

    At this point we start to see that :authority might be missing.
    
    | Clients that generate HTTP/2 requests directly MUST use the
    | ":authority" pseudo-header field to convey authority
    | information, unless there is no authority information to convey
    | (in which case it MUST NOT generate ":authority").

    Same here.

      (...)
    | An intermediary that forwards a request over HTTP/2 MUST
    | construct an ":authority" pseudo-header field using the
    | authority information from the control data of the original
    | request, unless the original request's target URI does not
    | contain authority information (in which case it MUST NOT
    | generate ":authority").

    Same here. But below:

    | Note that the Host header field is not the sole source of
    | this information; see Section 7.2 of [HTTP].

    This one seems to imply that Host might possibly be used to construct
    :authority, which if true, would contradict RFC7540 above.

  - RFC9110 #6.2 about control data clearly says:

    | In HTTP/1.1 ([HTTP/1.1]) and earlier, control data is sent as
    | the first line of a message. In HTTP/2 ([HTTP/2]) and HTTP/3
    | ([HTTP/3]), control data is sent as pseudo-header fields with a
    | reserved name prefix (e.g., ":authority").

    So here there's no ambiguity in my opinion.

For someone who already knows the rule stated in 7540, I think that the
text above remains quite consistent with it. But I think that the note
about Host can cause some confusion for those who don't notice
RFC9110#6.2, as it would imply that requests in origin form may be
turned into absolute form.

For example, one of the participants in the discussion in the ATS issue
on this topic seems to think such a request is malformed, which to me
seems to indicate that the new wording, even if more general and precise,
might be a bit more difficult to grasp:

  https://github.com/apache/trafficserver/issues/11765#issuecomment-2347015362

We've proposed workarounds for this, consisting in rewriting the request
line from the Host part, but I wouldn't like to see this become common
practice. It's ugly and error-prone. Similarly, it seems that the
projects have also considered implementing an option to accept requests
in origin form, but this seems a bit convoluted to me in that it requires
more effort from their users, and clearly raises the question of the
relevance of origin vs absolute form on the internet nowadays.

I'm not sure what can/should be done at this point to limit the risk
that this issue becomes more common.

In our case, we've gone to great lengths to respect 911x as closely as
possible, and are able to preserve both origin and absolute forms across
H1->H2->H1 transformations. This is important for our users
because many of them enforce routing or filtering rules applying to the
URI for example, hence often expect an origin form in HTTP/1 on some
legacy configs. But is there any relevance of origin form anymore beyond
HTTP/1, should we all enforce absolute form everywhere by default ?
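
As an illustration of what preserving the form means on the way back to
HTTP/1, here is a minimal Python sketch of the H2->H1 direction. It is
not haproxy's implementation: the function name is hypothetical, header
names are assumed to be lowercase as in HTTP/2, and CONNECT is left out
for brevity.

```python
def h2_to_h1_request_head(fields):
    """Rebuild an HTTP/1.1 request head from HTTP/2 request fields.

    Hypothetical helper. `fields` is a list of (name, value) pairs with
    pseudo-header fields first. CONNECT is not handled.
    """
    pseudo = {n: v for n, v in fields if n.startswith(":")}
    headers = [(n, v) for n, v in fields if not n.startswith(":")]
    method = pseudo[":method"]
    path = pseudo[":path"]
    authority = pseudo.get(":authority")

    if authority is None:
        # No :authority: the request was in origin (or asterisk) form,
        # so the request target is the path alone and Host is left as is.
        target = path
    else:
        # :authority present: reproduce the absolute form, and create a
        # Host header field from :authority if none was sent.
        target = f"{pseudo[':scheme']}://{authority}"
        if path != "*":
            target += path
        if not any(n == "host" for n, _ in headers):
            headers.insert(0, ("host", authority))

    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{n}: {v}" for n, v in headers]
    return "\r\n".join(lines) + "\r\n\r\n"
```

With this shape, a rule that matches on an origin-form URI in HTTP/1
keeps seeing the same request target after a round trip through HTTP/2.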

Or if origin form remains necessary (I think so), should we try to
improve the wording of the spec to make it clear that it's still
permitted ? It's hard for me to propose anything since the wording
*is* correct, but probably not as intuitive as the previous one when
reading 9113 alone.

Maybe it could be sufficient to insert in 9113 a paragraph close to
the one from RFC7540 above ? It could look a bit like this:

   Requests sent in origin form lack :authority and use Host instead.
   Requests in absolute form use :authority and MAY also have an
   optional Host header field that MUST match :authority. Clients
   SHOULD prefer the absolute form. Intermediaries converting HTTP/1.1
   requests to HTTP/2 MUST apply the same form as they received.
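
On the receiving side, the proposed wording could be checked roughly as
in the Python sketch below. This only illustrates the proposal and does
not describe how Jetty or ATS behave; the function name is hypothetical
and the case-insensitive comparison between Host and :authority is an
assumption.

```python
def classify_h2_request(fields):
    """Return (form, authority) for a list of (name, value) fields.

    Hypothetical helper following the proposed paragraph: a request
    without :authority is in origin form and relies on Host; a request
    with :authority is in absolute form and any Host must match it.
    """
    pseudo_authority = None
    host = None
    for name, value in fields:
        if name == ":authority":
            pseudo_authority = value
        elif name == "host":
            host = value

    if pseudo_authority is None:
        # Origin form: not malformed, the authority comes from Host
        # (which may itself be absent, hence possibly None).
        return ("origin", host)

    # Absolute form: Host is optional, but must agree when present.
    # Assumption: hosts are compared case-insensitively.
    if host is not None and host.lower() != pseudo_authority.lower():
        raise ValueError("Host does not match :authority")
    return ("absolute", pseudo_authority)
```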

I'm interested in opinions and suggestions on this topic.

Thanks!
Willy

Received on Monday, 16 September 2024 06:50:24 UTC