Re: Working Group Last Call: HTTP/2 revision

Hi Roy,

On Thu, Sep 02, 2021 at 10:47:58AM -0700, Roy T. Fielding wrote:
> On Sep 1, 2021, at 6:05 AM, Willy Tarreau <w@1wt.eu> wrote:
> > 
> > On Wed, Sep 01, 2021 at 08:35:49AM +0200, Julian Reschke wrote:
> >> To call "Host"
> >> mandatory is just confusing. Maybe the whole statement should be
> >> simplified somewhat.
> > 
> > Possibly, though nothing comes to my mind right now. The reality is
> > that Host became a second place for the authority long ago, but it is
> > weaker than authority... In an ideal world we'd say that an authority
> > is mandatory and must be provided as :authority or Host (or both),
> > preferably :authority. But with the split inherited from H1 and the
> > need to pass H1 semantics to H2 it's not that easy.
> 
> Well, it could have been that easy, but that's water under the bridge.
> 
> I think the rationale was to be able to send a request across versions
> 
>      h1   h2   h1
>    A -> B -> C -> D
> 
> and have both h1 requests look the same by preserving Host instead
> of :authority. Is that right? Maybe the spec should explain it.

I'm fine with clarifying this. In haproxy we already support various
combinations of Host/authority in order to preserve the origin vs
absolute form for H1 as much as possible, with the exception that
H2->H1 always switches to origin form, because H2 does not distinguish
between the two forms. But it could be stated that:
  - origin forms have either authority or host
  - absolute forms have both

And then this removes one ambiguity. However, I'm pretty sure I've already
seen quite a few H2 requests coming with both, probably because that
was easy for the client to do and no distinction was ever proposed
between them.
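As a minimal sketch of the two-line convention above (it is only a
proposal in this thread, not spec text), an H2->H1 gateway could guess
the original form like this. The function name and the "unknown" result
are hypothetical; an empty Host is treated as absent. Note that the
clients sending both fields "because it was easy" would all be
classified as absolute form, which is exactly why the convention only
helps if senders follow it.

```python
from typing import Optional


def guess_h1_form(authority: Optional[str], host: Optional[str]) -> str:
    """Guess which HTTP/1.x request-target form an HTTP/2 request maps to.

    Hypothetical convention from the discussion (not in any spec):
      - origin form: exactly one of :authority or Host is non-empty
      - absolute form: both are non-empty
    A missing or empty Host counts as absent.
    """
    has_authority = bool(authority)
    has_host = bool(host)
    if has_authority and has_host:
        return "absolute"
    if has_authority or has_host:
        return "origin"
    return "unknown"  # no authority information at all
```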

> Personally, I would have preferred that Host be deleted in passing
> and :authority be used consistently to generate Host only when
> forwarding to HTTP/1.1.

That's what we currently do, indeed, since we do not have a reliable
signal to say "this H2 request was in absolute form, please do that
for H1 as well".

> HTTP/1.1 (sec 3.2.2) requires that a message to a proxy be in absolute
> form:

Yep.

> and that a received Host be replaced with the :authority
> 
>    A client MUST send a Host header field in an HTTP/1.1 request even if
>    the request-target is in the absolute-form, since this allows the
>    Host information to be forwarded through ancient HTTP/1.0 proxies
>    that might not have implemented Host.

I'm seeing a concerning change here from 7230 that I had not noticed
before, regarding the requirement that the Host value MUST be identical
to the authority:

   If the target URI includes an authority component, then a
   client MUST send a field-value for Host that is identical to that
   authority component.

This MUST, combined with the last paragraph:

   A server MUST respond with a 400 (Bad Request) status code to any
   HTTP/1.1 request message that lacks a Host header field and to any
   request message that contains more than one Host header field or a
   Host header field with an invalid field-value.

made it possible to reject requests with mismatching Host vs authority,
which is the root of the security issues. And now I don't find it that
clear anymore.
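To make that reading concrete, here is a rough sketch of a server
applying both 7230 requirements together. The function name and its
inputs are hypothetical, field-value syntax validation is omitted, and
comparing host names case-insensitively (and skipping empty Host values,
as in the rules further down) is my assumption rather than spec wording.

```python
from typing import List, Optional
from urllib.parse import urlsplit


def host_check_status(version: str, target: str,
                      host_values: List[str]) -> Optional[int]:
    """Return 400 if the request must be rejected, else None.

    Sketch of the two RFC 7230 requirements quoted above, read together.
    Field-value syntax validation is omitted for brevity.
    """
    if len(host_values) > 1:
        return 400  # more than one Host header field
    if version == "HTTP/1.1" and not host_values:
        return 400  # HTTP/1.1 request without any Host header field
    host = host_values[0].strip() if host_values else ""
    if host and urlsplit(target).scheme in ("http", "https"):
        # absolute-form: Host must match the target's authority,
        # i.e. its netloc without any "userinfo@" prefix
        authority = urlsplit(target).netloc.rpartition("@")[2]
        if host.lower() != authority.lower():  # assumed case-insensitive
            return 400  # mismatching Host vs authority
    return None
```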

>    When a proxy receives a request with an absolute-form of request-
>    target, the proxy MUST ignore the received Host header field (if any)
>    and instead replace it with the host information of the request-
>    target.
>    A proxy that forwards such a request MUST generate a new
>    Host field value based on the received request-target rather than
>    forward the received Host field value.
> 
>    When an origin server receives a request with an absolute-form of
>    request-target, the origin server MUST ignore the received Host
>    header field (if any) and instead use the host information of the
>    request-target.  Note that if the request-target does not have an
>    authority component, an empty Host header field will be sent in this
>    case.
> 
> Note that it does not place those requirements on intermediaries,
> in general, because gateways/CDNs are allowed/expected to modify
> the request target when it is being forwarded. 

Sure, but many intermediaries can be arbitrarily placed in front of
either proxies or servers. Also, many proxies nowadays can work as
reverse proxies as well, since they find all the info they need in
the request.
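For reference, the proxy requirement quoted above boils down to
something like the following sketch. The function name and the
dict-of-lowercased-names header representation are hypothetical; only
the behavior (drop the received Host, regenerate it from the
request-target's authority without userinfo) comes from the quoted text.

```python
from typing import Dict
from urllib.parse import urlsplit


def forwarded_headers(target: str, headers: Dict[str, str]) -> Dict[str, str]:
    """Header fields a proxy would forward for an absolute-form target.

    The received Host (if any) is ignored and a new one is generated from
    the target's authority, with any "userinfo@" prefix removed.
    Header names are assumed to be already lowercased.
    """
    authority = urlsplit(target).netloc.rpartition("@")[2]
    out = {name: value for name, value in headers.items() if name != "host"}
    out["host"] = authority  # empty if the target has no authority
    return out
```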

I suspect that, in trying to stay compatible with HTTP/1.0, we've
enumerated more combinations of authority vs Host than really exist.
Let me try to enumerate the types of requests that come to my mind
(let's see which ones I've forgotten; I'm purposely leaving CONNECT
aside):

  - HTTP/1.0 requests in origin form without Host. These ones are
    for a server and they provide no authority either. The server
    or intermediary that receives them can only rely on the target
    IP:port to figure out what they were for.

  - HTTP/1.0 requests in absolute form without Host. These ones
    are explicitly for a proxy and do provide an authority. Any
    server or intermediary can figure out the authority from the URI,
    so an intermediary may transform them to HTTP/1.1 on the other
    side and provide a matching Host header field if needed (e.g.
    a proxy may coalesce them on a reused idle HTTP/1.1 connection
    to the same host).

  - HTTP/1.0 requests in origin form with Host. Host information
    was provided and the recipient is expected to use it as the
    only source of authority if needed in order to determine a
    virtual host. These seem equivalent to their HTTP/1.1 counterparts.

  - HTTP/1.0 requests in absolute form with Host. These seem
    equivalent to their HTTP/1.1 counterparts, with the possible
    exception that since Host was not mandatory in 1.0, Host is not
    guaranteed to convey valid information, so it makes sense to prefer
    the authority part from the URI and ignore Host.

  - HTTP/1.1 requests in origin form with empty Host. Host is present
    but has no value, similar to HTTP/1.0 in origin form without Host
    above. The recipient will use connection-level info if desired.

  - HTTP/1.1 requests in origin form with non-empty Host. The recipient
    must use the Host field value as the sole source of authority info.

  - HTTP/1.1 requests in absolute form with empty Host. These are
    similar to HTTP/1.0 absolute requests. The authority is extracted
    from the URI.

  - HTTP/1.1 requests in absolute form with non-empty Host. This one
    was produced by an HTTP/1.1 agent that complies with 7230, so
    Host MUST match the authority in the URI. Either of them may be used
    as the source of authority, and either of them may be dropped along
    the path without losing information.

  - HTTP/2 requests with no :authority and no Host. These are not valid.

  - HTTP/2 requests with no :authority and empty Host. These are similar
    to HTTP/1.0 in origin form without Host above: no authority is known.

  - HTTP/2 requests with no :authority and non-empty Host. These are
    similar to HTTP/1.1 origin + Host above. Authority comes from Host.

  - HTTP/2 requests with :authority and no Host. They're similar to
    HTTP/1.0 in absolute form (i.e. authority comes from :authority).

  - HTTP/2 requests with :authority and empty Host. Same as if there
    is no Host.

  - HTTP/2 requests with :authority and non-empty Host. These are
    similar to HTTP/1.1 absolute + Host above. The sender had to respect
    the "MUST be identical" rule, so either can be used and the other one
    dropped. Note that they're pretty common in the field from what I've
    seen (no numbers), which is a difference compared to HTTP/1.1.

As such it seems to me that these are the situations we are trying
to cover:
  - HTTP/1.0 in origin form and no (or empty) Host, HTTP/1.1 in
    origin form and empty Host, HTTP/2 with no authority and empty
    Host
    => authority unknown, decide based on the connection-level info

  - HTTP/{1.0,1.1} in origin form and valid Host, HTTP/2 with no
    :authority but valid Host
    => authority comes from Host

  - HTTP/1.0 in absolute form with or without Host
    => authority comes from URI (Host ignored)

  - HTTP/1.1 in absolute form with empty or non-empty Host
    => authority comes from URI (Host ignored)

  - HTTP/2 with both :authority and Host
    => authority comes from :authority (Host ignored)

And in any case, with HTTP/1.1 and above, a non-empty Host must
always match any provided authority.
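A minimal sketch of these rules as a single decision function, assuming
a hypothetical representation where "control" is the authority found in
an absolute-form URI (HTTP/1.x) or in :authority (HTTP/2), None means a
field is absent and "" means it is empty. The 400 for an HTTP/1.1
request without Host comes from the 7230 text quoted earlier; the
case-insensitive comparison is an assumption.

```python
from typing import Optional


class BadRequest(Exception):
    """The request must be rejected (e.g. with a 400 status)."""


def resolve_authority(version: str, control: Optional[str],
                      host: Optional[str]) -> Optional[str]:
    """Return the single authority of a request, or None if unknown.

    version: "HTTP/1.0", "HTTP/1.1" or "HTTP/2"
    control: authority from an absolute-form URI (HTTP/1.x) or from
             :authority (HTTP/2); None when absent
    host:    Host field value; None when absent, "" when empty
    None means "decide based on connection-level info".
    """
    if version == "HTTP/1.1" and host is None:
        raise BadRequest("HTTP/1.1 request without Host")
    if version == "HTTP/2" and control is None and host is None:
        raise BadRequest("HTTP/2 request with neither :authority nor Host")
    if version != "HTTP/1.0" and host and control is not None:
        # HTTP/1.1 and above: a non-empty Host must match the authority
        if host.lower() != control.lower():
            raise BadRequest("Host does not match the authority")
    if control is not None:
        return control  # URI or :authority wins; Host is ignored
    if host:
        return host  # origin form (or no :authority) with a valid Host
    return None  # no authority: fall back to connection-level info
```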

Once we have these rules in place, an intermediary can safely convert
any version into any other one and emit the outgoing request according
to the target protocol's preferred conventions. It's up to the
intermediary to decide whether to emit an HTTP/1.x request as an
origin request or an absolute one depending on configuration or
hints (in haproxy we try to mimic what we've observed on the other
side, but I would also find it normal to have it by configuration
since the notion of proxy on the next hop is very connection-centric).
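The emitting side could then look roughly like this sketch, which only
places an already-resolved authority and leaves the choice of output
form to the caller. The "h2" / "h1-origin" / "h1-absolute" labels and
the function name are hypothetical. For H2 with an unknown authority it
sends an empty Host, following the "no :authority and empty Host" case
enumerated above; that choice is an assumption, not spec text.

```python
from typing import Dict, Optional


def emit_request(scheme: str, authority: Optional[str], path: str,
                 out: str) -> Dict[str, str]:
    """Place an already-resolved authority per the outgoing protocol.

    out: "h2", "h1-origin" or "h1-absolute" (hypothetical labels); which
    one to use is left to configuration or observed hints.
    authority: None when it is unknown.
    """
    if out == "h2":
        req = {":scheme": scheme, ":path": path}
        if authority is not None:
            req[":authority"] = authority  # no Host needed alongside it
        else:
            req["host"] = ""  # "no :authority and empty Host" case above
        return req
    if out == "h1-origin":
        return {"request-target": path, "host": authority or ""}
    if out == "h1-absolute":
        if authority is None:
            raise ValueError("absolute form requires an authority")
        return {"request-target": f"{scheme}://{authority}{path}",
                "host": authority}  # Host identical to the URI authority
    raise ValueError(f"unknown outgoing form: {out}")
```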

> http2bis 8.3.1 says:
> 
>    Clients that generate HTTP/2 requests directly SHOULD use the
>    :authority pseudo-header field instead of the Host header field.
> 
>    An intermediary that forwards a request to HTTP/2 MUST construct
>    an :authority pseudo-header field using the authority information
>    from the control data in the original request. If control data
>    does not contain authority, an intermediary MUST NOT add an
>    :authority pseudo-header field. Note that while the Host header
>    field can determine a request target, it is not control data
>    for this purpose; see Section 7.2 of [HTTP].
> 
> Martin, I find that last sentence confusing, as is the requirement
> here on "intermediary" (which would include both proxy and gateway).
> Why would a gateway be prevented from changing :authority? Or are you
> making a distinction here between "forwarding" a request unchanged
> versus satisfying a request by accessing an internal resource?

I participated in this one. The cases where the intermediary changes
the contents were not considered there; the focus was on where to pick
up the original information. But this is another example of how bad
it is to discuss protocol translations, which are the job of the
component itself. We ought to be very careful to only explain where
to find the required information in the protocol and where to place
it. The choice of transforming that information should be left to the
implementation; it's not our business.

That's why I really think that we need to solve this ambiguity around
the notion of "authority". In HTTP/1.1 Host definitely conveys such an
authority. Once we can make it clear that there is AT MOST one authority
in a request (possibly from multiple sources which must match), then
it's each protocol's business to indicate where to emit the authority,
regardless of where it was found or how it was transformed.

>    The value of the Host header field MUST be ignored if control
>    data contains authority (that is, the :authority pseudo-header
>    field is present).
> 
> When an HTTP/1.1 request is received in absolute form (i.e.,
> with the equivalent of :authority supplied), we require above
> that Host be ignored and replaced with the absolute authority.
> The last paragraph above requires to ignore (without saying who
> is being required), but it doesn't require that Host be replaced
> when forwarded by a proxy.

This is the exact same problem as above: we should not discuss
transforming, only receiving and sending.

> I would have written that as
> 
>    A recipient MUST ignore the Host header field in a request
>    that contains an :authority pseudo-header field. If an
>    intermediary forwards such a request via HTTP/1.1 without
>    changing the request target, the intermediary MUST send
>    the :authority pseudo-header field value as the Host field
>    in the forwarded request (replacing any existing Host field)
>    to avoid potential vulnerabilities in HTTP routing.
> 
> Is that something we should add now?

I'd rather proceed differently: we explain how to extract that
authority info for the protocol itself without presuming anything
about other protocols, then we provide non-normative examples of what
an intermediary would do, e.g. when forwarding such a request to H2,
to an H1 server or to an H1 proxy. This will illustrate the purpose of
the rules and draw attention to certain delicate corner cases, while
being more useful, because it's impossible to enumerate all
combinations of other protocols in the spec itself.
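As an illustration of that kind of non-normative example, here is a
sketch of the H2 to H1 case under the wording proposed above (a
suggestion in this thread, not current spec text), assuming the request
target is forwarded unchanged. The function name and the
dict-of-lowercased-names representation are hypothetical.

```python
from typing import Dict


def h2_to_h1_headers(h2: Dict[str, str]) -> Dict[str, str]:
    """Build HTTP/1.1 header fields from an HTTP/2 header block.

    Pseudo-header fields are dropped; if :authority is present, it is
    sent as Host, replacing any received Host field, which is ignored.
    Names are assumed to be lowercased.
    """
    out = {name: value for name, value in h2.items()
           if not name.startswith(":")}
    if ":authority" in h2:
        out["host"] = h2[":authority"]  # replaces any received Host
    return out
```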

Regards,
Willy

Received on Friday, 3 September 2021 05:55:46 UTC