Re: Publication has been requested for draft-ietf-httpbis-http2bis-05 from Willy Tarreau on 2021-09-28 (ietf-http-wg@w3.org from July to September 2021)

From: Willy Tarreau <w@1wt.eu>
Date: Tue, 28 Sep 2021 15:47:08 +0200
To: Cory Benfield <cory@lukasa.co.uk>
Cc: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>, Martin Thomson <mt@lowentropy.net>
Message-ID: <20210928134708.GC28759@1wt.eu>
On Tue, Sep 28, 2021 at 02:23:25PM +0100, Cory Benfield wrote:
> > > For example, method is referenced in Section 9 and given the ABNF
> > > `token`, which is a stricter constraint than you ask for here. Section
> > > 7.2 covers :authority and gives it the ABNF uri-host [ ":" port ],
> > > which again is a stricter constraint. Finally, the http2bis definition
> > > of :path header calls out that its value is absolute-path, optionally
> > > followed by ? and query, unless it is *.
> >
> > I know, but there is one particular case that significantly increases
> > this risk, which is when you're adding H2 to code already supporting H1
> > where the controls are performed later by code that you already trust for
> > doing the right thing with elements extracted from H1. Then you assemble
> > everything and parse the result via your well-trusted request-line parser.
> > But it's too late, the space in :method, the "://" in :scheme, the "/" or
> > space in :authority, or the space in :path have already defined different
> > delimiters.
> 
> I'm nervous about the idea of adding normative requirements that are
> there to defend against an implementation choice. In general if you
> transform a HTTP/2 message into a HTTP/1.1 message before you parse it
> you need to be careful, because it is extremely hard to identically
> reproduce a HTTP/2 message in a HTTP/1.1 wire format.

I know, but to be honest it's not always plain H2 to H1. Sometimes you
just recompose the protocol elements that you already know how to parse
(i.e. the URI), and at first glance it seems perfectly legit to say
"I have a URI parser that I use for H1, it's rock-solid, I'll naturally
use it for the H2 URI". Since the URI is split into separate pieces, you
do not necessarily see the problem when you reassemble them. We made
exactly this mistake :-)
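To make the trap concrete, here is a minimal Python sketch. The pseudo-header values and the `naive_reassemble` helper are hypothetical, and `urllib.parse.urlsplit` merely stands in for a "rock-solid" H1 URI parser; it is not the code discussed above. Each value looks plausible on its own, but gluing them together lets delimiters inside :authority redefine the path the parser sees.

```python
from urllib.parse import urlsplit


def naive_reassemble(scheme: str, authority: str, path: str) -> str:
    # Hypothetical helper: glue the H2 pseudo-header values back into one
    # URI without checking each component first.
    return f"{scheme}://{authority}{path}"


# :path looks harmless, but :authority smuggles a path and a "#" delimiter.
pseudo = {":scheme": "https", ":authority": "good.example/admin#", ":path": "/public"}

uri = naive_reassemble(pseudo[":scheme"], pseudo[":authority"], pseudo[":path"])
parts = urlsplit(uri)

print(uri)             # https://good.example/admin#/public
print(parts.netloc)    # good.example
print(parts.path)      # /admin   (not the /public that :path carried)
print(parts.fragment)  # /public
```

A filter that only inspected `:path == "/public"` would let this request through, while everything downstream of the URI parser acts on `/admin`.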

> > I mean, it's really easy to get trapped, and the long list of examples
> > below tends to confirm it:
> >
> >    https://portswigger.net/research/http2
> >
> > I'd have liked to at least add that to the security recommendations.
> 
> I think my security recommendation would be much stronger: do not
> naively attempt to serialise a HTTP/1.1 message from a HTTP/2 message.

I agree with this, which is also why I objected to seeing HTTP/1.1
appearing in normative text. But quite frankly, a URI is not HTTP/1.1,
it's HTTP. Many of us have to recompose an RFC3986-compliant URI from
a request, and it's easy to overlook this aspect, which is specific to
H2 (and H3). In addition, some implementations will not even have the
code required to check for abuse in these sub-components, because
they're normally only extracted from the single string by stopping on
the delimiters, which guarantees they are correct. Thus a check on each
individual component may simply not exist.
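The missing per-component checks could look roughly like the sketch below. It is an assumption-laden approximation of the ABNF quoted earlier in this thread (`token` for :method, `uri-host [ ":" port ]` for :authority, absolute-path optionally followed by "?" and query, or "*", for :path), plus the RFC 3986 scheme syntax. It only checks character sets, not full structure (no IP-literal parsing, no pct-encoding validation), and the function names are hypothetical.

```python
import string

ALNUM = string.ascii_letters + string.digits
# tchar from the HTTP "token" ABNF
TCHAR = frozenset("!#$%&'*+-.^_`|~" + ALNUM)
# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_CHARS = frozenset(ALNUM + "+-.")
UNRESERVED = ALNUM + "-._~"
SUB_DELIMS = "!$&'()*+,;="
# Simplified: characters that may appear in uri-host [ ":" port ].
# Notably excludes "/", "?", "#", "@" and whitespace.
AUTHORITY_CHARS = frozenset(UNRESERVED + SUB_DELIMS + "%:[]")
# Simplified: pchar plus "/" and "?" (absolute-path [ "?" query ]).
# Excludes "#", whitespace and control characters.
PATH_CHARS = frozenset(UNRESERVED + SUB_DELIMS + "%:@/?")


def valid_method(m: str) -> bool:
    return bool(m) and all(c in TCHAR for c in m)


def valid_scheme(s: str) -> bool:
    return bool(s) and s[0] in string.ascii_letters and all(c in SCHEME_CHARS for c in s)


def valid_authority(a: str) -> bool:
    return bool(a) and all(c in AUTHORITY_CHARS for c in a)


def valid_path(p: str) -> bool:
    if p == "*":
        return True
    return p.startswith("/") and all(c in PATH_CHARS for c in p)


CHECKS = {
    ":method": valid_method,
    ":scheme": valid_scheme,
    ":authority": valid_authority,
    ":path": valid_path,
}


def check_pseudo_headers(headers: dict[str, str]) -> list[str]:
    """Return the names of pseudo-headers that fail their individual check."""
    return [name for name, ok in CHECKS.items() if not ok(headers.get(name, ""))]
```

For example, `check_pseudo_headers({":method": "GET", ":scheme": "https", ":authority": "good.example/admin#", ":path": "/public"})` returns `[":authority"]`, so the request can be rejected before any reassembly takes place.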

> There exists no simple string-formatting transformation of a valid
> HTTP/2 message into a valid HTTP/1.1 message. To perform this
> transformation it is necessary to parse the HTTP/2 message into a
> structured, semantic form, and then re-serialize that message into
> HTTP/1.1.

I totally agree and that's what we're doing. The semantics support
a "URI" :-/

> I'm not strongly opposed to doing this because, well, we did provide
> guidance for other fields.

That was my feeling as well: the point is to remind implementers not
to take the task lightly.

> But this does seem like it's a good
> indication of how difficult this kind of guidance is: we end up having
> to provide endless detailed exceptions that ultimately boil down to
> "you cannot printf HTTP/2 to HTTP/1".

I agree with that, and that's not what I'm seeking. When you see a URI
cut into pieces in a diagram where the scheme goes in one place, the
authority in another and the path in a third one, with only valid
examples, it's really difficult to think of anything other than gluing
these valid elements back together to perform the inverse operation.

We do have examples of how H2 requests could be decoded in an HTTP/1-like
representation for easier reading. Maybe we could simply add warnings
showing what could result from a lack of checks on individual components.
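As a hypothetical illustration of such a warning, the Python sketch below renders a request in an HTTP/1-like form without validating components, then feeds the result to a naive request-line parser. Both helpers are assumptions for illustration only. A space inside :method shifts the delimiters, so the parser sees a different target than the one carried in :path.

```python
def h1_like(pseudo: dict[str, str]) -> str:
    # Hypothetical: render the request HTTP/1-style, as a debugging view
    # might, without checking :method or :path individually.
    return f"{pseudo[':method']} {pseudo[':path']} HTTP/1.1"


def parse_request_line(line: str) -> tuple[str, str, str]:
    # A naive "trusted" request-line parser: method SP target SP rest.
    method, target, rest = line.split(" ", 2)
    return method, target, rest


pseudo = {":method": "GET /admin", ":path": "/public"}
line = h1_like(pseudo)

print(line)                      # GET /admin /public HTTP/1.1
print(parse_request_line(line))  # ('GET', '/admin', '/public HTTP/1.1')
```

A stricter parser that validates the version token would catch this particular case, but a "/" or space in :authority or a space in :path can produce similar confusion elsewhere, which is why checking each component individually matters.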

Just my two cents,
Willy
Received on Tuesday, 28 September 2021 13:47:29 UTC