Re: HTTP/2: allow binary data in header field values

Hi,

On Mon, Aug 28, 2017 at 06:34:31PM -0700, Piotr Sikora wrote:
> The proposal I have in mind is based on what gRPC is already doing [1], i.e.:
> 
> 1. Each peer announces that it accepts binary data via HTTP/2 SETTINGS option,

This sounds reasonable.

> 2. Binary header field values are prefixed with NUL byte (0x00), so
> that binary value 0xFF is encoded as a header field value 0x00 0xFF.
> This allows binary-aware peers to differentiate between binary headers
> and VCHAR headers. In theory, this should also protect peers unaware
> of this extension from ever accepting such headers, since RFC7540
> requires that requests/responses with headers containing NUL byte
> (0x00) MUST be treated as malformed and rejected, but I'm not sure if
> that's really enforced.

I've just checked my experimental implementation in haproxy and I'm
indeed missing this check. There are two reasons to this, which I think
could indicate others might have done the same mistake :
  - the header values are prefixed with their length so there's no need
    to parse their contents before copying ;

  - the reminder of the list of forbidden characters is only present in
    the security consideration section of 7540 and never mentionned in
    7541, while it's mostly when processing hpack that such characters
    have a chance to be detected.

So at least some check would be needed on other implementations (well,
I might very well have produced the only bogus one, but I think that
overlooking this is easy).

> 3. Binary-aware peers MUST base64 encode binary header field values
> when forwarding them to peers unaware of this extension and/or when
> converting to HTTP/1.1.

I really don't like this, because it becomes possible for a sender to
produce special values on the other side by benefitting from the
decoding, thus evading certain filtering measures. For example, an
agent may send :

   Content-Encoding: \x00\x83\x38\xa9

and the binary-aware recipient having to "encode" it as gzip would
apply base64 to "\x83\x38\xa9" and would produce :

   Content-Encoding: gzip

You can apply the same principle to other fields like content-length or
other and easily cause some trouble. If you need this, at the very least
it's important to enclose the encoded value between some "rare enough"
characters to prevent such risks. Something like this for example :

   Binary-Header: $b64<Y2h1bmtlZA==>$

The example above would produce  :

   Content-Encoding: $b64<gzip>$

That's just an example of course.

> 4. Binary header field values cannot be concatenated, because there is
> no delimiter that we can use.

Good point.

> NOTE: This proposal implies that endpoints SHOULD NOT use binary
> header field values before receiving HTTP/2 SETTINGS from the peer.
> However, since, at least in theory, all RFC7540-compliant peers
> unaware of this extension MUST reject requests with headers containing
> NUL byte (0x00) with a stream error, endpoints could opportunistically
> use binary header field values on the first flight and assume that if
> peer isn't aware of this extension, then it will reject the request,

This is really the part which needs to be verified I think.

I'm now thinking that prefixing the value with <CR><NUL> instead of <NUL>
could be more efficient : while <NUL> may translate to an end of string
on many implementations and simply result in a field silently being
processed as empty, a <CR> in the middle of a string is always an
error, unless the next character is <LF> in which case what follows is
the next header field. In case a gateway would inappropriately forward
the <CR> to HTTP/1, it would produce an invalid message, even if
truncated on the <NUL> since the line would end in <CR><CR><LF>.

> which can be subsequently retried with base64 encoded header field
> values.
> 
> I'd like to hear if anyone strongly disagrees with this proposal
> and/or the binary data in header field values in general.

I don't disagree and think that some fields could later become binary
(dates, etc). I'm just extra careful for knowing that most HTTP/1 parsers
are extremely lazy and that at least one H2 parser (the one I wrote) does
not notice the presence of <CR>, <LF> or <NUL> in values. But this one is
not yet deployed so it can be fixed. I don't know for other ones.

Cheers,
Willy

Received on Tuesday, 29 August 2017 04:10:49 UTC