Re: Jesse's feedback on HTTP/2 header compression

On Thu, Aug 29, 2013 at 5:44 PM, Jesse Wilson <jesse@swank.ca> wrote:

> I'm implementing HTTP/2 and would like to point out some rough edges! I
> don’t have many specific recommendations; that can come later. Overall I’m
> very happy with the direction of HPACK and HTTP/2.0.
>
>
> State management is hard
>
> The spec is ambiguous in a few places. My first attempt at a reference set
> was a bitset that tracked which entry indices were set. This doesn’t work
> well. I actually would need to track two bits of state per entry: whether
> it has been emitted for the current header set, and also whether it is in
> the reference set for the next round. Ugly. The spec doesn’t offer much
> guidance here.
>
> How do reference set elements get evicted? The spec says nothing about how
> pruning impacts the reference set.
>
I was under the same impression, so I asked the author. The answer is that
when an entry is removed from the header table, it is automatically removed
from the reference set as well.
This issue is tracked at: https://github.com/http2/http2-spec/issues/238
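To illustrate, here is a minimal Java sketch of that rule (the class and
field names are hypothetical; the entry size of name bytes + value bytes +
32 follows the compression draft): when an entry is evicted from the header
table, it is dropped from the reference set at the same time.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical entry type; size() follows the draft's entry-size rule
// (name bytes + value bytes + 32 bytes of overhead).
class HeaderField {
    final byte[] name, value;
    HeaderField(byte[] name, byte[] value) { this.name = name; this.value = value; }
    int size() { return name.length + value.length + 32; }
}

class HeaderTable {
    // Entries are newest-first; the reference set holds the same instances.
    private final Deque<HeaderField> entries = new ArrayDeque<>();
    private final Set<HeaderField> referenceSet = new LinkedHashSet<>();
    private int size;                 // current table size in bytes
    private final int maxSize = 4096; // advertised header table size

    void add(HeaderField field) {
        // Evict the oldest entries until the new one fits. An evicted entry
        // leaves the reference set at the same time, which is the rule
        // clarified in issue #238.
        while (!entries.isEmpty() && size + field.size() > maxSize) {
            HeaderField evicted = entries.removeLast();
            size -= evicted.size();
            referenceSet.remove(evicted);
        }
        if (field.size() <= maxSize) {
            entries.addFirst(field);
            size += field.size();
        }
    }
}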

> And the examples don’t help out. They don’t even demonstrate pruning!
> They also omit tracking which headers in the reference set have been
> emitted (so they won't be emitted again).
>
> I fear that pruning is going to be a major source of bugs in the wild.
> Especially since it’ll usually work. My client and your server will agree
> on the contents of the header table until eventually I evict something that
> you don’t, or vice versa.
>
> For example, I’m building in Java and there’s an impedance mismatch
> between Java’s stupid UTF-16 strings and the spec’s UTF-8 strings. I need
> to track the byte lengths of everything so that I can evict correctly. If I
> accidentally use the number of UTF-16 chars I’ll be consistent with the
> UTF-8 byte length until I use a multibyte character. When that happens, my
> internal buffer size could get out of sync with my peer’s buffer size.
>
Name and value comparisons must be done by "direct bitwise comparisons on
names and values", and a length is the number of bytes the name/value
occupies, not the number of Unicode characters. In other words, names and
values are really byte arrays, and Java's String does not work well in this
case.
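A rough sketch of what that means in Java (the names here are made up):
keep names and values as UTF-8 byte arrays, account for sizes in bytes,
and compare bitwise.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: keep names and values as UTF-8 byte arrays so that sizes and
// comparisons match what the peer sees on the wire.
final class Header {
    final byte[] name;
    final byte[] value;

    Header(String name, String value) {
        // Convert once; String.length() counts UTF-16 code units and must
        // not be used for table-size accounting.
        this.name = name.getBytes(StandardCharsets.UTF_8);
        this.value = value.getBytes(StandardCharsets.UTF_8);
    }

    int wireSize() { return name.length + value.length; } // bytes, not chars

    @Override public boolean equals(Object o) {
        if (!(o instanceof Header)) return false;
        Header other = (Header) o;
        // Direct bitwise comparison on names and values.
        return Arrays.equals(name, other.name) && Arrays.equals(value, other.value);
    }

    @Override public int hashCode() {
        return 31 * Arrays.hashCode(name) + Arrays.hashCode(value);
    }
}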

> This is mostly Java’s fault, but UTF-16 chars are not unique to Java.
> What’s the maximum string length in an HTTP/2.0 header? The spec omits that. It
> can’t be longer than a frame length? Or can header frame boundaries
> interrupt a header frame’s contents? That would be tricky to implement!
>
In the current draft-06 spec, the framing layer knows nothing about the
contents of the encoded header block.
The sender can split the encoded header block at any point into any number
of fragments and send them in order. The receiver then concatenates the
fragments in the order it received them and decodes the result. Nothing
prevents you from decoding in a streaming fashion, of course.
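So on the receiving side the framing layer can be as simple as the
following sketch (my reading of draft-06; the API is made up): buffer each
fragment until the last one arrives, then hand the whole block to the
decompressor.

import java.io.ByteArrayOutputStream;

// Sketch: the framing layer only concatenates header block fragments; the
// reassembled block is what gets handed to the header decompressor. The
// END_HEADERS handling reflects my reading of draft-06; the API is made up.
class HeaderBlockAssembler {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // Called for the HEADERS frame and each CONTINUATION frame that follows.
    void onFragment(byte[] fragment, boolean endHeaders) {
        buffer.write(fragment, 0, fragment.length);
        if (endHeaders) {
            byte[] headerBlock = buffer.toByteArray();
            buffer.reset();
            decodeHeaderBlock(headerBlock); // hand off to the decompressor
        }
    }

    private void decodeHeaderBlock(byte[] block) {
        // ... run the header decompression over the complete block ...
    }
}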

> Length-prefixed
>
> I see a potential problem in the way header lengths are encoded. Suppose
> I'm reading a 100 byte HEADERS block and it includes a string of length
> 200. One of these lengths is incorrect, and my program needs to detect this
> and issue a protocol error. But detecting it is awkward because the
> 100-byte frame size is not local information for the header reader. Even
> worse, the same kind of problem can happen when reading variable-length
> integers. I could overflow my 100 bytes while reading byte five of a 5-byte
> number.
>
The decoder is responsible for checking that the header block is properly
encoded. If you received a 200-byte header block and it claims to contain a
500-byte value, the header block is clearly broken and must be rejected. If
the decoder needs the frame length to decode the header block properly,
that length must be passed to the decoder.

The variable-length integer overflow is an interesting case. Because the
current spec introduced the CONTINUATION frame, there is no hard limit on
name/value length. In practice, I expect it will be capped at 4K or 8K.
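To make the checks concrete, here is a rough Java sketch of bounds-checked
decoding (the prefix sizes and error handling are assumptions, not spec
text): wrap the reassembled header block in a buffer limited to its exact
length, so a string or integer that claims to run past the end is rejected
instead of being read from adjacent data.

import java.nio.ByteBuffer;

// Sketch of bounds-checked decoding. The header block is wrapped in a
// ByteBuffer limited to its exact length, so anything that claims to be
// longer than what is left gets rejected. Prefix sizes are assumptions.
final class HpackReader {

    // Decodes a variable-length integer with an n-bit prefix.
    static int readInt(ByteBuffer in, int prefixBits) {
        int maxPrefix = (1 << prefixBits) - 1;
        long value = in.get() & maxPrefix;
        if (value < maxPrefix) return (int) value;

        int shift = 0;
        while (true) {
            if (!in.hasRemaining()) {
                throw new IllegalStateException("truncated integer");
            }
            int b = in.get() & 0xff;
            value += (long) (b & 0x7f) << shift;
            if (value > Integer.MAX_VALUE || shift > 28) {
                throw new IllegalStateException("integer too large");
            }
            if ((b & 0x80) == 0) return (int) value;
            shift += 7;
        }
    }

    // Decodes a length-prefixed string, checking the length against what
    // is actually left in the header block.
    static byte[] readString(ByteBuffer in) {
        int length = readInt(in, 7); // prefix size depends on the draft in use
        if (length > in.remaining()) {
            throw new IllegalStateException("string overruns header block");
        }
        byte[] data = new byte[length];
        in.get(data);
        return data;
    }
}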

> Chromium & Firefox
>
> To confirm my HPACK parsing is working I’m connecting to OkHttp’s
> MockWebServer with Chromium and Firefox nightlies. The latest public
> Chromium implements weak compression: it doesn't compress. Instead it just
> uses literal header without indexing. Not to pick on the Chromium team, but
> this is a bad sign for the simplicity of HPACK! The Firefox code does use
> compression, though I couldn't interoperate. (Is their initial headers
> table different from mine?).
>
To interoperate, both peers must implement the same header compression
specification.
If Chromium and Firefox implement header-compression-draft-01 (which was
used in the 1st interop test) and your server implements draft-03, it won't
work because the two drafts are incompatible. Ask the Chromium and Firefox
developers which header compression draft they have implemented.


> Bugs & Attack Vectors
>
> At Square, our apps and browsers make HTTP requests against our load
> balancers. Those load balancers do some basic request routing and then
> forward requests to the appropriate app server.
>
> An attacker could exploit a buggy load balancer by causing it to
> misinterpret request headers. For example, it could use UTF-16 characters
> to corrupt the headers table, causing it to interpret a dangerous header
> like :path /_internal_status as a benign header like user-agent:
> /_internal_status.
>
The current spec says values are UTF-8. The decoder will check that a value
is valid UTF-8 and, if not, emit a connection error.
Even if the decoder stores such data, it is treated as a plain byte array
for comparison, not as UTF-16 or anything else.
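If you want to enforce that in Java, something along these lines should
work (just a sketch; the error handling is up to the implementation):

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch: validate that a value is well-formed UTF-8 and reject it
// otherwise; the raw bytes are what get stored and compared.
final class Utf8Check {
    static void requireUtf8(byte[] value) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(value));
        } catch (CharacterCodingException e) {
            throw new IllegalStateException("header value is not valid UTF-8", e);
        }
    }
}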

> Priority in headers is awkward
>
> My first attempt at header compression broke because Firefox was sending a
> priority in the HTTP/2 HEADERS frame, and I wasn’t prepared to deal with
> it. I fixed it by reading that priority, but then suffered another dumb bug
> because I failed to decrement the frame’s length by 4 bytes.
>
> This is annoying. The only potential danger here is that it’s another
> opportunity for a frame to be internally inconsistent. What do you do when
> you encounter a 0-length header that has its PRIORITY bit set? Correct code
> should fail with a protocol exception, but it’s not too hard to imagine
> incorrect code proceeding here.
>
>
Are you suggesting that HEADERS should always have a 4-byte priority field?
That would not solve the problem, because you still have to check the
length of the frame to see that it contains the mandatory fields anyway.
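Either way, the length check looks roughly like this (the flag value and
names below are my assumptions, not spec text):

// Sketch: strip the optional 4-byte priority field before treating the
// rest of the payload as a header block fragment.
final class HeadersPayload {
    static final int FLAG_PRIORITY = 0x8; // assumed flag value

    static byte[] headerBlockFragment(int flags, byte[] payload) {
        int offset = 0;
        if ((flags & FLAG_PRIORITY) != 0) {
            if (payload.length < 4) {
                // e.g. a 0-length HEADERS payload with the PRIORITY bit set
                throw new IllegalStateException("priority field missing");
            }
            offset = 4; // the first 4 bytes carry the priority
        }
        byte[] fragment = new byte[payload.length - offset];
        System.arraycopy(payload, offset, fragment, 0, fragment.length);
        return fragment;
    }
}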

Best regards,

Tatsuhiro Tsujikawa


> Again, these are nitpicks! Thanks for the fine spec, guys.
>

Received on Friday, 30 August 2013 15:01:03 UTC