- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Fri, 27 Aug 2021 09:13:02 -0700
- To: Poul-Henning Kamp <phk@phk.freebsd.dk>
- Cc: Martin Thomson <mt@lowentropy.net>, ietf-http-wg@w3.org
> On Aug 26, 2021, at 11:49 PM, Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
>
> --------
> Roy T. Fielding writes:
>
>> I am fine with HPACK also being used to convey UTF-8 named fields and/or
>> carrying binary field values, but only when that is clearly indicated
>> via the protocol and processed as such.
>
> I looked into this as part of Structured Headers, and I can say
> rather conclusively that nobody competent would do that.

That's funny. I just spent three years updating a 30 year old protocol
and am quite sure that (aside from Referer and TE) field names are
rarely chosen for efficiency. If-Moderated-Since is my worst.

> The average symbol length in HPACK's huffman table is 18.2 bits, so
> high entropy binary data, be it due to compression or encryption,
> encodes to more than twice the original size, +128% to be precise.
>
> The HPACK huffman table could have been designed to minimize UTF-8's
> penalty, at no cost to the lower 128 ASCII characters, but it almost
> looks like the opposite was attempted.
>
> It is impossible to put a representative number on the UTF-8
> pessimization, but given the magnitude of it, I think the original
> "frownie", U+2639, is a proper example:
>
> In UTF-8 it becomes (0xe2, 0x98, 0xb9) which HPACK expands to 65 bits.
>
> In comparison "\u2639" only takes 48 bits.
>
> According to my experiments, base64 is the optimal HPACK encoding
> for high entropy binary data, obviously reflecting its popularity
> in the random sample of HTTP headers that went into the HPACK table
> design.
>
> The base-64 characters average 6.46 bits per symbol making the
> overhead just:
>
> 4 * 6.46 / 3 = 8.62 bits/byte = 7.8%

That's nice to know. Maybe we should add a "check your HPACK length"
resource somewhere, or a new flag on curl for evaluating extension names.

....Roy
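The following is a minimal sketch, in Python, of the kind of "check your HPACK length" helper discussed above. It is an illustration, not an existing tool. It assumes a deliberately partial table of HPACK Huffman code lengths taken from RFC 7541, Appendix B. The table covers only the base64 alphabet, the `=` padding character, and the backslash, so it cannot reproduce the 65-bit figure for the raw UTF-8 bytes. The names `HUFFMAN_BITS`, `huffman_bits`, `huffman_octets`, and `base64_overhead` are hypothetical.

With those code lengths, the sketch reproduces the 48-bit figure for the escaped form `\u2639`. It also reproduces the base64 numbers: the unrounded average is 6.46875 bits per symbol, which gives 8.625 bits per input byte, an overhead of about 7.8%.

```python
import string

# HYPOTHETICAL partial table: HPACK Huffman code lengths (in bits) from
# RFC 7541 Appendix B, limited to the base64 alphabet, '=' and '\'.
# A real checker would need all 256 byte symbols from the RFC.
HUFFMAN_BITS = {}
for ch in "012aceiost":
    HUFFMAN_BITS[ch] = 5
for ch in "3456789/=Abdfghlmnpru":
    HUFFMAN_BITS[ch] = 6
for ch in "BCDEFGHIJKLMNOPQRSTUVWYjkqvwxyz":
    HUFFMAN_BITS[ch] = 7
for ch in "XZ":
    HUFFMAN_BITS[ch] = 8
HUFFMAN_BITS["+"] = 11
HUFFMAN_BITS["\\"] = 19

BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def huffman_bits(text: str) -> int:
    """Sum of Huffman code lengths; raises KeyError outside the partial table."""
    return sum(HUFFMAN_BITS[ch] for ch in text)


def huffman_octets(text: str) -> int:
    """HPACK pads the Huffman output to a whole number of octets."""
    return (huffman_bits(text) + 7) // 8


def base64_overhead():
    """Average bits per base64 symbol, bits per input byte, and relative overhead."""
    avg = sum(HUFFMAN_BITS[c] for c in BASE64_ALPHABET) / len(BASE64_ALPHABET)
    bits_per_byte = 4 * avg / 3  # 4 base64 symbols carry 3 input bytes
    return avg, bits_per_byte, bits_per_byte / 8 - 1


if __name__ == "__main__":
    print("\u2639".encode("utf-8"))  # b'\xe2\x98\xb9'
    print(huffman_bits(r"\u2639"), "bits,", huffman_octets(r"\u2639"), "octets")
    avg, per_byte, overhead = base64_overhead()
    print(f"{avg:.5f} bits/symbol, {per_byte:.3f} bits/byte, +{overhead:.1%}")
```

Run as a script, this should print the UTF-8 bytes of U+2639, then `48 bits, 6 octets`, then `6.46875 bits/symbol, 8.625 bits/byte, +7.8%`. To extend it to arbitrary field names or UTF-8 values, add the remaining code lengths from the RFC table and look them up per byte rather than per character.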
Received on Friday, 27 August 2021 16:13:23 UTC