- From: Stefan Eissing <stefan.eissing@greenbytes.de>
- Date: Fri, 27 Aug 2021 09:33:42 +0200
- To: Poul-Henning Kamp <phk@phk.freebsd.dk>
- Cc: Fielding Roy <fielding@gbiv.com>, Martin Thomson <mt@lowentropy.net>, ietf-http-wg@w3.org
> On 27.08.2021 at 08:49, Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
>
> --------
> Roy T. Fielding writes:
>
>> I am fine with HPACK also being used to convey UTF-8 named fields and/or
>> carrying binary field values, but only when that is clearly indicated
>> via the protocol and processed as such.
>
> I looked into this as part of Structured Headers, and I can say
> rather conclusively that nobody competent would do that.
>
> The average symbol length in HPACK's huffman table is 18.2 bits, so
> high entropy binary data, be it due to compression or encryption,
> encodes to more than twice the original size, +128% to be precise.
>
> The HPACK huffman table could have been designed to minimize UTF-8's
> penalty, at no cost to the lower 128 ASCII characters, but it almost
> looks like the opposite was attempted.
>
> It is impossible to put a representative number on the UTF-8
> pessimization, but given the magnitude of it, I think the original
> "frownie", U+2639, is a proper example:
>
> In UTF-8 it becomes (0xe2, 0x98, 0xb9) which HPACK expands to 65 bits.
>
> In comparison "\u2639" only takes 48 bits.
>
> According to my experiments, base64 is the optimal HPACK encoding
> for high entropy binary data, obviously reflecting its popularity
> in the random sample of HTTP headers that went into the HPACK table
> design.
>
> The base-64 characters average 6.46 bits per symbol making the
> overhead just:
>
>     4 * 6.46 / 3 = 8.62 bits/byte = 7.8%

Nicely analysed.

> Poul-Henning
>
> --
> Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG | TCP/IP since RFC 956
> FreeBSD committer | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
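
The overhead arithmetic in the quoted message can be rechecked with a short script. This is only a sketch: the two average code lengths (18.2 and 6.46 bits) are taken from the message as given and are not recomputed from the HPACK Huffman table, and the names `overhead_percent`, `AVG_BITS_ALL_SYMBOLS` and `AVG_BITS_BASE64` are hypothetical, chosen for illustration. Exact arithmetic on those rounded inputs comes out marginally below the quoted 8.62 bits/byte and 7.8%, presumably because the quoted averages are themselves rounded.

```python
# Averages quoted in the message; assumed here, not derived from the HPACK table.
AVG_BITS_ALL_SYMBOLS = 18.2  # mean Huffman code length over all byte values
AVG_BITS_BASE64 = 6.46       # mean Huffman code length over base64 characters


def overhead_percent(encoded_bits_per_byte: float) -> float:
    """Size increase relative to sending each input byte as 8 raw bits."""
    return (encoded_bits_per_byte / 8 - 1) * 100


# High-entropy binary sent directly: one Huffman symbol per input byte.
raw_bits_per_byte = AVG_BITS_ALL_SYMBOLS

# Same data as base64: 4 output characters for every 3 input bytes.
base64_bits_per_byte = 4 * AVG_BITS_BASE64 / 3

# The "frownie" example: U+2639 is three bytes in UTF-8.
frownie_utf8 = "\u2639".encode("utf-8")

if __name__ == "__main__":
    print(f"raw binary: {raw_bits_per_byte:.2f} bits/byte, "
          f"{overhead_percent(raw_bits_per_byte):+.1f}%")
    print(f"base64:     {base64_bits_per_byte:.2f} bits/byte, "
          f"{overhead_percent(base64_bits_per_byte):+.1f}%")
    print("U+2639 in UTF-8:", frownie_utf8.hex(" "))
```

The per-byte framing makes the two cases directly comparable: both are expressed as Huffman-coded bits spent per byte of original payload.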
Received on Friday, 27 August 2021 07:34:04 UTC