Re: More on allowed field characters

> On 27.08.2021 at 08:49, Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
> 
> --------
> Roy T. Fielding writes:
> 
>> I am fine with HPACK also being used to convey UTF-8 named fields and/or
>> carrying binary field values, but only when that is clearly indicated
>> via the protocol and processed as such.
> 
> I looked into this as part of Structured Headers, and I can say
> rather conclusively that nobody competent would do that.
> 
> The average symbol length in HPACK's Huffman table is 18.2 bits, so
> high-entropy binary data, be it due to compression or encryption,
> encodes to more than twice the original size, +128% to be precise.
> 
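
The expansion figure follows directly from the average code length. A minimal sketch of that arithmetic, assuming every byte value is equally likely (which is what high-entropy data looks like) and taking the 18.2-bit average from the message above as given; the function name is only illustrative:

```python
def expansion_percent(avg_bits_per_symbol: float, bits_per_byte: int = 8) -> float:
    """Size increase, in percent, when each input byte costs
    avg_bits_per_symbol bits on the wire instead of 8."""
    return (avg_bits_per_symbol / bits_per_byte - 1.0) * 100.0


# 18.2 is the average Huffman code length quoted above, not recomputed here.
overhead = expansion_percent(18.2)
print(f"high-entropy data grows by about {overhead:.1f}%")
```

That comes out at roughly 127.5%, which is the quoted +128% after rounding.
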
> The HPACK huffman table could have been designed to minimize UTF-8's
> penalty, at no cost to the lower 128 ASCII characters, but it almost
> looks like the opposite was attempted.
> 
> It is impossible to put a representative number on the UTF-8
> pessimization, but given the magnitude of it, I think the original
> "frownie", U+2639, is a proper example:
> 
> In UTF-8 it becomes (0xe2, 0x98, 0xb9) which HPACK expands to 65 bits.
> 
> In comparison "\u2639" only takes 48 bits.
> 
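
For anyone who wants to check the frownie example, here is a small sketch. It only verifies the UTF-8 byte sequence and compares sizes; the 65-bit and 48-bit HPACK figures are copied from the message above as constants and are not derived from the Huffman table here.

```python
frownie = "\u2639"
utf8 = frownie.encode("utf-8")

# The three bytes named in the message.
assert utf8 == bytes([0xE2, 0x98, 0xB9])

raw_bits = len(utf8) * 8   # size of the plain UTF-8 bytes, no Huffman coding
escape = "\\u2639"         # the six ASCII characters \ u 2 6 3 9

# Figures quoted in the message, taken as given.
HPACK_UTF8_BITS = 65
HPACK_ESCAPE_BITS = 48

print(f"raw UTF-8:          {raw_bits} bits")
print(f"HPACK of UTF-8:     {HPACK_UTF8_BITS} bits "
      f"({HPACK_UTF8_BITS / raw_bits:.2f}x raw)")
print(f"HPACK of {escape}:   {HPACK_ESCAPE_BITS} bits "
      f"({HPACK_ESCAPE_BITS / raw_bits:.2f}x raw)")
```

So the escaped ASCII form is twice the raw size, while the Huffman-coded UTF-8 bytes end up at about 2.7 times.
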
> According to my experiments, base64 is the optimal HPACK encoding
> for high entropy binary data, obviously reflecting its popularity
> in the random sample of HTTP headers that went into the HPACK table
> design.
> 
> The base64 characters average 6.46 bits per symbol, making the
> overhead just:
> 
> 	4 * 6.46 / 3 = 8.62 bits/byte = 7.8% 
> 
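
The base64 estimate can be reproduced the same way. A rough sketch, assuming the 6.46 bits/symbol average quoted above and the usual 4-characters-per-3-bytes base64 ratio (padding ignored); the variable names are mine:

```python
import base64

# base64 turns every 3 input bytes into 4 output characters.
sample = bytes(range(3))
encoded = base64.b64encode(sample)
chars_per_byte = len(encoded) / len(sample)   # 4 / 3

# 6.46 is the average Huffman code length of the base64 alphabet
# as quoted in the message; it is not recomputed here.
AVG_BITS_PER_BASE64_CHAR = 6.46

bits_per_byte = chars_per_byte * AVG_BITS_PER_BASE64_CHAR
overhead_percent = (bits_per_byte / 8 - 1.0) * 100.0

print(f"{bits_per_byte:.2f} bits per input byte, "
      f"about {overhead_percent:.1f}% overhead")
```

With the rounded 6.46 input this lands slightly below the quoted 8.62 bits/byte and 7.8%; I assume the difference is just rounding of the average.
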

Nicely analysed. 

> Poul-Henning
> 
> -- 
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe    
> Never attribute to malice what can adequately be explained by incompetence.
> 

Received on Friday, 27 August 2021 07:34:04 UTC