UTF-8 fields (was Re: More on allowed field characters)

On Fri, Aug 27, 2021, at 16:49, Poul-Henning Kamp wrote:
> In UTF-8 it becomes (0xe2, 0x98, 0xb9) which HPACK expands to 65 bits.
> 
> In comparison "\u2639" only takes 48 bits.

Huffman coding is optional, so it can stay at 48.
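
For anyone who wants to check the un-Huffman'd numbers, here is a rough sketch of the arithmetic. It assumes the value is sent as a plain string literal with no Huffman coding and ignores the length prefix. The helper names (raw_bits, escaped) are made up for illustration, and the \uXXXX escaping shown only handles BMP characters. Huffman-coded sizes depend on the code table in the HPACK spec and are not computed here.

```python
# Sketch: literal (non-Huffman) sizes for U+2639, raw UTF-8 vs. "\u2639".
# Assumption: plain string literal, length prefix not counted.

def raw_bits(text: str) -> int:
    """Bits needed to carry text as UTF-8 octets with no Huffman coding."""
    return len(text.encode("utf-8")) * 8

def escaped(text: str) -> str:
    """Hypothetical \\uXXXX escaping of non-ASCII characters (BMP only)."""
    return "".join(c if ord(c) < 0x80 else "\\u%04x" % ord(c) for c in text)

frowny = "\u2639"

print(frowny.encode("utf-8").hex(" "))  # e2 98 b9
print(raw_bits(frowny))                 # 24
print(escaped(frowny))                  # \u2639
print(raw_bits(escaped(frowny)))        # 48
```

So as literals the escaped form is six ASCII octets (48 bits) and the UTF-8 form is three octets (24 bits).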

The good news here is that there might be a point in our future where interpreting fields as UTF-8 is interoperable.  The charset debate has ended for sure; we just have to wait for the remnants of the other charsets to clear themselves out.  Maybe there will be enough progress by 2028 that we'll be able to do something else.

I'm content to wait a while longer before exploring that space, though I appreciate the efforts of pioneers who brave the incompatibility hazards in search of whatever it is you might find there.

Received on Monday, 30 August 2021 03:23:52 UTC