
Re: UTF-8 fields (was Re: More on allowed field characters)

From: Willy Tarreau <w@1wt.eu>
Date: Mon, 30 Aug 2021 05:50:48 +0200
To: Martin Thomson <mt@lowentropy.net>
Cc: ietf-http-wg@w3.org, Poul-Henning Kamp <phk@phk.freebsd.dk>
Message-ID: <20210830035048.GB18357@1wt.eu>
On Mon, Aug 30, 2021 at 01:23:15PM +1000, Martin Thomson wrote:
> On Fri, Aug 27, 2021, at 16:49, Poul-Henning Kamp wrote:
> > In UTF-8 it becomes (0xe2, 0x98, 0xb9) which HPACK Huffman coding expands to 65 bits.
> > 
> > In comparison "\u2639" only takes 48 bits.
> 
> Huffman coding is optional, so it can stay at 48.
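For a rough sense of the numbers being compared, here is a minimal Python sketch of an HPACK string literal sent with the Huffman flag (H) cleared, following the integer and string representations in RFC 7541, sections 5.1 and 5.2. The function names are hypothetical. The Huffman code lengths behind the 65-bit figure (RFC 7541, Appendix B) are not reproduced here.

```python
def encode_prefixed_int(value: int, prefix_bits: int) -> bytearray:
    """RFC 7541 section 5.1 integer with an N-bit prefix (flag bits left at zero)."""
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        return bytearray([value])
    out = bytearray([max_prefix])
    value -= max_prefix
    while value >= 128:
        out.append((value % 128) + 128)  # 7 payload bits plus continuation bit
        value //= 128
    out.append(value)
    return out


def encode_raw_string(data: bytes) -> bytes:
    """RFC 7541 section 5.2 string literal with H=0, i.e. no Huffman coding."""
    return bytes(encode_prefixed_int(len(data), 7)) + data


utf8_form = "\u2639".encode("utf-8")       # the character itself: e2 98 b9
escaped_form = "\\u2639".encode("ascii")   # the six ASCII characters \u2639

print(utf8_form.hex(" "), len(encode_raw_string(utf8_form)))
print(len(escaped_form) * 8, len(encode_raw_string(escaped_form)))
```

With H cleared, the UTF-8 form costs 1 + 3 octets and the escaped form 1 + 6 octets, so skipping Huffman coding keeps the escape's payload at 48 bits.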
> 
> The good news here is that there might be a point in our future where
> interpreting fields as UTF-8 is interoperable.  The charset debate has ended
> for sure, we just have to wait for the remnants of the other charsets to
> clear themselves out.  Maybe there will be enough progress by 2028 that we'll
> be able to do something else.

I personally hope this will never happen for field names. Unicode was
made for humans, and we're discussing protocols that let computers
interact. Placing emojis there is useless. Worse, we know that there is
a very high risk of aliasing between different values (distinct byte
sequences that some implementations end up treating as the same name),
which *will* cause a lot of security trouble and interoperability issues.
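One way such aliasing could surface, sketched in Python under assumed conditions: two hypothetical intermediaries both treat field names case-insensitively. One lowercases ASCII only on the raw bytes. The other applies NFC normalization plus full Unicode lowercasing. Neither function (`ascii_fold`, `unicode_fold`) models any real implementation.

```python
import unicodedata


def ascii_fold(name: str) -> bytes:
    # Hypothetical hop A: lowercases only ASCII letters in the UTF-8 bytes.
    return name.encode("utf-8").lower()


def unicode_fold(name: str) -> bytes:
    # Hypothetical hop B: NFC-normalizes, then applies full Unicode lowercasing.
    return unicodedata.normalize("NFC", name).lower().encode("utf-8")


# Pairs of names that are distinct byte sequences on the wire.
PAIRS = [
    ("cookie", "Coo\u212aie"),    # NFC maps U+212A KELVIN SIGN to ASCII "K"
    ("caf\u00e9", "cafe\u0301"),  # precomposed e-acute vs "e" + combining acute
]

for a, b in PAIRS:
    same_a = ascii_fold(a) == ascii_fold(b)
    same_b = unicode_fold(a) == unicode_fold(b)
    # The hops disagree: A sees two different fields, B sees one.
    print(f"{a!r} / {b!r}: hop A same={same_a}, hop B same={same_b}")
```

Both pairs come out as different for hop A and identical for hop B. A disagreement about which field is which is the kind of gap that request smuggling and filter bypasses can exploit.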

Just my two cents,
Willy
Received on Monday, 30 August 2021 03:51:09 UTC
