- From: Cory Benfield <cory@lukasa.co.uk>
- Date: Mon, 6 Sep 2021 09:30:47 +0100
- To: Willy Tarreau <w@1wt.eu>
- Cc: Martin Thomson <mt@lowentropy.net>, HTTP Working Group <ietf-http-wg@w3.org>, Poul-Henning Kamp <phk@phk.freebsd.dk>
On Mon, 30 Aug 2021 at 04:55, Willy Tarreau <w@1wt.eu> wrote:
>
> On Mon, Aug 30, 2021 at 01:23:15PM +1000, Martin Thomson wrote:
> > On Fri, Aug 27, 2021, at 16:49, Poul-Henning Kamp wrote:
> > > In UTF-8 it becomes (0xe2, 0x98, 0xb9) which HPACK expands to 65 bits.
> > >
> > > In comparison "\u2639" only takes 48 bits.
> >
> > Huffman coding is optional, so it can stay at 48.
> >
> > The good news here is that there might be a point in our future where
> > interpreting fields as UTF-8 is interoperable. The charset debate has ended
> > for sure, we just have to wait for the remnants of the other charsets to
> > clear themselves out. Maybe there will be enough progress by 2028 that we'll
> > be able to do something else.
>
> I personally hope this will never happen for field names. UNICODE was
> made for humans and we're discussing protocols to let computers interact.
> Placing emojis there is useless. However we know that there is a very high
> risk of aliasing between different values, that *will* cause a lot of
> security trouble and interoperability issues.

This is a strong point worth emphasising. Many language-specific
frameworks will decode HTTP headers into the language's "string" type for
ease of use. Some languages may then normalise that input, which opens an
exciting new confused deputy vector.

While I agree that most modern implementations will happily _tolerate_ a
UTF-8 field name, we should steer well clear of ever defining field names
that use non-ASCII characters for exactly this reason.
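
The aliasing risk above can be illustrated with a minimal sketch. It assumes a hypothetical framework that decodes field names as UTF-8, applies NFKC normalisation, and lowercases the result; the function name `normalise_field_name` and the field name `x-admin` are made up for illustration and do not come from any particular implementation.

```python
import unicodedata


def normalise_field_name(raw: bytes) -> str:
    """Hypothetical backend behaviour: decode as UTF-8, NFKC-normalise, lowercase."""
    return unicodedata.normalize("NFKC", raw.decode("utf-8")).lower()


# A plain ASCII field name, as an intermediary might filter on it.
ascii_name = b"x-admin"

# The same name with FULLWIDTH LATIN SMALL LETTER A (U+FF41) in place of "a".
lookalike_name = "x-\uff41dmin".encode("utf-8")

# An intermediary comparing raw bytes treats these as two different fields...
print(ascii_name == lookalike_name)

# ...while the hypothetical normalising backend treats them as the same field.
print(normalise_field_name(ascii_name) == normalise_field_name(lookalike_name))
```

If an intermediary strips or validates `x-admin` by byte comparison while the backend normalises, the lookalike passes through the first and is honoured by the second, which is the confused deputy scenario.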
Received on Monday, 6 September 2021 08:31:15 UTC