Re: Consensus call to include Display Strings in draft-ietf-httpbis-sfbis from Julian Reschke on 2023-05-28 (ietf-http-wg@w3.org from April to June 2023)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sun, 28 May 2023 13:32:33 +0200
To: Martin J. Dürst <duerst@it.aoyama.ac.jp>, Willy Tarreau <w@1wt.eu>
Cc: ietf-http-wg@w3.org
Message-ID: <329d848c-8116-f601-e041-44918ce2348d@gmx.de>

On 28.05.2023 09:28, Martin J. Dürst wrote:
> Hello Willy, Julian, others,
>
> There was a time (way back) when only the basic multilingual plane (i.e.
> a 16-bit space) had characters assigned. That turned out to not be
> enough, but it had the desirable side effect of keeping things compact.
> In UTF-8, that space can be covered by 3 bytes max per character, and it
> may have been that there were some implementations limited to 3 bytes
> max because they thought there wouldn't be any characters in the rest of
> the codespace.
>
> UTF-8 itself was defined to use up to 6 bytes per character, because it
> was covering the full 32-bit space of the early ISO-10646 drafts. There
> were definitely implementations that covered all that space.
>
> After some years, it became clear that a 16-bit space was not enough,
> but a 32-bit space was way too much. ISO and Unicode agreed on 17 planes
> of 16 bits, leading to an overall code space from U+0000 to U+10FFFF. As
> a result, the definition of UTF-8 was restricted to 4 bytes max per
> character (see RFC 3629, e.g.
> https://datatracker.ietf.org/doc/html/rfc3629#section-4, or your
> favorite Unicode version, or ISO 10646).
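
The byte-length limits described above can be sketched as a small lookup. This is only an illustration assuming the RFC 3629 ranges; the function name `utf8_length` is hypothetical and does not come from the draft or from any library.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 (per RFC 3629) needs for one code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        # Outside the 17 planes (U+0000..U+10FFFF): not encodable today.
        raise ValueError("code point outside U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogates are reserved for UTF-16 and excluded from UTF-8.
        raise ValueError("surrogate code points are not encodable in UTF-8")
    if code_point <= 0x7F:
        return 1  # ASCII
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3  # rest of the basic multilingual plane
    return 4      # planes 1 through 16


# 17 planes of 2**16 code points each give the U+0000..U+10FFFF code space.
assert 17 * 0x10000 == 0x10FFFF + 1

# Cross-check against Python's built-in encoder.
for cp in (0x41, 0xE9, 0x20AC, 0xFFFF, 0x10000, 0x1F600, 0x10FFFF):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
```

An implementation that stops at 3 bytes would handle everything up to U+FFFF but reject or mangle anything from U+10000 upward, which is the limitation mentioned in the first paragraph.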

Martin,

thanks for the wonderful explanation!

Best regards, Julian

Received on Sunday, 28 May 2023 11:33:08 UTC