Re: Libraries assuming iso-8859-1 (was: Re: Consensus call to include Display Strings in draft-ietf-httpbis-sfbis)

--------
Martin J. Dürst writes:

Adding base64 encoding to the table:

>                              Legacy  UTF-8  proposed  expansion  base64  b64expansion
> ASCII                        1       1      1         1          1.33    1.33
> Latin+Accents, e.g. Polish   1       ~1.5   ~2        2          2       2
> Arabic/Cyrillic/...          1       2      6         6          2.66    2.66
> Indic scripts,...            1       3      9         9          4       4
> Chinese/Japanese/...         2       3      9         4.5        4       2
>
> So some text in an Indic or South Asian script gets expanded by a factor
> of 9 when compared to a legacy single-byte encoding.

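Base64 does not penalize non-Western languages nearly as much: per the
table, Indic text grows by a factor of 4 rather than 9 relative to a
legacy single-byte encoding.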
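
For anyone who wants to check the per-script numbers, here is a minimal
Python sketch. It assumes the "proposed" column means writing every
non-ASCII UTF-8 byte as a three-character %xx escape, and it ignores the
few ASCII characters that would also need escaping. The helper names and
sample strings are mine, purely for illustration.

```python
import base64


def percent_len(text: str) -> int:
    """Encoded length, assuming each non-ASCII UTF-8 byte becomes '%xx'.

    Simplification: ASCII bytes are counted as one character each, even
    though a real encoder would also escape a few of them.
    """
    return sum(1 if b < 0x80 else 3 for b in text.encode("utf-8"))


def base64_len(text: str) -> int:
    """Length of the padded base64 encoding of the UTF-8 bytes."""
    return len(base64.b64encode(text.encode("utf-8")))


# Hypothetical sample strings, one per row category in the table.
samples = {
    "ASCII": "hello world!",
    "Cyrillic": "привет",
    "Devanagari": "नमस्ते",
    "Chinese": "你好世界",
}

for name, text in samples.items():
    utf8 = len(text.encode("utf-8"))
    print(f"{name:12} utf8={utf8:3} percent={percent_len(text):3} "
          f"base64={base64_len(text):3}")
```

On short strings base64 padding rounds the length up to a multiple of 4,
so the ratio only approaches 1.33 times the UTF-8 size as the input grows.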

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

Received on Sunday, 28 May 2023 07:28:24 UTC