- From: Kenneth Whistler <kenw@sybase.com>
- Date: Fri, 12 Apr 2002 09:56:41 -0700 (PDT)
- To: tony@att.com
- Cc: ietf-charsets@iana.org, kenw@sybase.com
Tony Hansen wrote:

> One of the advantages about UTF8 that I've repeatedly heard touted was
> that it was NOT restricted to 10FFFF, and indeed could handle the entire
> 32-bit codespace when such codes were eventually allocated. This was
> often used as an argument against other encodings, such as UTF16, that
> didn't have the same property.

And as you can see from the quotation from 10646 itself that I just cited, such argumentation was always a kind of shell game by detractors of UTF-16 and Unicode. The people making such arguments were not plugged in to the process in ISO and were apparently unaware that WG2 itself was keenly aware of the interoperability problems and eager to ensure that all UTFs for 10646 were *equally* applicable to all characters encoded in the standard.

And the repeated concerns about the "eventual allocation" of characters in the 32-bit codespace that UTF-16 could not handle have reached the status of urban legends -- endlessly repeated among those in the Linux community who use repetition to define accuracy, without bothering to check with the source. These urban legends are grounded neither in the standard, nor in fact, nor in need, nor even in the capabilities of the standards committees.

At current rates it will literally take *centuries* for the character encoding committees to fill up U+0000..U+EFFFD. Furthermore, *all* known candidates for character encoding, generously calculated -- and we have been scouring obscure sources now for well over a decade, including many, many minority and historic scripts I guarantee you will never have heard of -- will amount to less than 25% of the available codespace.

--Ken

P.S. Please feel free to forward this on to those who have been repeating the urban legend. ;-)

>
> Tony Hansen
> tony@att.com
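A minimal sketch, not part of the original message, of the surrogate-pair arithmetic behind the U+10FFFF cap discussed above: UTF-16 high/low surrogate pairs can address exactly the range 0x10000..0x10FFFF, which is why ISO/IEC 10646 and RFC 3629 restrict the other UTFs to the same range. Python is used here purely for illustration.

    # Sketch only: illustrates why the codespace cap is U+10FFFF.
    # UTF-16 surrogate pairs can address exactly 0x10000..0x10FFFF, so
    # UTF-8 and UTF-32 were restricted to the same range for interoperability.

    def utf16_pair_to_code_point(high: int, low: int) -> int:
        """Combine a UTF-16 high/low surrogate pair into a code point."""
        assert 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

    # The largest code point any surrogate pair can express:
    print(hex(utf16_pair_to_code_point(0xDBFF, 0xDFFF)))  # 0x10ffff

    # UTF-8 as restricted by RFC 3629 tops out at the same code point,
    # using at most 4 bytes:
    print(len(chr(0x10FFFF).encode("utf-8")))              # 4

    # Scale of the codespace the message describes: 17 planes minus the
    # 2,048 surrogate code points = 1,112,064 usable scalar values.
    print(0x110000 - 0x800)                                # 1112064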
Received on Friday, 12 April 2002 12:57:20 UTC