
Re: Registration of new charset BOCU-1 refreshed - UTF-8

From: Harald Alvestrand <harald@alvestrand.no>
Date: Tue, 03 Sep 2002 12:17:44 +0200
To: Markus Scherer <markus.scherer@jtcsv.com>, charsets <ietf-charsets@iana.org>
Message-id: <57010000.1031048264@askvoll.hjemme.alvestrand.no>

--On Monday, August 26, 2002 15:31:49 -0700 Markus Scherer 
<markus.scherer@jtcsv.com> wrote:

>> Having a proliferation of Unicode
>> encodings is about as problematic as having a proliferation of
>> legacy encodings.
> Respectfully, I would like to disagree on this point.
> The use of non-Unicode charsets opens a whole different, huge Pandora's
> box of problems, which are well described in Unicode TR 22 and the XML
> Japanese Profile.
> All Unicode charsets are easily decoded in relatively small and fast code
> (even SCSU and BOCU-1), without any confusion about what Unicode code
> point any byte sequence maps to. Mapping tables for non-Unicode charsets
> can be large - e.g., ICU's standard set uses about 5MB of data, while
> there is 0 for Unicode charsets.
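
To illustrate the "no mapping tables" point quoted above, here is a minimal sketch of a UTF-8 decoder that uses only bit arithmetic. The function name `decode_utf8` is made up for this illustration, and the sketch assumes reasonably well-formed input: it checks lead and trail byte patterns but does not reject overlong forms or surrogate code points, so it is not a conformant decoder.

```python
from typing import List


def decode_utf8(data: bytes) -> List[int]:
    """Decode UTF-8 bytes into a list of code points using arithmetic only.

    Illustrative sketch: no lookup tables are needed, but overlong
    sequences and surrogates are NOT rejected here.
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                  # 1-byte sequence (ASCII)
            cp, trail = lead, 0
        elif 0xC2 <= lead <= 0xDF:       # 2-byte sequence
            cp, trail = lead & 0x1F, 1
        elif 0xE0 <= lead <= 0xEF:       # 3-byte sequence
            cp, trail = lead & 0x0F, 2
        elif 0xF0 <= lead <= 0xF4:       # 4-byte sequence
            cp, trail = lead & 0x07, 3
        else:
            raise ValueError("invalid lead byte 0x%02X at offset %d" % (lead, i))
        if i + trail >= len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        for b in data[i + 1:i + 1 + trail]:
            if (b & 0xC0) != 0x80:       # trail bytes must look like 10xxxxxx
                raise ValueError("invalid trail byte 0x%02X" % b)
            cp = (cp << 6) | (b & 0x3F)  # append 6 payload bits
        code_points.append(cp)
        i += trail + 1
    return code_points
```

Every byte sequence maps to exactly one code point by computation alone, which is the property being contrasted with table-driven legacy charsets.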

Remember that we have zero (none, nada, nil, zilch) generally supported 
ways of figuring out what charsets the recipient of an email supports.

Thus, the first email client that is capable of supporting BOCU-1 will be 
capable of sending mail that no other email client in the world can display 
legibly, and *has no way of knowing when they become capable of doing so*.
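
A rough sketch of the failure mode described above, with Python's standard `codecs` registry standing in for a receiving mail client. The helper names `receiver_can_decode` and `display_body` are hypothetical, and the sketch assumes the receiver has no BOCU-1 decoder installed (the Python standard library does not ship one). The sender sees none of this, because the charset label travels one way.

```python
import codecs


def receiver_can_decode(charset_label: str) -> bool:
    """Return True if this (stand-in) receiver has a decoder for the label."""
    try:
        codecs.lookup(charset_label)
    except LookupError:
        return False
    return True


def display_body(payload: bytes, charset_label: str) -> str:
    """Decode a message body, or return a placeholder if the charset is unknown."""
    if not receiver_can_decode(charset_label):
        # The sender is never told that this branch was taken.
        return "[unreadable: unsupported charset %r]" % charset_label
    return payload.decode(charset_label)


# A UTF-8 body is displayed; a body labeled BOCU-1 is not.
print(display_body("Gr\u00fc\u00dfe".encode("utf-8"), "utf-8"))
print(display_body(b"\x00\x01", "bocu-1"))
```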

And this is only one of many places where one uses charsets in protocols.
I think you should add Martin's warning to your registration - possibly 
reformulated as follows (line 2 added):

BOCU-1 is intended for limited use in special situations
where the use of this charset can be preconfigured or negotiated.
The preferred and most widely supported encoding for
Unicode/ISO 10646 on the Internet is UTF-8.


Received on Tuesday, 3 September 2002 06:18:26 UTC
