Re: Registration of new charset BOCU-1 refreshed

From: Martin Duerst <duerst@w3.org>
Date: Fri, 23 Aug 2002 11:31:16 +0900
To: Markus Scherer <markus.scherer@jtcsv.com>, charsets <ietf-charsets@iana.org>

Hello Markus,

As Harald said that he finds the additional information useful,
I won't object to it as such anymore.

However, I think people looking at the registration will still
tend to see a lot of marketing-like information, and will easily
overlook the main message, which comes at the very end and is
somewhat cryptic:  Intended usage: LIMITED USE

To bring this into better balance, I request that a clear
warning be added at the very start of "Additional information:",
such as:

BOCU-1 is intended for limited use in special situations.
The preferred and most widely supported encoding for
Unicode/ISO 10646 on the Internet is UTF-8.

Regards,    Martin.

At 17:29 02/08/22 -0700, Markus Scherer wrote:
>I would like to refresh the BOCU-1 registration text to the current state 
>of the discussion.
>Changes compared to the original proposal from July 10:
>- changed the confusing paragraph about CCS & CES as agreed
>- changed the specification URL to UTN #6
>I hope that the consolidated text helps with the registration process.
>Please let me know if there is anything else that I can do.
>---- 8< ----
>Charset name: BOCU-1
>Charset aliases: (none, except for the implicit csBOCU-1)
>Suitability for use in MIME text: Yes
>Published specifications:
>     Specification of BOCU-1 with sample code for conversion to/from Unicode:
>     http://www.unicode.org/notes/tn6/
>     Description of the general "BOCU" algorithm,
>     with a link to the BOCU-1 specification:
>     A converter implementation that is conformant to this specification is
>     available in ICU (http://oss.software.ibm.com/icu/), an open-source
>     library.
>     The BOCU-1 converter C source code is in icu/source/common/ucnvbocu.c:
>     CCS & CES: The BOCU-1 charset is a combination of the
>     Unicode/ISO 10646 Coded Character Set (CCS) with
>     the Character Encoding Scheme (CES) specified in
>     the above document. It covers exactly the
>     UTF-16-reachable subset of ISO 10646.
>ISO 10646 equivalency table:
>     Algorithmic, see published specification and sample code.
>Additional information:
>     BOCU-1 is an encoding (CES/TES) of Unicode/ISO 10646
>     for the storage and exchange of text data.
>     It is stateful and provides a good byte/code point ratio while
>     being directly usable in SMTP emails, database fields and other contexts.
>     BOCU-1 combines the wide applicability of UTF-8 with the compactness
>     of SCSU.
>     It is useful for short strings and maintains code point order.
>     BOCU-1 does not encode most ASCII characters with US-ASCII byte values.
>     There is a Unicode signature byte sequence defined
>     (FB EE 28, see specification).
>     BOCU-1 is suitable for
>     - databases: maintains Unicode code point order
>     - emails: directly suitable for MIME text
>     - CVS and similar: deterministic and resets at CR and LF
>     BOCU-1 is not suitable for
>     - efficient internal processing (convert to UTF-8/16/32)
>     - contexts where encoding declarations _in_ documents _must_ be
>       ASCII-readable
>Person & email address to contact for further information:
>     Markus W. Scherer
>     IBM Globalization Center of Competency
>     5600 Cottle Road
>     Mail Stop: 50-2/B11
>     San Jose, CA 95193
>     USA
>     markus.scherer@jtcsv.com
>     markus.scherer@us.ibm.com
>Intended usage: LIMITED USE
>Suggested MIBenum value: 1020
>     (first available in Unicode/ISO 10646 range; like SCSU [which is 1011])
>Thank you for your consideration,
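
The registration text quoted above mentions a Unicode signature byte
sequence for BOCU-1 (FB EE 28). As a minimal sketch of how a receiver
might check for it, assuming nothing beyond that byte sequence, the
following Python may help. The function names are hypothetical and are
not taken from the specification or from ICU; this does not decode
BOCU-1, it only inspects the leading bytes.

```python
# Signature byte sequence as quoted in the registration text: FB EE 28.
BOCU1_SIGNATURE = bytes([0xFB, 0xEE, 0x28])


def has_bocu1_signature(data: bytes) -> bool:
    """Return True if data begins with the BOCU-1 signature bytes.

    Hypothetical helper for illustration; it performs no decoding.
    """
    return data.startswith(BOCU1_SIGNATURE)


def strip_bocu1_signature(data: bytes) -> bytes:
    """Drop a leading BOCU-1 signature if present, else return data unchanged."""
    if has_bocu1_signature(data):
        return data[len(BOCU1_SIGNATURE):]
    return data
```

A signature check of this kind is only a hint. Where a charset label is
available (for example in a MIME header), that label is the more
reliable indication of the encoding.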
Received on Thursday, 22 August 2002 22:32:22 UTC
