- From: Harald Tveit Alvestrand <harald@alvestrand.no>
- Date: Thu, 19 Sep 2002 19:15:46 -0700
- To: iana@iana.org
- Cc: ietf-charsets@iana.org
IANA, please insert the enclosed registration into the charset registry. Harald Alvestrand ---------- Forwarded Message ---------- Date: 16. september 2002 09:31 -0700 From: Markus Scherer <markus.scherer@jtcsv.com> To: charsets <ietf-charsets@iana.org> Subject: Registration of new charset BOCU-1 refreshed 2 Harald Tveit Alvestrand wrote: > thanks - please send the revised BOCU-1 registration with the warning in > it, so that I can pass it to IANA and say "I approve". > > I think this has been discussed enough now. Here it is, the previous refresh from 2002-aug-22 with your (Harald's) reformulation of Martin's warning about UTF-8 being the preferred charset. Thank you very much, markus ---- 8< ---- Charset name: BOCU-1 Charset aliases: (none, except for the implicit csBOCU-1) Suitability for use in MIME text: Yes Published specifications: Specification of BOCU-1 with sample code for conversion to/from Unicode: http://www.unicode.org/notes/tn6/ Description of the general "BOCU" algorithm, with a link to the BOCU-1 specification: http://oss.software.ibm.com/icu/docs/papers/binary_ordered_compression_for_ unicode.html A converter implementation that is conformant to this specification is available in ICU (http://oss.software.ibm.com/icu/), an open-source library. The BOCU-1 converter C source code is in icu/source/common/ucnvbocu.c: http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/common/ucnvbocu.c CCS & CES: The BOCU-1 charset is a combination of the Unicode/ISO 10646 Coded Character Set (CCS) with the Character Encoding Scheme (CES) specified in the above document. It covers exactly the UTF-16-reachable subset of ISO 10646. ISO 10646 equivalency table: Algorithmic, see published specification and sample code. Additional information: BOCU-1 is an encoding (CES/TES) of Unicode/ISO 10646 for the storage and exchange of text data. It is stateful and provides a good byte/code point ratio while being directly usable in SMTP emails, database fields and other contexts. BOCU-1 combines the wide applicability of UTF-8 with the compactness of SCSU. It is useful for short strings and maintains code point order. BOCU-1 does not encode most ASCII characters with US-ASCII byte values. BOCU-1 is intended for limited use in special situations where the use of this charset can be preconfigured or negotiated. The preferred and most widely supported encoding for Unicode/ISO 10646 on the Internet is UTF-8. There is a Unicode signature byte sequence defined (FB EE 28, see specification). BOCU-1 is suitable for - databases: maintains Unicode code point order - emails: directly suitable for MIME text - CVS and similar: deterministic and resets at CR and LF BOCU-1 is not suitable for - efficient internal processing (convert to UTF-8/16/32) - contexts where encoding declarations _in_ documents _must_ be ASCII-readable Person & email address to contact for further information: Markus W. Scherer IBM Globalization Center of Competency 5600 Cottle Road Mail Stop: 50-2/B11 San Jose, CA 95193 USA markus.scherer@jtcsv.com markus.scherer@us.ibm.com Intended usage: LIMITED USE ---- Suggested MIBenum value: 1020 (first available in Unicode/ISO 10646 range; like SCSU [which is 1011]) Thank you for your consideration, markus ---------- End Forwarded Message ----------
Received on Friday, 20 September 2002 15:38:51 UTC