- From: Martin Duerst <duerst@w3.org>
- Date: Fri, 23 Aug 2002 11:31:16 +0900
- To: Markus Scherer <markus.scherer@jtcsv.com>, charsets <ietf-charsets@iana.org>
Hello Markus,

As Harald said that he finds the additional information useful,
I won't object to it as such anymore. However, I think people
looking at the registration will still tend to see a lot of
marketing-like information, and will easily overlook the main
message, which comes at the very end and is somewhat cryptic:

    Intended usage: LIMITED USE

To bring this into better balance, I request that a clear warning
be added at the very start of "Additional information:", such as:

    BOCU-1 is intended for limited use in special situations.
    The preferred and most widely supported encoding for
    Unicode/ISO 10646 on the Internet is UTF-8.

Regards,    Martin.

At 17:29 02/08/22 -0700, Markus Scherer wrote:
>I would like to refresh the BOCU-1 registration text to the current state
>of the discussion.
>
>Changes compared to the original proposal from July 10:
>- changed the confusing paragraph about CCS & CES as agreed
>- changed the specification URL to UTN #6
>
>I hope that the consolidated text helps with the registration process.
>Please let me know if there is anything else that I can do.
>
>Sincerely,
>markus
>---- 8< ----
>Charset name: BOCU-1
>
>Charset aliases: (none, except for the implicit csBOCU-1)
>
>Suitability for use in MIME text: Yes
>
>Published specifications:
>   Specification of BOCU-1 with sample code for conversion to/from Unicode:
>   http://www.unicode.org/notes/tn6/
>
>   Description of the general "BOCU" algorithm,
>   with a link to the BOCU-1 specification:
>   http://oss.software.ibm.com/icu/docs/papers/binary_ordered_compression_for_unicode.html
>
>   A converter implementation that is conformant to this specification is
>   available in ICU (http://oss.software.ibm.com/icu/), an open-source
>   library.
>   The BOCU-1 converter C source code is in icu/source/common/ucnvbocu.c:
>   http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/common/ucnvbocu.c
>
>   CCS & CES: The BOCU-1 charset is a combination of the
>              Unicode/ISO 10646 Coded Character Set (CCS) with
>              the Character Encoding Scheme (CES) specified in
>              the above document. It covers exactly the
>              UTF-16-reachable subset of ISO 10646.
>
>ISO 10646 equivalency table:
>   Algorithmic, see published specification and sample code.
>
>Additional information:
>   BOCU-1 is an encoding (CES/TES) of Unicode/ISO 10646
>   for the storage and exchange of text data.
>   It is stateful and provides a good byte/code point ratio while
>   being directly usable in SMTP emails, database fields and other contexts.
>
>   BOCU-1 combines the wide applicability of UTF-8 with the compactness
>   of SCSU.
>   It is useful for short strings and maintains code point order.
>
>   BOCU-1 does not encode most ASCII characters with US-ASCII byte values.
>
>   There is a Unicode signature byte sequence defined
>   (FB EE 28, see specification).
>
>   BOCU-1 is suitable for
>   - databases: maintains Unicode code point order
>   - emails: directly suitable for MIME text
>   - CVS and similar: deterministic and resets at CR and LF
>
>   BOCU-1 is not suitable for
>   - efficient internal processing (convert to UTF-8/16/32)
>   - contexts where encoding declarations _in_ documents _must_ be
>     ASCII-readable
>
>Person & email address to contact for further information:
>   Markus W. Scherer
>   IBM Globalization Center of Competency
>   5600 Cottle Road
>   Mail Stop: 50-2/B11
>   San Jose, CA 95193
>   USA
>
>   markus.scherer@jtcsv.com
>   markus.scherer@us.ibm.com
>
>Intended usage: LIMITED USE
>
>----
>Suggested MIBenum value: 1020
>   (first available in Unicode/ISO 10646 range; like SCSU [which is 1011])
>
>
>Thank you for your consideration,
>
>markus
>
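
The registration text quoted above mentions a Unicode signature byte sequence for BOCU-1 (FB EE 28). As a minimal sketch of how a receiver might check for that signature, assuming the data is already available as a byte string: the names BOCU1_SIGNATURE, has_bocu1_signature and strip_bocu1_signature are hypothetical and do not come from ICU or UTN #6, and nothing here decodes BOCU-1 itself.

```python
# Signature bytes as quoted in the registration text (FB EE 28).
# All names in this sketch are illustrative, not part of ICU or UTN #6.
BOCU1_SIGNATURE = bytes([0xFB, 0xEE, 0x28])


def has_bocu1_signature(data: bytes) -> bool:
    """Return True if data begins with the BOCU-1 signature byte sequence."""
    return data.startswith(BOCU1_SIGNATURE)


def strip_bocu1_signature(data: bytes) -> bytes:
    """Drop a leading signature if present; the payload is left undecoded."""
    if has_bocu1_signature(data):
        return data[len(BOCU1_SIGNATURE):]
    return data
```

A missing signature does not show that the data is not BOCU-1; the check can only confirm that the three signature bytes are present at the start.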
Received on Thursday, 22 August 2002 22:32:22 UTC