- From: Merle Tenney <Merle.Tenney@corp.palm.com>
- Date: Fri, 28 Sep 2001 13:21:07 -0700
- To: "'Carl W. Brown'" <cbrown@xnetinc.com>, www-international@w3.org
Carl,

> > 3. Most ICU interfaces do not take UTF-8 strings but rather UTF-16
> > strings (which are 16 bits wide).
>
> ICU has some macros for UTF-8 support, but you have to look at them
> carefully. They were added to ICU because they do not add to the code
> size. They are not a complete UTF-8 support package. There are some that
> I use, but others can get you into trouble. We just had a discussion on
> the use of such macros to count the number of characters in a string.
> There are two classes of support macros, SAFE and UNSAFE. The SAFE macros
> validate the data; the UNSAFE macros, which run faster, do not. Using
> either macro in a counting routine will produce a bad count if the data
> is bad. The count may differ depending on which you choose, but neither
> will give you any indication that the count is wrong.
>
> In my humble opinion, you are better off implementing your own routines
> for many of these functions. They can be faster and more reliable. This
> is the one area where I feel that ICU would have been better off not
> trying to do a half-done job. In all other areas the ICU code is top
> notch.

This is good advice, but it leads naturally to another question: Why
doesn't ICU have a branch that provides equivalent support to the existing
code, but for text encoded in UTF-8? I know that you can convert easily
between UTF-8 and UTF-16, but you really want a system that is designed,
optimized, and tested for your native encoding. There are a *lot* of
Unicode implementations that will be based on UTF-8, so I don't think this
is an unusual request. Has this been considered before? Would it take a
lot of work to complement the existing ICU libraries with native UTF-8
versions and maintain them in parallel?

Merle
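A minimal sketch, in C, of the kind of hand-rolled, validating routine Carl describes. The function name `u8_count_codepoints` is hypothetical and not an ICU API; the point is that, unlike a count built on the SAFE or UNSAFE macros, it reports explicitly when the input is not well-formed UTF-8 instead of returning a silently wrong count.

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Count the code points in a UTF-8 buffer, validating as we go.
     * Returns 0 on success and stores the count in *out; returns -1 and
     * stores the byte offset of the first bad sequence in *out on failure.
     * Rejects truncated sequences, overlong forms, surrogates
     * (U+D800..U+DFFF), and values above U+10FFFF.
     */
    int u8_count_codepoints(const uint8_t *s, size_t len, size_t *out)
    {
        size_t i = 0, n = 0;

        while (i < len) {
            uint8_t  b = s[i];
            size_t   need;        /* number of continuation bytes */
            uint32_t cp, min;     /* decoded value and its minimum legal value */

            if      (b < 0x80) { need = 0; cp = b;        min = 0x00;    }
            else if (b < 0xC2) { goto bad; }  /* stray continuation or overlong lead */
            else if (b < 0xE0) { need = 1; cp = b & 0x1F; min = 0x80;    }
            else if (b < 0xF0) { need = 2; cp = b & 0x0F; min = 0x800;   }
            else if (b < 0xF5) { need = 3; cp = b & 0x07; min = 0x10000; }
            else               { goto bad; }  /* lead byte beyond U+10FFFF range */

            if (len - i < need + 1)
                goto bad;                     /* truncated sequence */

            for (size_t k = 1; k <= need; k++) {
                uint8_t c = s[i + k];
                if ((c & 0xC0) != 0x80)
                    goto bad;                 /* not a continuation byte */
                cp = (cp << 6) | (c & 0x3F);
            }

            if (cp < min || cp > 0x10FFFF ||
                (cp >= 0xD800 && cp <= 0xDFFF))
                goto bad;                     /* overlong, out of range, or surrogate */

            i += need + 1;
            n++;
        }

        *out = n;
        return 0;

    bad:
        *out = i;   /* byte offset of the offending sequence */
        return -1;
    }

A caller can branch on the return value, so corrupt data surfaces as an error at a known byte offset rather than as a plausible-looking but wrong character count.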
Received on Friday, 28 September 2001 16:22:06 UTC