- From: Thierry Sourbier <webmaster@i18ngurus.com>
- Date: Mon, 20 Aug 2001 09:24:07 +0200
- To: <www-international@w3.org>
Sourav,

> C can also handle Unicode by using UTF-8 as the
> multi-byte encoding in the char * type.

That is somewhat true. UTF-8 has some interesting characteristics for C programmers:

* It preserves the ASCII characters (all characters below 128 are encoded as the same single byte in UTF-8).
* UTF-8 never produces a zero byte except when encoding U+0000 (NUL) itself, so NUL-terminated strings keep working.

Therefore, if your program relies on recognizing some ASCII sequences AND does not modify bytes with values of 128 or above (i.e. is 8-bit clean), then it may work with UTF-8 just fine. Of course, you'll need to understand that you can no longer:

1. Use any *unsafe* functions such as tolower() or toupper(), which may corrupt the bytes of 128 and above that make up non-ASCII characters.
2. Rely on 1 byte = 1 char, either for random character access (e.g. myString[5]) or for string memory allocation, since a single character can occupy multiple bytes.
3. Sort strings with plain byte comparisons, as this will give funky results when the strings contain non-ASCII characters.
4. Rely on simple string comparison, which may be unreliable because of the various Unicode normalization forms (the same "é" can be stored as one precomposed character or as "e" followed by a combining accent).

Note that this is only a quick rundown of potential issues. So yes, C can handle UTF-8 just fine, but there is a high potential for doing *wrong* things (you may argue that this is a feature of C, but I won't go there... :).

The difficulty of adding UTF-8 support will depend on what you are doing with all your char*. If you do a lot of string manipulation, it may be a good time to switch to the Unicode Windows APIs, since you are on NT, or to use the free ICU library or one of the commercially available Unicode libraries.

Some good sources of information for you may be:

http://www.unicode.org for all the information on Unicode
http://oss.software.ibm.com/icu/ for information on ICU (for a C wrapper, look at http://www.xnetinc.com/xiua/).

I would also recommend reading "Adding internationalization support to the base standard for JavaScript" by Richard Gillam, which is a good case study on adding Unicode support to legacy code.
http://www-106.ibm.com/developerworks/library/internationalization-support.html

Finally, you can have a look at my site below to get plenty more links :).

Cheers,
Thierry Sourbier
-----------------------------
www.i18ngurus.com - Open internationalization resources directory.

----- Original Message -----
From: "souravm" <souravm@infy.com>
To: <www-international@w3.org>
Sent: Monday, August 20, 2001 8:12 AM
Subject: Unicode support for C/C++

> Hi All,
>
> I have software written in C on the Windows NT platform. I want to upgrade
> it for Unicode support. I got this information from the net: C can
> also handle Unicode by using UTF-8 as the multi-byte encoding in the
> char * type. I want to know how exactly it can be implemented.
>
> Thanks in advance,
> Sourav
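The two UTF-8 properties listed at the top of the reply follow directly from how code points are encoded. As a minimal sketch (utf8_encode is a hypothetical helper, not part of any library), the encoder below shows that ASCII maps to itself and that every byte of a multi-byte sequence has its high bit set, so it can never be a NUL or be mistaken for an ASCII character such as '/'.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 into buf.
 * Returns the number of bytes written (1-4), or 0 if cp is not a
 * Unicode scalar value (a surrogate D800-DFFF, or above 10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char buf[4])
{
    if (cp < 0x80) {                       /* ASCII: one byte, unchanged */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + 3 continuation bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

For example, utf8_encode(0xE9, buf) writes the two bytes 0xC3 0xA9 for "é", and utf8_encode(0x20AC, buf) writes 0xE2 0x82 0xAC for the euro sign. Lead bytes are 0xC0 or above and continuation bytes are 0x80-0xBF, which is why existing code that scans for ASCII delimiters keeps working.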
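Pitfall 2 in the reply (1 byte is not 1 char) is usually where legacy char* code needs the most changes. A minimal sketch, assuming the input is already valid UTF-8 (no validation is done) and using hypothetical helper names: a character count simply skips continuation bytes (bit pattern 10xxxxxx), and finding the n-th character means walking the string rather than indexing it directly.

```c
#include <stddef.h>

/* Continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF). */
static int is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Number of code points in a NUL-terminated UTF-8 string: every byte
 * that is not a continuation byte starts a new character. */
size_t utf8_length(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (!is_continuation((unsigned char)*s))
            n++;
    return n;
}

/* Byte offset where the code point with index idx starts, or (size_t)-1
 * if the string has fewer characters. Unlike s[idx], this is O(n). */
size_t utf8_offset(const char *s, size_t idx)
{
    size_t i = 0;
    for (size_t off = 0; s[off]; off++) {
        if (is_continuation((unsigned char)s[off]))
            continue;
        if (i == idx)
            return off;
        i++;
    }
    return (size_t)-1;
}
```

With s = "na\xC3\xAFve" ("naïve"), strlen(s) is 6 but utf8_length(s) is 5, and the character at index 3 ('v') starts at byte offset 4, not 3. Buffers still have to be sized in bytes, i.e. strlen(s) + 1, not from the character count.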
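For pitfall 1 in the reply, the danger is that tolower() works on single bytes: in a single-byte locale such as Latin-1 it can remap a byte like 0xC3 (the lead byte of "é" in UTF-8), and passing a plain char holding a byte of 0x80 or above is undefined where char is signed. When only ASCII keywords need case-insensitive handling, one workaround is to fold A-Z yourself and leave every other byte alone. This is a minimal sketch (ascii_tolower_inplace is a hypothetical name); it deliberately does not lowercase non-ASCII letters such as "É".

```c
/* Lower-case only the ASCII letters A-Z, in place. Bytes of 0x80 and
 * above (all parts of multi-byte UTF-8 sequences) are never touched,
 * so the UTF-8 encoding stays intact. */
void ascii_tolower_inplace(char *s)
{
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (c >= 'A' && c <= 'Z')
            *s = (char)(c + ('a' - 'A'));
    }
}
```

Sorting and comparison (pitfalls 3 and 4) are not fixed by this. strcmp() compares bytes, which for valid UTF-8 amounts to code point order, so "zoo" sorts before "été"; and strcmp("caf\xC3\xA9", "cafe\xCC\x81") is nonzero even though both strings spell "café". Locale-aware collation and normalization are the kind of services a library such as ICU provides.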
Received on Monday, 20 August 2001 03:17:50 UTC