- From: Carl W. Brown <cbrown@xnetinc.com>
- Date: Thu, 27 Sep 2001 17:49:39 -0700
- To: <www-international@w3.org>
Paul,

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org]On Behalf Of Paul Deuter
> Sent: Thursday, September 27, 2001 1:52 PM
> To: Richard, Francois M; www-international@w3.org
> Subject: RE: utf-8 Locale support on Solaris and Linux
>
>
> UTF-8 is not a locale. UTF-8 is a multi-byte encoding of the
> Unicode repertoire of characters.
>
> The behavior of the standard C functions depends on the
> compiler and the system that you are using.
>
> In order to get standard cross-platform support for
> Unicode strings, I recommend using the ICU library.
>
> http://www-124.ibm.com/icu/

You are right to recommend ICU. Each Unix system deals with Unicode
differently. On Linux, for example, I can convert UTF-8 text to Unicode
wide characters with mbstowcs(). On Solaris the wide character
implementation is not Unicode, so the same call does not give you
Unicode code points.

ICU provides a consistent cross-platform implementation. However, you
either have to convert to UTF-16 or add UTF-8 support to ICU.

xIUA http://www.xnetinc.com/xiua/ is open source code that adds full
UTF-8 support to ICU, so that everything from xiua_strcoll to
xiua_strtok works with UTF-8 strings. If you don't want the rest of the
code you can just use the UTF-8 support code.

>
> -Paul
>
> Paul Deuter
> Internationalization Manager
> Plumtree Software
> paul.deuter@plumtree.com
>
>
>
> -----Original Message-----
> From: Richard, Francois M [mailto:Francois.M.Richard@usa.xerox.com]
> Sent: Thursday, September 27, 2001 1:28 PM
> To: 'www-international@w3.org'
> Subject: utf-8 Locale support on Solaris and Linux
>
>
> A basic question I guess...
>
> Do C functions like strlen(), isalpha() and other locale sensitive C
> functions behave properly when Locale has been set to utf-8?

The standard Unix setlocale() is not thread safe, because it changes the
locale for the whole process. ICU has no such restriction. If you also
use xIUA with ICU then you can keep the setlocale style of programming
and still be thread safe.
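As a side note on the strlen() question quoted above: strlen() counts
bytes whatever the locale is, so on UTF-8 data it reports the storage
size and not the number of characters. Here is a minimal sketch of a
code point counter that needs no locale at all. The name
utf8_codepoints is made up for this illustration (it is not part of ICU
or xIUA), and the function assumes the input is already valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an ICU or xIUA function.
 * Counts Unicode code points in a NUL-terminated UTF-8 string by
 * skipping continuation bytes, which always have the form 10xxxxxx.
 * Assumes the input is valid UTF-8; no validation is done here. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        /* Every byte that is not a continuation byte starts a code point. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the five-byte string "caf\xc3\xa9" ("café" in UTF-8), strlen()
returns 5 while utf8_codepoints() returns 4. This counts code points
only; it does not handle combining sequences or anything locale
specific such as collation, which is where a library like ICU comes in.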
You can use POSIX locales with xiua_OpenLocale. For example:

    xiua_OpenLocale("pt_BR.utf-8", XDFCODEPAGE);
        /* UTF-8 data with an underlying UTF-8 code page */
    xiua_OpenLocale("pt_BR.iso-8859-1", XDFUTF8);
        /* UTF-8 data with an underlying iso-8859-1 code page */

For web applications xIUA also has some special functions. For example,
if you want to determine which character set to use for a browser, it
provides a routine that analyzes the Accept-Charset string and finds the
best character set for the specific locale.

Carl
Received on Thursday, 27 September 2001 20:49:45 UTC