RE: utf-8 Locale support on Solaris and Linux

From: Carl W. Brown <cbrown@xnetinc.com>
Date: Thu, 27 Sep 2001 17:49:39 -0700
To: <www-international@w3.org>
Message-ID: <FNEHIHOMIIDPDCIFEJEGCEMHCJAA.cbrown@xnetinc.com>
Paul,

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org]On Behalf Of Paul Deuter
> Sent: Thursday, September 27, 2001 1:52 PM
> To: Richard, Francois M; www-international@w3.org
> Subject: RE: utf-8 Locale support on Solaris and Linux
>
>
> UTF-8 is not a locale.  UTF-8 is a multi-byte encoding of the
> Unicode repertoire of characters.
>
> The behavior of the standard C functions depends on the
> compiler and the system that you are using.
>
> In order to get standard cross-platform support for
> Unicode strings, I recommend using the ICU library.
>
> http://www-124.ibm.com/icu/

You are right to recommend ICU.  Each Unix system deals with Unicode
differently.  On Linux, for example, I can convert UTF-8 text to Unicode
wide characters with mbstowcs, provided a UTF-8 locale is in effect.  On
Solaris the wide character implementation is not Unicode.
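
The Linux conversion mentioned above could look roughly like the sketch
below. The helper name utf8_to_wide and the locale names it tries are
assumptions made for illustration; installed locale names vary by system,
and the wide character values are only Unicode code points on platforms
such as glibc-based Linux.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert UTF-8 text to wide characters with mbstowcs.
 * Returns the number of wide characters written (excluding the terminator),
 * or (size_t)-1 if no UTF-8 locale could be selected or the input is not
 * valid UTF-8.
 * Note: setlocale changes LC_CTYPE for the whole process, so this is not
 * safe to call while other threads depend on the current locale. */
size_t utf8_to_wide(const char *utf8, wchar_t *out, size_t out_len)
{
    /* Assumed locale names; adjust to whatever the system provides. */
    static const char *const candidates[] = {
        "C.UTF-8", "en_US.UTF-8", "en_US.utf8"
    };
    size_t i;
    int selected = 0;

    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL) {
            selected = 1;
            break;
        }
    }
    if (!selected)
        return (size_t)-1;

    /* mbstowcs decodes according to the LC_CTYPE locale selected above. */
    return mbstowcs(out, utf8, out_len);
}
```

With a UTF-8 locale available, passing the four-byte string "S\xC3\xA3o"
(São) should yield three wide characters, while strlen still reports four
because it counts bytes regardless of the locale.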

ICU provides a consistent cross-platform implementation.  However, you
either have to convert to UTF-16 or add UTF-8 support to ICU.  xIUA
(http://www.xnetinc.com/xiua/) is open source code that adds full UTF-8
support to ICU so that everything from xiua_strcoll to xiua_strtok works
with UTF-8 strings.  If you don't want the rest of the code you can just use
the UTF-8 support code.

>
> -Paul
>
> Paul Deuter
> Internationalization Manager
> Plumtree Software
> paul.deuter@plumtree.com
>
>
>
> -----Original Message-----
> From: Richard, Francois M [mailto:Francois.M.Richard@usa.xerox.com]
> Sent: Thursday, September 27, 2001 1:28 PM
> To: 'www-international@w3.org'
> Subject: utf-8 Locale support on Solaris and Linux
>
>
> A basic question I guess...
>
> Do C functions like strlen(), isalpha() and other locale sensitive C
> functions behave properly when Locale has been set to utf-8?

The standard Unix setlocale is not thread safe.  ICU has no such
restriction.  If you also use xIUA with ICU then you can use the setlocale
style of programming and still be thread safe.  You can use POSIX locales
with xiua_OpenLocale.  For example:

/* UTF-8 data with an underlying UTF-8 code page */
xiua_OpenLocale("pt_BR.utf-8", XDFCODEPAGE);

/* UTF-8 data with an underlying iso-8859-1 code page */
xiua_OpenLocale("pt_BR.iso-8859-1", XDFUTF8);

For web applications xIUA also has some special functions.  For example if
you want to determine what character set to use for a browser it provides a
routine to analyze the Accept-Charset string and find the best character set
for the specific locale.
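
To illustrate the idea of analyzing Accept-Charset, here is a minimal
sketch. It is not xIUA's routine: the names best_charset and
charset_quality are hypothetical, the candidate list stands in for
whatever character sets suit the locale, and the sketch ignores the
RFC 2616 special case that treats an unlisted ISO-8859-1 as acceptable.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of a token of length n against a C string. */
static int token_equals(const char *tok, size_t n, const char *name)
{
    size_t i;
    if (strlen(name) != n)
        return 0;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)tok[i]) != tolower((unsigned char)name[i]))
            return 0;
    }
    return 1;
}

/* Parse a qvalue such as "0.8" by hand; strtod is avoided because its
 * decimal point depends on the current locale. */
static double parse_q(const char *p)
{
    double q = 0.0, scale = 0.1;
    while (isdigit((unsigned char)*p))
        q = q * 10.0 + (*p++ - '0');
    if (*p == '.') {
        p++;
        while (isdigit((unsigned char)*p)) {
            q += (*p++ - '0') * scale;
            scale *= 0.1;
        }
    }
    return q;
}

/* Quality the header assigns to charset: an exact entry wins, otherwise
 * the "*" entry applies, otherwise 0. */
static double charset_quality(const char *header, const char *charset)
{
    double star_q = 0.0;
    const char *p = header;

    while (*p) {
        const char *tok, *end;
        size_t len;
        double q = 1.0;                 /* default when no q parameter */

        while (*p == ' ' || *p == '\t' || *p == ',')
            p++;
        if (!*p)
            break;
        tok = p;
        while (*p && *p != ',' && *p != ';' && *p != ' ' && *p != '\t')
            p++;
        len = (size_t)(p - tok);
        end = p;
        while (*end && *end != ',')     /* find the end of this element */
            end++;
        for (; p < end; p++) {          /* look for a q= parameter */
            if ((*p == 'q' || *p == 'Q') && p + 1 < end && p[1] == '=') {
                q = parse_q(p + 2);
                break;
            }
        }
        p = end;
        if (token_equals(tok, len, charset))
            return q;
        if (len == 1 && tok[0] == '*')
            star_q = q;
    }
    return star_q;
}

/* Pick the candidate with the highest quality; earlier candidates win
 * ties. Returns NULL when the header accepts none of them. */
const char *best_charset(const char *accept_charset,
                         const char *const *candidates, size_t count)
{
    const char *best = NULL;
    double best_q = 0.0;
    size_t i;

    for (i = 0; i < count; i++) {
        double q = charset_quality(accept_charset, candidates[i]);
        if (q > best_q) {
            best_q = q;
            best = candidates[i];
        }
    }
    return best;
}
```

Ordering the candidate list by server preference lets ties fall to the
character set the application would rather send.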

Carl
Received on Thursday, 27 September 2001 20:49:45 GMT