Utf-8 support in C functions on Linux from Richard, Francois M on 2001-12-13 (www-international@w3.org from October to December 2001)

From: Richard, Francois M <Francois.M.Richard@usa.xerox.com>
Date: Thu, 13 Dec 2001 09:48:29 -0500
To: www-international@w3.org, linux-utf8@nl.linux.org
Message-id: <B08661D21F0FD311A21A00805FC7D65001EA3621@usa0845ms1.svcdoc.mc.xerox.com>

I have been posting questions about UTF-8 and locale support on Linux (and
Solaris) quite regularly. Before asking one more, I would like to thank the
people who contributed to these discussions. Your feedback and replies have
always been very interesting and, most of the time, very valuable ;)...
Thanks for taking some of your time to answer.

We were doing some testing with a piece of C code on Linux (made
locale-sensitive with setlocale(), with the system locale set first to
en_US.utf8 and then to sv_SE.utf8), and it looks like strcoll() was, as if
by magic, correctly sorting the UTF-8 file read as input (two characters
only: ä and z). In en_US.utf8, ä came first, then z. In sv_SE.utf8, z came
first, then ä.
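
For reference, here is a minimal sketch of the kind of test described above.
It is not our actual code: the helper name sort_in_locale() and the
COLLATE_DEMO macro are made up for illustration, the two characters are
hard-coded instead of being read from a file, and sv_SE.utf8 is assumed to
be the Swedish locale name meant above. Both locales must be installed on
the system for the driver to print anything useful.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: compares two C strings with strcoll(), which
   applies the collation rules of the locale currently in effect. */
static int compare_collated(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Hypothetical helper: switch to `locale_name` and sort `items` in place.
   Returns 0 on success, -1 if setlocale() rejects the locale name. */
int sort_in_locale(const char *locale_name, const char **items, size_t n)
{
    assert(items != NULL);
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;
    qsort(items, n, sizeof items[0], compare_collated);
    return 0;
}

#ifdef COLLATE_DEMO
/* Small driver, built only with -DCOLLATE_DEMO. */
int main(void)
{
    const char *locales[] = { "en_US.utf8", "sv_SE.utf8" };
    for (size_t i = 0; i < 2; i++) {
        /* "\xc3\xa4" is the UTF-8 encoding of ä */
        const char *items[] = { "z", "\xc3\xa4" };
        if (sort_in_locale(locales[i], items, 2) != 0) {
            printf("%s: locale not available\n", locales[i]);
            continue;
        }
        printf("%s: %s %s\n", locales[i], items[0], items[1]);
    }
    return 0;
}
#endif
```

Built with something like "cc -DCOLLATE_DEMO test.c", this prints the two
characters in collated order for each locale, which is how the difference
between the two locales shows up.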

Does it mean strcoll() properly handles UTF-8 data? I would be very
surprised. But how else can we explain the correct sorting results we got?

Is there an extensive list somewhere indicating which C char functions
handle UTF-8 properly and which do not (and as a result need to be replaced
with wide-character C functions to manipulate UTF-8 data correctly)? That
would save us a lot of time, since interpreting our test results is not, in
fact, that obvious.
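
To illustrate the distinction I mean, here is a small sketch (not taken from
our code; the helper name count_characters() and the COUNT_DEMO macro are
hypothetical). strlen() counts bytes, so it reports 2 for a UTF-8 ä, whereas
mbstowcs() decodes the string according to the current locale. Calling
mbstowcs() with a null destination to obtain only the length is assumed here
to behave as POSIX describes it.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: number of characters in `s` as decoded under the
   current LC_CTYPE locale, or (size_t)-1 if `s` is not valid in that
   locale's encoding. */
size_t count_characters(const char *s)
{
    assert(s != NULL);
    /* With a null destination, mbstowcs() only computes the length. */
    return mbstowcs(NULL, s, 0);
}

#ifdef COUNT_DEMO
/* Small driver, built only with -DCOUNT_DEMO. */
int main(void)
{
    const char *text = "\xc3\xa4";  /* ä encoded as UTF-8: two bytes */

    if (setlocale(LC_CTYPE, "en_US.utf8") == NULL) {
        printf("en_US.utf8: locale not available\n");
        return 1;
    }
    printf("bytes (strlen): %zu\n", strlen(text));
    printf("characters (mbstowcs): %zu\n", count_characters(text));
    return 0;
}
#endif
```

A byte-oriented function such as strlen() is not wrong here, it simply
answers a different question (storage size rather than character count),
which is part of why the test results are hard to interpret.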

François

Received on Thursday, 13 December 2001 09:53:02 UTC