UTF-8 support in C functions on Linux

I have been posting questions quite regularly about UTF-8 support and locales
on Linux (and Solaris). Before asking one more, I would like to thank the
people who contributed to these discussions. Your feedback and replies have
always been very interesting and, most of the time, very valuable ;) Thanks
for taking the time to answer.

We were doing some testing with a piece of C code on Linux (made
locale-sensitive by calling setlocale(), with the system locale set first to
en_US.utf8 and then to sv_SV.utf8), and strcoll() seemed, as if by magic, to
sort the UTF-8 input file correctly (just two characters: ä and z). In
en_US.utf8, ä came first, then z; in sv_SV.utf8, z came first, then ä.
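A minimal sketch of the test described above, assuming glibc. The helper name
`compare_in_locale` is hypothetical, and the strings are written as explicit
byte escapes (`\xc3\xa4` is ä in UTF-8) so the result does not depend on the
compiler's source character set. One thing worth checking: the Swedish locale
shipped with glibc is named sv_SE, so if a name such as sv_SV.utf8 is not
installed, setlocale() returns NULL and the program stays in the "C" locale,
where strcoll() compares bytes like strcmp(). Since z is byte 0x7A and ä
starts with byte 0xC3, z would come first there as well, so that result alone
does not prove that Swedish collation was applied.

```c
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: compare two UTF-8 strings with strcoll() under the
 * collation rules of `locale_name`. Returns 0 and stores the sign of the
 * comparison (-1, 0 or 1) in *sign if the locale could be selected.
 * Returns -1 if setlocale() failed (locale not installed or misspelled);
 * in that case the current LC_COLLATE setting is left unchanged.
 */
int compare_in_locale(const char *locale_name,
                      const char *a, const char *b, int *sign)
{
    /* setlocale() returns NULL when the requested locale is unavailable. */
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;

    int r = strcoll(a, b);
    *sign = (r > 0) - (r < 0);   /* normalize to -1, 0 or 1 */
    return 0;
}
```

For example, if `compare_in_locale("en_US.UTF-8", "\xc3\xa4", "z", &sign)`
returns 0, sign should be -1 when that locale puts ä before z, as in the test
above; a return value of -1 means the locale name was not accepted at all.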

Does this mean strcoll() handles UTF-8 data properly? I would be very
surprised. But then how do we explain the correct sorting results we got?
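The results may be less surprising than they seem: strcoll() is specified on
multibyte strings, not on single bytes, and compares them according to the
LC_COLLATE category of the current locale. As far as we know, glibc's
implementation interprets the bytes through that locale's collation tables,
which for a UTF-8 locale are defined over UTF-8 sequences, whereas strcmp()
only compares unsigned byte values. A sketch of sorting lines with strcoll()
through qsort() (the names `cmp_collate` and `sort_lines` are hypothetical):

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: the array elements are `const char *` line pointers. */
static int cmp_collate(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    /* strcoll() follows the LC_COLLATE category of the current locale. */
    return strcoll(a, b);
}

/* Hypothetical helper: sort n lines in place in locale collation order. */
void sort_lines(const char **lines, size_t n)
{
    qsort(lines, n, sizeof *lines, cmp_collate);
}
```

A C program starts in the "C" locale, so sort_lines() only follows the user's
collation rules after a call such as `setlocale(LC_ALL, "")` has succeeded.
When the same strings are compared many times, strxfrm() can transform them
once so that strcmp() on the results gives the same order as strcoll().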

Is there an extensive list somewhere showing which C char functions handle
UTF-8 properly and which ones do not (and therefore need to be replaced with
wide-character functions to manipulate UTF-8 data correctly)? That would save
us a lot of time, since interpreting our test results is not that obvious.
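No authoritative list is given here, but a rough rule of thumb follows from
how UTF-8 is built: a byte below 0x80 is always a complete ASCII character and
never part of a multibyte sequence. So functions that copy, concatenate, test
for equality, or search for an ASCII character or a whole valid UTF-8
substring (strcpy(), strcat(), strchr() with an ASCII character, strstr())
keep working. Functions that assume one byte is one character do not:
strlen() returns 2 for ä, and toupper() or isalpha() applied byte by byte
cannot classify it. Those cases need the multibyte or wide-character
functions (mbrtowc(), mbstowcs(), wcslen(), towupper(), iswalpha()) with
LC_CTYPE set to a UTF-8 locale. A sketch that counts characters rather than
bytes with mbrtowc(), using a hypothetical `count_chars` helper:

```c
#include <locale.h>   /* setlocale(), which callers use to pick LC_CTYPE */
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the characters (not bytes) in a multibyte
 * string according to the current LC_CTYPE locale. Returns (size_t)-1
 * if the string contains a sequence that is invalid in that locale.
 */
size_t count_chars(const char *s)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);           /* initial conversion state */
    size_t count = 0, left = strlen(s);

    while (left > 0) {
        /* Decode one character; we only need its length in bytes. */
        size_t n = mbrtowc(NULL, s, left, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;           /* invalid or truncated sequence */
        if (n == 0)
            break;                       /* null character: end of string */
        s += n;
        left -= n;
        count++;
    }
    return count;
}
```

After `setlocale(LC_CTYPE, "")` selects a UTF-8 locale, `count_chars` on the
bytes of "äz" should give 2 where strlen() gives 3; in a locale that does not
accept those bytes, the helper returns (size_t)-1 instead of guessing.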

François

Received on Thursday, 13 December 2001 09:53:02 UTC