
Re: Portability of Unicode code !?

From: <toby_phipps@peoplesoft.com>
Date: Fri, 21 Jul 2000 04:02:04 -0700
To: nitin_goel@yahoo.com
cc: www-international@w3.org
Message-ID: <OF7B7EAAA0.FA428C83-ON88256923.003BB908@peoplesoft.com>

Hi Nitin,

We do just this across Solaris, HP-UX, Sequent, Compaq Tru64 and AIX.  You
really can't depend on the portability of the wchar_t type, and will need to
use your own type if you want a consistent character representation across
all platforms.  The problem is that you'll then need your own
implementations of the standard C string functions and anything else that
you expect to accept Unicode data.  You'll also need stubs for any system
calls that expect character arguments; each stub either maps your own
Unicode type to the OS's wchar_t implementation or converts it back to the
OS's non-Unicode char type before calling the real function.
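
To make the stub idea concrete, here is a rough sketch rather than our
actual code.  The names u16char, u16_to_narrow and u16_fopen are made up
for illustration, and the conversion only handles ASCII; a real stub would
call a proper converter to the OS's char encoding.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical application-defined 16-bit character type. */
typedef unsigned short u16char;

/* Narrow a 0-terminated 16-bit string to the OS's char type.
 * Assumption: ASCII only. Returns 0 on success, -1 if a character is
 * outside ASCII or the destination buffer is too small. */
int u16_to_narrow(const u16char *src, char *dst, size_t dstsz)
{
    size_t i;

    for (i = 0; src[i] != 0; i++) {
        /* Leave room for the terminating '\0'. */
        if (src[i] > 0x7F || i + 1 >= dstsz)
            return -1;
        dst[i] = (char)src[i];
    }
    if (i >= dstsz)
        return -1;
    dst[i] = '\0';
    return 0;
}

/* Stub: accepts the application's Unicode type, converts the path,
 * then calls the real fopen(). */
FILE *u16_fopen(const u16char *path, const char *mode)
{
    char narrow[1024];

    if (u16_to_narrow(path, narrow, sizeof narrow) != 0)
        return NULL;
    return fopen(narrow, mode);
}
```

Application code then calls u16_fopen everywhere and never touches the
OS's char or wchar_t types directly.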

We solved the problem by licensing a portable Unicode library (Rosette from
Basis Technology), and writing a "compatibility" library of 100 or so
common string and system functions implemented via Rosette instead of via
the standard C runtime library.  We also needed our own 16-bit type, which
we defined as an unsigned short and called WCHAR.  One of the nice features
of Rosette is that it came with a large set of pre-written C runtime library
string functions, implemented with their code, that we could use as a base.
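
For a feel of what such a compatibility layer looks like, here is a minimal
sketch.  Only the WCHAR typedef comes from the description above; the
compat_* names are hypothetical, and these hand-written loops stand in for
what was really implemented on top of Rosette.

```c
#include <limits.h>
#include <stddef.h>

/* Our own 16-bit character type, independent of the OS's wchar_t. */
typedef unsigned short WCHAR;

/* Compile-time check: the array size is negative, and compilation
 * fails, on a platform where unsigned short is not 16 bits wide. */
typedef char wchar_must_be_16_bits[(sizeof(WCHAR) * CHAR_BIT == 16) ? 1 : -1];

/* Counterpart of strlen(): number of code units before the terminator. */
size_t compat_wcslen(const WCHAR *s)
{
    size_t n = 0;

    while (s[n] != 0)
        n++;
    return n;
}

/* Counterpart of strcmp(): compares code unit by code unit. */
int compat_wcscmp(const WCHAR *a, const WCHAR *b)
{
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return (*a > *b) - (*a < *b);
}

/* Counterpart of strcpy(): dst must have room for the terminator. */
WCHAR *compat_wcscpy(WCHAR *dst, const WCHAR *src)
{
    WCHAR *p = dst;

    while ((*p++ = *src++) != 0)
        ;
    return dst;
}
```

Note that the comparison is by 16-bit code unit value, which is not the
same as a linguistic or code point ordering.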

One other problem is string constants in your code.  If you don't use the
operating system's wchar_t implementation, your L"string" quoted literals
won't match your Unicode type.  We ended up writing a preprocessor that
expanded L"string" into Unicode characters represented as
\x<<byte1>><<byte2>>.  This was much more difficult than originally expected
given the myriad ways quoted strings can be used (variable initialization,
assignments etc.), but it was possible.  This preprocessor wrote out .i
files, which were then passed to the real C++ preprocessor/compiler.
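
The core expansion step might look something like the sketch below.  This
is one possible reading, not the actual tool: the function name is
hypothetical, only ASCII input is handled, each byte gets its own \x
escape, and the high-byte-first order and terminator handling are
assumptions.  Finding the literals in real source code, which was the hard
part, is not shown.

```c
#include <stddef.h>
#include <stdio.h>

/* Expand an ASCII-only string into the text of a narrow C literal whose
 * bytes spell out 16-bit code units, high byte first (an assumption).
 * Returns the number of characters written, not counting the '\0',
 * or -1 if the input is not ASCII or the buffer is too small. */
int expand_wide_literal(const char *text, char *out, size_t outsz)
{
    size_t used = 0;
    size_t i;

    /* Smallest output is "\x00" in quotes plus '\0': 7 bytes. */
    if (outsz < 7)
        return -1;
    out[used++] = '"';

    for (i = 0; text[i] != '\0'; i++) {
        unsigned char c = (unsigned char)text[i];

        /* 8 characters per code unit, plus 6 reserved for the tail. */
        if (c > 0x7F || used + 8 + 6 > outsz)
            return -1;
        used += (size_t)sprintf(out + used, "\\x00\\x%02X", (unsigned)c);
    }

    /* The compiler appends one zero byte to a narrow literal; emit one
     * more so that the string ends with a full 16-bit terminator. */
    used += (size_t)sprintf(out + used, "\\x00\"");
    return (int)used;
}
```

Because every byte is written as an escape, no literal hex digit ever
follows a \x sequence and gets absorbed into it.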

Good luck - it's a big job.  Things would be much easier for cross-platform
Unicode implementations if the C standards defined a common wchar_t type.


Toby Phipps
PeopleSoft, Inc.
tphipps@peoplesoft.com  +1-925-694-9525

"Nitin Goel" <nitin_goel@yahoo.com>
Sent by: www-international-requ
07/20/2000 06:43 PM

To:      www-international@w3.org
cc:
Subject: Portability of Unicode code !?

Hi everybody,

This is a desperate appeal for all you souls to help
me out with a problem I face. I have a Unicode server
which handles database files. Now I assumed that
Unicode is 16-bit data (is that too bad an assumption?).
Anyway, it so happens that while on NT and AIX wchar_t
does translate to a 16-bit value, things are very much
different on SunOS and HP-UX!! There wchar_t is defined
as long and int respectively! And now I am stuck with
a lot of code and database files. Has anybody faced
this problem before? Is there any input someone can give
regarding porting Unicode-enabled code?
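
A quick way to see which case a given platform falls into is to check the
width of wchar_t directly.  The sketch below uses hypothetical helper
names and only standard headers.

```c
#include <limits.h>
#include <stddef.h>

/* Width of the platform's wchar_t in bits. */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Nonzero when code that assumes a 16-bit wchar_t would hold here. */
int wchar_is_16_bits(void)
{
    return wchar_bits() == 16;
}
```

Calling wchar_is_16_bits() once at startup, or in a build-time test, at
least flags the platforms where the 16-bit assumption breaks.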

Can I work around this by getting a third-party Unicode
library from somewhere and linking my code to it rather
than the system libraries on these platforms?

Any help/input on this would be extremely valuable.

Thank you,
PS> Please mail me the responses as I am not
subscribed to any of these mailing lists.

Received on Friday, 21 July 2000 07:02:21 UTC
