
Unicode Conversions

From: Stephen Toner <Stephen.Toner@virtualaccess.com>
Date: Thu, 7 Sep 2000 12:50:27 +0100
To: <www-international@w3.org>
Message-ID: <JEEKILJANELCODIAEJFNOEMBCAAA.Stephen.Toner@virtualaccess.com>
Hello all,
I have been trying to accept Unicode input from a browser and store it in a
database.  The problem is the different encodings used to represent
Unicode.
The input text is in the UTF-8 format.  I have read on the Microsoft support
site that SQL Server 7.0 uses a different Unicode encoding (UCS-2) and does
not recognize UTF-8 as valid character data.  Of the solutions offered only
two were of any use:
1) Convert between the two on input and output
2) Store as raw data in binary form
I have been unable to get the raw data into the database correctly, so I
decided to try the first option.  However, although I keep reading that
round-trip conversion between the two formats is quick, easy and reliable, I
have been unable to accomplish this.  I am using JSPs, so the
Session.Codepage command doesn't work, and anyway I would prefer a less
platform-specific solution.
Does anyone know of a way to convert a Java string from UTF-8 to UTF-16
format?
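
For reference, here is a minimal sketch of the conversion being asked about, not a definitive fix.  It assumes a JVM with java.nio.charset.StandardCharsets (Java 7 or later; older JVMs would pass the charset names "UTF-8" and "ISO-8859-1" as strings instead).  The class and method names are hypothetical.  The repair method assumes the servlet container decoded the request as ISO-8859-1, so that each char of the received string holds one raw UTF-8 byte; if the container decoded it some other way, that step does not apply.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8ToUtf16 {
    // Decode raw UTF-8 bytes into a Java String (held internally as UTF-16).
    static String decodeUtf8(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // Assumption: the string was produced by decoding UTF-8 bytes as ISO-8859-1,
    // so each char carries one original byte. Recover the bytes, then decode properly.
    static String repairLatin1Misdecoding(String misdecoded) {
        byte[] rawBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    // Encode a String as big-endian UTF-16 bytes without a byte order mark.
    static byte[] toUtf16BigEndian(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // U+3042 (HIRAGANA LETTER A) is the three bytes E3 81 82 in UTF-8.
        byte[] utf8 = {(byte) 0xE3, (byte) 0x81, (byte) 0x82};
        String text = decodeUtf8(utf8);
        System.out.println(text.length());                        // 1
        System.out.println(Integer.toHexString(text.charAt(0)));  // 3042
        System.out.println(toUtf16BigEndian(text).length);        // 2
    }
}
```

Once the text is a correctly decoded Java String, it is already UTF-16 in memory, so the explicit byte conversion is only needed when the bytes themselves must be written somewhere.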
Also, I was wondering if anyone knows why the UTF-8 can't be treated as a
regular Latin1 string.  My database is set to use the Cp1252 code page, so
should it not recognise the characters sent to it?  For example, a Japanese
character in UTF-8 was broken down to ‚ and these three characters are in
the Windows character set.  However, by the time it reaches the database it
has been changed to ?.  Does this mean that somewhere along the way the
string is being converted into a different form whose character set doesn't
support certain characters?  Does the fact that Java internally uses
UTF-16 (I think) cause any problems?
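
One way the ? substitution described above can arise is sketched below, as an illustration rather than a diagnosis of this particular setup.  If the text is at some point a correctly decoded Java String and is then encoded to Cp1252, any character with no Cp1252 mapping is replaced by the encoder's substitute byte, which is '?'.  The class and method names are hypothetical, and the sketch assumes the JVM provides the "windows-1252" charset.

```java
import java.nio.charset.Charset;

// Hypothetical demonstration class, for illustration only.
class Cp1252Loss {
    // Assumption: the JVM ships the windows-1252 (Cp1252) charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // True only if every character in the text has a Cp1252 mapping.
    static boolean fitsInCp1252(String text) {
        return CP1252.newEncoder().canEncode(text);
    }

    // Encode to Cp1252 and decode again; unmappable characters come back as '?'.
    static String roundTripThroughCp1252(String text) {
        return new String(text.getBytes(CP1252), CP1252);
    }

    public static void main(String[] args) {
        String japanese = "\u3042"; // HIRAGANA LETTER A
        System.out.println(fitsInCp1252(japanese));            // false
        System.out.println(roundTripThroughCp1252(japanese));  // ?
    }
}
```

Checking canEncode before writing to a Cp1252 column makes the loss visible instead of silent.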

Thanks for any suggestions,
Stephen
(If you have already received this message, I apologise; I was having
difficulty with registration.)
Received on Thursday, 7 September 2000 07:48:16 GMT
