Unicode Conversions from Stephen Toner on 2000-09-07 (www-international@w3.org from July to September 2000)

From: Stephen Toner <Stephen.Toner@virtualaccess.com>
Date: Thu, 7 Sep 2000 12:50:27 +0100
To: <www-international@w3.org>
Message-ID: <JEEKILJANELCODIAEJFNOEMBCAAA.Stephen.Toner@virtualaccess.com>

Hello all,
I have been trying to input Unicode text from a browser and store it in a
database.  The problem is the different encodings used to represent
Unicode.
The input text is in the UTF-8 format.  I have read on the Microsoft support
site that SQL Server 7.0 uses a different Unicode encoding (UCS-2) and does
not recognize UTF-8 as valid character data.  Of the solutions offered only
two were of any use:
1) Convert between the two on input and output
2) Store as raw data in binary form
I have been unable to get the raw data into the database correctly, so I
decided to try the first option.  However, although I keep reading that
round-trip conversion between the two formats is quick, easy and reliable, I
have been unable to accomplish this.  I am using JSPs, so the
Session.Codepage command doesn't work, and anyway I would prefer a less
platform-specific solution.  Does anyone know of a way of converting a Java
string in UTF-8 to UTF-16 format?
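
For reference, a minimal sketch of the kind of conversion being asked about.
It rests on assumptions not stated in the message: that the input is
available either as raw UTF-8 bytes or as a String that was wrongly decoded
one character per byte as ISO-8859-1, and that the target wants little-endian
UTF-16 without a byte order mark.  The class and method names are
hypothetical, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

final class Utf8ToUtf16Sketch {

    // Decode raw UTF-8 bytes into a Java String (UTF-16 code units internally).
    static String decodeUtf8(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // Repair a String whose UTF-8 bytes were decoded as ISO-8859-1 by mistake.
    // ISO-8859-1 maps every byte to one char, so the original bytes are recoverable.
    static String repairLatin1Decoded(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    // Encode a String as UTF-16 little-endian bytes with no byte order mark
    // (an assumption about what the storage layer expects).
    static byte[] toUtf16Le(String text) {
        return text.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // UTF-8 encoding of U+3042 (HIRAGANA LETTER A), used as sample input.
        byte[] utf8 = {(byte) 0xE3, (byte) 0x81, (byte) 0x82};

        String text = decodeUtf8(utf8);
        System.out.println("UTF-16 code units: " + text.length());

        for (byte b : toUtf16Le(text)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Once the text is a correctly decoded String, no separate "UTF-8 to UTF-16"
step is needed inside Java itself; the explicit byte encoding only matters
if the bytes are handed to something outside the JVM.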
Also, I was wondering if anyone knows why the UTF-8 can't be treated as a
regular Latin1 string.  My database is set to use the Cp1252 code page, so
should it not recognise the characters input to it?  For example, a Japanese
character in UTF-8 was broken down to ã‚ and these three characters are in
the Windows character set.  However, by the time it reaches the database it
is changed to ã?  Does this mean that somewhere along the way the string
is being changed into a different form where the character set doesn't
support certain characters?  Does the fact that Java internally uses
UTF-16 (I think) cause any problems?
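
One way to probe this question is to check whether the three UTF-8 bytes of
a Japanese character survive being decoded and re-encoded in a single-byte
charset.  The sketch below is only an experiment under assumptions: the
sample character (U+3042) and the class and method names are hypothetical,
and it exercises Java's own "windows-1252" and ISO-8859-1 charsets, which
may or may not match what the browser, servlet container, driver and
database in the actual setup do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Cp1252RoundTripSketch {

    // Decode the bytes with the given charset, then encode straight back.
    static byte[] roundTrip(byte[] input, Charset charset) {
        return new String(input, charset).getBytes(charset);
    }

    // True if every byte comes back unchanged after the round trip.
    static boolean survives(byte[] input, Charset charset) {
        return Arrays.equals(input, roundTrip(input, charset));
    }

    public static void main(String[] args) {
        // UTF-8 encoding of U+3042; the middle byte 0x81 is the interesting one.
        byte[] utf8 = {(byte) 0xE3, (byte) 0x81, (byte) 0x82};
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.println("windows-1252 round trip intact: "
                + survives(utf8, cp1252));
        System.out.println("ISO-8859-1 round trip intact:   "
                + survives(utf8, StandardCharsets.ISO_8859_1));

        // Show the bytes that come back from the windows-1252 round trip.
        for (byte b : roundTrip(utf8, cp1252)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

If a byte has no assigned character in the code page, it cannot be carried
through as text, which would be consistent with one of the three characters
turning into a question mark.  ISO-8859-1 assigns a character to every byte
value, so it is the safer choice when bytes must pass through a String
unchanged.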

Thanks for any suggestions,
Stephen
(If you have just gotten this message already I apologise but I was having
difficulty with registration)

Received on Thursday, 7 September 2000 07:48:16 UTC