- From: Stephen Toner <Stephen.Toner@virtualaccess.com>
- Date: Thu, 7 Sep 2000 12:50:27 +0100
- To: <www-international@w3.org>
- Message-ID: <JEEKILJANELCODIAEJFNOEMBCAAA.Stephen.Toner@virtualaccess.com>
Hello all,

I have been trying to accept Unicode input from a browser and store it in a database. The problem is the different encodings used to represent Unicode. The input text is in UTF-8. I have read on the Microsoft support site that SQL Server 7.0 uses a different Unicode encoding (UCS-2) and does not recognize UTF-8 as valid character data. Of the solutions offered, only two were of any use:

1) Convert between the two on input and output
2) Store as raw data in binary form

I have been unable to get the raw data into the database correctly, so I decided to try the first option. However, although I keep reading that round-trip conversion between the two formats is quick, easy and reliable, I have been unable to accomplish it. I am using JSPs, so the Session.Codepage command doesn't work, and anyway I would prefer a less platform-specific solution.

Does anyone know of a way of converting a Java string in UTF-8 to UTF-16 format?

Also, I was wondering if anyone knows why the UTF-8 can't be treated as a regular Latin1 string. My database is set to use the Cp1252 code page, so should it not recognise the characters input to it? For example, a Japanese character in UTF-8 was broken down to ã‚ and these three characters are in the Windows character set. However, by the time it reaches the database it has changed to ã?

Does this mean that somewhere along the way the string is being converted into a different form whose character set doesn't support certain characters? Does the fact that Java internally uses UTF-16 (I think) cause any problems?

Thanks for any suggestions,
Stephen

(If you have already received this message, I apologise; I was having difficulty with registration.)
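
As an illustration of option 1 above, here is a minimal sketch of the conversion in Java. It assumes the raw UTF-8 bytes are available, or that they were decoded as ISO-8859-1 (which maps every byte value to one character, so the bytes can be recovered; Cp1252 leaves some byte values, such as 0x81, unassigned). The class and method names are hypothetical, `StandardCharsets` needs Java 7 or later, and U+3042 is only an example character, not necessarily the one mentioned in the message.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8RoundTrip {

    // Decode raw UTF-8 bytes into a Java String (held internally as UTF-16).
    static String decodeUtf8(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // Repair a String produced by decoding UTF-8 bytes as ISO-8859-1:
    // recover the original bytes, then decode them as UTF-8.
    static String repairLatin1Misdecode(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Encode a String as big-endian UTF-16 bytes without a byte order mark.
    static byte[] toUtf16Be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // UTF-8 encoding of U+3042 (HIRAGANA LETTER A): E3 81 82
        byte[] utf8 = {(byte) 0xE3, (byte) 0x81, (byte) 0x82};

        String decoded = decodeUtf8(utf8);
        System.out.println(decoded.length());            // 1

        // The same three bytes misread as three ISO-8859-1 characters.
        String misdecoded = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.length());         // 3
        System.out.println(repairLatin1Misdecode(misdecoded).equals(decoded)); // true

        byte[] utf16 = toUtf16Be(decoded);
        System.out.printf("%02X %02X%n", utf16[0], utf16[1]); // 30 42
    }
}
```

Once the text is a correctly decoded Java String, no further "UTF-16 conversion" step is needed inside Java itself; whether the database driver then stores it intact depends on the driver and column type, which this sketch does not cover.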
Received on Thursday, 7 September 2000 07:48:16 UTC