[Fwd: [Fwd: Unicode Conversions]] from Mark Davis on 2000-09-08 (www-international@w3.org from July to September 2000)

From: Mark Davis <markdavis@ispchannel.com>
Date: Fri, 08 Sep 2000 07:05:57 -0700
To: www-international <www-international@w3.org>
Message-ID: <39B8F245.C1B1E433@ispchannel.com>

Looks like this didn't get through the first time.


> Mark Davis wrote:
>
> > >
> >
> > > Hello all,
> > > I have been trying to input Unicode from a browser and store it in a database.  The problem is the different encodings used to represent Unicode.
> > > The input text is in the UTF-8 format.  I have read on the Microsoft support site that SQL Server 7.0 uses a different Unicode encoding (UCS-2) and does not recognize UTF-8
> > > as valid character data.  Of the solutions offered only two were of any use:
> > > 1) Convert between the two on input and output
> > > 2) Store as raw data in binary form
> > > I have been unable to get the raw data into the database correctly so decided to try the first option.  However, although I keep reading that round-trip conversion between the 2
> > > formats is quick, easy and reliable, I have been unable to accomplish this.  I am using JSPs, so the Session.Codepage command doesn't work, and anyway I would prefer a
> > > less platform specific solution.  Does anyone know of a way of converting a Java string in UTF-8 to UTF-16 format?
> > >
> > I talk about it a bit in an older paper of mine, at
> > http://www.ibm.com/java/education/globalapps/Converting.html
> >
> > You can either use the String API or Stream API. For Strings use:
> >
> > String utf16chars = new String(utf8bytes, "UTF8");
> >
> > byte[] utf16bytes = utf16chars.getBytes("UTF-16");
> >
> > For Streams, use InputStreamReader
> > (http://java.sun.com/j2se/1.3/docs/api/java/io/InputStreamReader.html)
> > or OutputStreamWriter.
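A fuller sketch of both routes, for illustration only. It assumes a current JDK (java.nio.charset.StandardCharsets did not exist when the note above was written), and the class and method names are made up for the example. The sample bytes are the UTF-8 form of U+65E5.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class Utf8ToUtf16 {
    // String route: decode UTF-8 bytes into a Java String.
    static String decodeUtf8(byte[] utf8bytes) {
        return new String(utf8bytes, StandardCharsets.UTF_8);
    }

    // Encode a String as big-endian UTF-16 bytes, without a byte-order mark.
    static byte[] encodeUtf16BE(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    // Stream route: let InputStreamReader do the UTF-8 decoding.
    static String readUtf8(byte[] utf8bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(utf8bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8bytes = {(byte) 0xE6, (byte) 0x97, (byte) 0xA5}; // U+65E5 in UTF-8
        String chars = decodeUtf8(utf8bytes);
        byte[] utf16bytes = encodeUtf16BE(chars);
        System.out.println("chars: " + chars.length() + ", UTF-16 bytes: " + utf16bytes.length);
        System.out.println("stream result matches: " + chars.equals(readUtf8(utf8bytes)));
    }
}
```

Whether the database driver wants a String or raw UTF-16 bytes depends on the driver; if it accepts a String, the decoding step alone is probably enough.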
> >
> > > Also I was wondering if anyone knows why the UTF-8 can't be treated as a regular Latin1 string.  My database is set to use the Cp1252 code page, and so should this not
> > >
> > Whenever you mark bytes with the wrong codepage, you are likely to get
> > errors; any software that interprets or converts those bytes will get
> > the wrong answer. Using Cp1252 when what you are storing is either
> > UTF-8 or UTF-16 will give you problems.
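To make the mislabeling concrete, here is a small sketch, again assuming a current JDK in which the "windows-1252" charset is available (it is not one of the guaranteed standard charsets, although common JDKs ship it). The class name is invented for the example. It shows one UTF-8 encoded character turning into three Cp1252 characters when decoded under the wrong label, and turning into a single '?' byte when converted to a code page that lacks it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class WrongCodepage {
    // Assumption: this charset name is supported by the running JDK.
    static final Charset CP1252 = Charset.forName("windows-1252");

    static String decodeAs(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        byte[] utf8bytes = {(byte) 0xE6, (byte) 0x97, (byte) 0xA5}; // U+65E5 in UTF-8

        String right = decodeAs(utf8bytes, StandardCharsets.UTF_8);
        String wrong = decodeAs(utf8bytes, CP1252);
        System.out.println(right.length()); // 1: one character, as intended
        System.out.println(wrong.length()); // 3: each byte became its own character

        // Converting the real character to Cp1252 cannot succeed,
        // so getBytes substitutes the charset's replacement byte.
        byte[] lossy = right.getBytes(CP1252);
        System.out.println(lossy.length + " byte: " + (char) lossy[0]);
    }
}
```

The first failure is reversible as long as nothing converts the string again; the second is not, because the original character is gone once the replacement byte is written.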
> >
> > > recognise the characters input to it? e.g. a Japanese character in UTF-8 was broken down to ??? and these three characters are in the Windows character set.  However by
> > > the time it reaches the database it is changed to ?    Does this mean that somewhere along the way the string is being changed into a different form where the character set
> > > doesn't support certain characters?   Does the fact that Java internally uses UTF-16 (I think) cause any problems?
> > >
> > Java supports UCS-2, but UTF-16 is simply an extension of UCS-2, and
> > shares the same storage. The difference is not relevant to you here.
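A short illustration of why the distinction rarely matters for text like this. It assumes a JDK with the code point methods on String and Character (added after this message was written), and the class name is made up. A character such as U+65E5 occupies one 16-bit unit either way; only characters above U+FFFF, which UTF-16 represents as a surrogate pair, take two.

```java
// Hypothetical demo class, not part of any library.
class Utf16Units {
    // Number of 16-bit code units the string occupies.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u65E5";                               // inside the 16-bit range
        String supp = new String(Character.toChars(0x20000)); // outside it: surrogate pair

        System.out.println(codeUnits(bmp) + " unit(s), " + codePoints(bmp) + " character(s)");
        System.out.println(codeUnits(supp) + " unit(s), " + codePoints(supp) + " character(s)");
    }
}
```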
> >
> > >
> > > Thanks for any suggestions,
> > > Stephen
> > > (If you have just gotten this message already I apologise but I was having difficulty with registration)
> > >

Received on Friday, 8 September 2000 10:03:58 UTC