RE: UTF-8 and UTF-16 from Savourel, Yves on 2000-09-08 (www-international@w3.org from July to September 2000)

From: Savourel, Yves <Yves.Savourel@corp.sykes.com>
Date: Fri, 8 Sep 2000 09:22:03 -0400
To: "'www-international@w3.org'" <www-international@w3.org>
Message-ID: <5534833179F2D2118A6100805FC7CF26BA7B19@flatiron>

Stephen,

You can certainly store the UTF-8 string in a normal string or database field:
UTF-8 is just a multi-byte encoding in which ASCII characters look exactly
like ASCII and every other character is encoded as 2 or more bytes, each with
a value above 127 (so no zero bytes appear other than for the NUL character
itself).
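To make that byte layout concrete, here is a minimal Java sketch (the class name Utf8Layout is hypothetical) that encodes a single code point by hand using the standard UTF-8 bit patterns and compares the result with the JDK's own encoder. It assumes valid input (no surrogate or out-of-range checks) and uses the current 4-byte limit, which covers U+0000 to U+10FFFF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Layout {
    // Encode one Unicode code point as UTF-8 by hand.
    // Assumes 0 <= cp <= 0x10FFFF and cp is not a surrogate; no validation.
    static byte[] encode(int cp) {
        if (cp < 0x80) {
            // ASCII: one byte, identical to the ASCII value
            return new byte[] { (byte) cp };
        } else if (cp < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx (both bytes >= 0x80)
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else if (cp < 0x10000) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        } else {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
    }

    public static void main(String[] args) {
        int[] samples = { 'A', 0xE9, 0x3042, 0x1F600 };
        for (int cp : samples) {
            byte[] mine = encode(cp);
            byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            StringBuilder hex = new StringBuilder();
            for (byte b : mine) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.printf("U+%04X -> %s(matches JDK: %b)%n", cp, hex, Arrays.equals(mine, jdk));
        }
    }
}
```

For example, U+00E9 (e with acute accent) comes out as C3 A9: both bytes are above 127, so an 8-bit field can hold them without confusing them with ASCII.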

Just keep in mind that the UTF-8 string is not "represented" in the code set
used by the database; it is simply stored and interpreted "as is" (much like
viewing a Shift_JIS string as if it were Latin-1). Your sorting, case-folding,
comparison, and similar functions will work according to the database code
set, not UTF-8, and may give unexpected results.
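One rough way to see the effect is to simulate a Latin-1 database holding raw UTF-8 bytes. In the sketch below, the class MisreadUtf8 is hypothetical and Java's own Latin-1 lowercase stands in for the database's folding function (an assumption; a real database's rules may differ). Storing and retrieving the bytes round-trips losslessly, but the length is counted in bytes and a Latin-1 lowercase breaks the UTF-8 sequence.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class MisreadUtf8 {
    // Store: the UTF-8 bytes are kept unchanged, but a Latin-1 database
    // sees each byte as one Latin-1 character.
    static String asLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Retrieve: turn the Latin-1 characters back into the original bytes
    // and decode them as UTF-8 (malformed bytes become U+FFFD).
    static String backToUnicode(String stored) {
        byte[] raw = stored.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "CAF\u00C9";             // "CAFE" with an acute accent on the E
        String stored = asLatin1(word);

        // Plain storage and retrieval is lossless.
        System.out.println(backToUnicode(stored).equals(word));      // true

        // The database sees 5 characters where there are really 4.
        System.out.println(word.length() + " vs " + stored.length()); // 4 vs 5

        // A Latin-1 lowercase maps byte 0xC3 to 0xE3, so the stored bytes
        // are no longer valid UTF-8 and no longer spell the lowercase word.
        String folded = backToUnicode(stored.toLowerCase(Locale.ROOT));
        System.out.println(folded.equals(word.toLowerCase(Locale.ROOT))); // false
    }
}
```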

For the conversion, you can use the C routines provided by Taligent on the
Unicode Web site at http://www.unicode.org/Public/PROGRAMS/CVTUTF/ (the
CVTUTF.* files).
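On the Java side, a converter is already part of the standard library (java.nio.charset; StandardCharsets needs Java 7 or later). A Java String is a sequence of UTF-16 code units, so decoding the UTF-8 bytes into a String is the UTF-8 to UTF-16 conversion. This is a minimal sketch: the class and method names are hypothetical, and whether your database interface wants a String or raw UTF-16 bytes is an assumption you would need to check. Note also that UCS-2 cannot hold characters above U+FFFF, which UTF-16 represents as surrogate pairs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8ToUtf16 {
    // Strictly decode UTF-8 bytes into a (UTF-16) Java String.
    // Malformed input throws instead of being silently replaced with U+FFFD.
    static String decodeUtf8(byte[] utf8) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(utf8)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("input is not well-formed UTF-8", e);
        }
    }

    // Emit the String's UTF-16 code units as big-endian bytes (no byte-order
    // mark), for the case where the database API needs raw bytes.
    static byte[] toUtf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    // True if no surrogate code units are present, i.e. the text is also valid UCS-2.
    static boolean fitsUcs2(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isSurrogate(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] fromWebForm = { 0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9 }; // "cafe" + e-acute in UTF-8
        String text = decodeUtf8(fromWebForm);
        System.out.println(text.length());              // 4 UTF-16 code units
        System.out.println(toUtf16Bytes(text).length);  // 8 bytes
        System.out.println(fitsUcs2(text));             // true
    }
}
```

Reporting malformed input rather than replacing it is a deliberate choice here: bad bytes arriving from a web form fail loudly instead of being stored as replacement characters.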

-yves

-----Original Message-----
From: Stephen Toner [mailto:Stephen.Toner@virtualaccess.com]
Sent: Thursday, September 07, 2000 4:14 PM
To: www-international@w3.org
Subject: UTF-8 and UTF-16


Hi,
The input from my web page is in the UTF-8 form, but my database uses 
UTF-16/UCS-2.  I was wondering if there were any Java converters out there 
that could do this conversion, or if it would be simple enough to try and 
change an existing C one to perform the same task.  I was also wondering 
why UTF-8 could not just be stored as a character string in the
database.  If the characters are now 8-bit, surely they could be represented
by the ordinary character set in the database.
Any help would be appreciated because I feel like I'm banging my head off a 
brick wall at the minute.
Thanks,
Stephen

Received on Friday, 8 September 2000 09:21:35 UTC