- From: Savourel, Yves <Yves.Savourel@corp.sykes.com>
- Date: Fri, 8 Sep 2000 09:22:03 -0400
- To: "'www-international@w3.org'" <www-international@w3.org>
Stephen,

You sure can store the UTF-8 string in a normal string or database field: UTF-8 is just a multi-byte encoding in which ASCII characters are encoded as themselves and every other character is encoded as two or more bytes, each above 127 (no null bytes anywhere).

Just keep in mind that the UTF-8 string is not "represented" by the code set used in the database; its bytes are simply interpreted "as is" in that code set (something like looking at a Shift_JIS string as Latin-1). Your sorting, folding, comparison, etc. functions will use the database code set, not UTF-8, and may give unexpected results.

For the conversion: you can use the C routines provided by Taligent on the Unicode Web site at:
http://www.unicode.org/Public/PROGRAMS/CVTUTF/
(The files CVTUTF.*).

-yves

-----Original Message-----
From: Stephen Toner [mailto:Stephen.Toner@virtualaccess.com]
Sent: Thursday, September 07, 2000 4:14 PM
To: www-international@w3.org
Subject: UTF-8 and UTF-16

Hi,

The input from my web page is in UTF-8, but my database uses UTF-16/UCS-2. I was wondering if there are any Java converters out there that could do this conversion, or if it would be simple enough to adapt an existing C one to perform the same task.

I was also wondering why UTF-8 could not just be stored as a character string in the database. If the characters are now 8-bit, surely they could be represented by the ordinary character set in the database.

Any help would be appreciated because I feel like I'm banging my head off a brick wall at the minute.

Thanks,
Stephen
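
To make the two points in the reply concrete, here is a minimal Java sketch of the UTF-8 to UTF-16 conversion and of what happens when UTF-8 bytes are read in a single-byte code set. It assumes a Java 7 or later runtime (for `StandardCharsets`); the class name `Utf8Demo` and its helper methods are hypothetical names chosen for illustration, not an existing converter, and Latin-1 merely stands in for whatever code set the database uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the names here are illustrative only.
public class Utf8Demo {

    // Decode UTF-8 bytes into a Java String (Java strings are UTF-16 internally).
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Interpret the same bytes "as is" in a single-byte code set (Latin-1 here),
    // which is roughly what a non-UTF-8 database field would do.
    static String misreadAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Serialize a String as big-endian UTF-16 bytes without a byte order mark.
    static byte[] toUtf16BE(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute (U+00E9), which UTF-8 encodes as 0xC3 0xA9.
        byte[] utf8 = {0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9};

        String decoded = decodeUtf8(utf8);
        String misread = misreadAsLatin1(utf8);
        byte[] utf16 = toUtf16BE(decoded);

        System.out.println(decoded.length()); // 4 characters
        System.out.println(misread.length()); // 5 characters: one per byte
        System.out.println(utf16.length);     // 8 bytes: two per character
    }
}
```

The misread string has five characters instead of four, which is why length, sorting, and comparison functions that run in the database code set can give unexpected results on stored UTF-8.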
Received on Friday, 8 September 2000 09:21:35 UTC