Taki,

Thank you for your response. I still have some questions.

About the Universal Character Set, Wikipedia says: "The Universal Character Set (UCS) is defined by the ISO/IEC 10646 International Standard as a character set on which many encodings are based." My question revolves around the "on which many encodings are based" portion of that sentence. As far as I can tell, the UCS does not contain any information on how each character is to be encoded.

About ISO 10646, Wikipedia says: "ISO 10646 defines several character encoding forms for the Universal Character Set." Some examples listed are UCS-2, UCS-4, UTF-8 and UTF-16. In your email you referred to the Unicode Character Database, but not to a specific encoding scheme.

About Unicode, Wikipedia says: "Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 (which uses 1 byte for all ASCII characters, which have the same code values as in the standard ASCII encoding, and up to 4 bytes for other characters)".

I have to laugh at myself. Wikipedia is likely not the best source for my "education" on character sets and character encodings. I had my fingers crossed that your answer to my original question would be "Ah, yes. Use the UTF-8 encoding." That's what we use internally in our product. Oh, how convenient that would be!

I hope I am communicating my question effectively. Please help set me straight.

Sincerely,
Melanie

Received on Friday, 23 May 2008 19:37:40 GMT
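To illustrate the distinction the message asks about, here is a minimal Python sketch. It assumes nothing about the product or the specification under discussion; the sample characters and the helper name `describe` are hypothetical choices made only for illustration. It shows that a character's code point (its position in the character set) stays fixed, while the number of bytes depends on which encoding form is applied.

```python
# Illustrative sketch only: the sample characters and the helper name
# `describe` are arbitrary, not taken from the message above.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001D11E"]  # A, e-acute, euro sign, G clef


def describe(ch):
    """Return the code point of `ch` and its byte length in three encoding forms."""
    return {
        "code_point": f"U+{ord(ch):04X}",       # position in the character set
        "utf8": len(ch.encode("utf-8")),        # 1 to 4 bytes per character
        "utf16": len(ch.encode("utf-16-be")),   # 2 or 4 bytes per character
        "utf32": len(ch.encode("utf-32-be")),   # always 4 bytes per character
    }


for ch in SAMPLES:
    print(describe(ch))
```

The code point column is the same no matter which encoding is chosen, which is the sense in which a character set can exist without saying anything about bytes.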