- From: Melanie Stallings <ms.protrain@yahoo.com>
- Date: Fri, 23 May 2008 12:36:59 -0700 (PDT)
- To: public-exi@w3.org
- Message-ID: <735056.61087.qm@web63013.mail.re1.yahoo.com>
Taki,

Thank you for your response. I still have some questions.

About the Universal Character Set, Wikipedia says: "The Universal Character Set (UCS) is defined by the ISO/IEC 10646 International Standard as a character set on which many encodings are based." My question revolves around the "on which many encodings are based" portion of that sentence. As far as I can tell, the UCS itself does not say how each character is to be encoded as bytes.

About ISO 10646, Wikipedia says: "ISO 10646 defines several character encoding forms for the Universal Character Set." Some examples listed are UCS-2, UCS-4, UTF-8 and UTF-16. In your email you referred to the Unicode Character Database, but not to a specific encoding scheme.

About Unicode, Wikipedia says: "Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 (which uses 1 byte for all ASCII characters, which have the same code values as in the standard ASCII encoding, and up to 4 bytes for other characters)".

I have to laugh at myself. Wikipedia is probably not the best source for my "education" on character sets and character encodings. I had my fingers crossed that your answer to my original question would be "Ah, yes. Use the UTF-8 encoding." That's what we use internally in our product. Oh how convenient that would be!

I hope I am communicating my question effectively. Please help set me straight.

Sincerely,
Melanie
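
P.S. To make the distinction in the quoted passages concrete, here is a minimal sketch, assuming Python 3 and its standard codecs. It only illustrates that a character set assigns a code point to each character while an encoding decides how that code point becomes bytes. The helper name `describe` and the sample characters are my own choices for illustration, and nothing here is meant to say which encoding EXI uses.

```python
# A character set assigns each character an abstract number (a code point).
# An encoding maps that code point to a concrete sequence of bytes.
# Sample characters are written as escapes so the file itself stays ASCII.
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]

def describe(ch):
    """Return the code point of ch and its byte length in three encodings."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf-8": len(ch.encode("utf-8")),
        # The "-be" variants fix the byte order, so no byte order mark is added.
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }

for ch in samples:
    print(describe(ch))
```

The same code point comes out as a different number of bytes depending on the encoding chosen, which is why naming the character set alone does not answer my question about the encoding.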
Received on Friday, 23 May 2008 19:37:40 UTC