RE: character encoding from Melanie Stallings on 2008-05-23 (public-exi@w3.org from May 2008)

From: Melanie Stallings <ms.protrain@yahoo.com>
Date: Fri, 23 May 2008 12:36:59 -0700 (PDT)
To: public-exi@w3.org
Message-ID: <735056.61087.qm@web63013.mail.re1.yahoo.com>

Taki,

Thank you for your response. I still have some questions.

About the Universal Character Set, Wikipedia says: "The Universal Character Set (UCS) is defined by the ISO/IEC 10646 International Standard as a character set on which many encodings are based."

My question revolves around the "on which many encodings are based" portion of that sentence. As far as I can tell, UCS does not contain any information on how each character is to be encoded.

About ISO 10646, Wikipedia says: "ISO 10646 defines several character encoding forms for the Universal Character Set." Some examples listed are UCS-2, UCS-4, UTF-8 and UTF-16.
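To make the distinction concrete, here is a minimal Python sketch (not taken from the EXI specification or from the earlier reply) showing one code point rendered under several encoding forms. The sample character is an arbitrary choice for illustration; UTF-32 is used as the stand-in for UCS-4, since the two produce the same bytes for a given code point.

```python
# The character set assigns each character an abstract number (its code point).
# An encoding form then maps that number to a concrete byte sequence.
ch = "\u20ac"  # EURO SIGN, chosen arbitrarily for illustration

code_point = ord(ch)

# Big-endian variants are used so that no byte order mark is prepended.
forms = {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

print(f"code point: U+{code_point:04X}")
for name, data in forms.items():
    print(f"{name:10} {data.hex(' ')}")
```

The code point stays the same in every case; only the bytes differ, which is why knowing the character set alone does not tell you what ends up on the wire.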

In your email you referred to the Unicode Character Database, but not to a specific encoding scheme. About Unicode, Wikipedia says: "Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 (which uses 1 byte for all ASCII characters, which have the same code values as in the standard ASCII encoding, and up to 4 bytes for other characters)".
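The "1 byte for ASCII, up to 4 bytes otherwise" behaviour in that quote can be checked with a short Python sketch. The four sample characters are assumptions picked only to land in each length class; nothing here is specific to EXI.

```python
# One sample character per UTF-8 length class (hypothetical picks).
samples = [
    "A",           # U+0041, ASCII range
    "\u00e9",      # U+00E9, LATIN SMALL LETTER E WITH ACUTE
    "\u20ac",      # U+20AC, EURO SIGN
    "\U0001F600",  # U+1F600, outside the Basic Multilingual Plane
]

# Number of bytes each character occupies once encoded as UTF-8.
lengths = [len(s.encode("utf-8")) for s in samples]

# For ASCII characters, UTF-8 yields exactly the ASCII byte values.
ascii_same = "A".encode("utf-8") == "A".encode("ascii")

for s, n in zip(samples, lengths):
    print(f"U+{ord(s):04X} -> {n} byte(s)")
print("ASCII bytes identical:", ascii_same)
```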

I have to laugh at myself. Wikipedia is probably not the best source for my "education" on character sets and character encodings.

I had my fingers crossed that your answer to my original question would be "Ah, yes. Use the UTF-8 encoding." That's what we use internally in our product. Oh how convenient that would be!

I hope I am effectively communicating my question. Please help set me straight.

Sincerely,

Melanie

Received on Friday, 23 May 2008 19:37:40 UTC