
RE: character encoding

From: Melanie Stallings <ms.protrain@yahoo.com>
Date: Fri, 23 May 2008 12:36:59 -0700 (PDT)
To: public-exi@w3.org
Message-ID: <735056.61087.qm@web63013.mail.re1.yahoo.com>
  Thank you for your response.  I still have some questions.
  About the Universal Character Set, Wikipedia says: "The Universal Character Set (UCS) is defined by the ISO/IEC 10646 International Standard as a character set on which many encodings are based."
  My question revolves around the "on which many encodings are based" portion of that sentence.  As far as I can tell UCS does not contain any information on how each character is to be encoded.
  About ISO 10646, Wikipedia says: "ISO 10646 defines several character encoding forms for the Universal Character Set."  Some examples listed are UCS-2, UCS-4, UTF-8 and UTF-16.
  In your email you referred to the Unicode Character Database, but not to a specific encoding scheme.  About Unicode, Wikipedia says: "Unicode can be implemented by different character encodings.  The most commonly used encodings are UTF-8 (which uses 1 byte for all ASCII characters, which have the same code values as in the standard ASCII encoding, and up to 4 bytes for other characters)".
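  To make the distinction I am struggling with concrete, here is a minimal Python sketch.  It is only an illustration of the general character set versus encoding idea, not anything taken from the EXI specification, and the helper name encoded_forms is hypothetical.  It shows one code point from the character set being turned into different byte sequences depending on which encoding form is chosen.

```python
# Illustrative sketch: a character set assigns each character a code point;
# an encoding form decides which bytes represent that code point.
# `encoded_forms` is a hypothetical helper name used only for this example.

def encoded_forms(ch):
    """Return the code point of `ch` and its bytes (as hex) in three encodings."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf-8": ch.encode("utf-8").hex(),
        "utf-16-be": ch.encode("utf-16-be").hex(),
        "utf-32-be": ch.encode("utf-32-be").hex(),
    }

# An ASCII letter, a Latin-1 letter (e with acute), and the euro sign.
for ch in ("A", "\u00e9", "\u20ac"):
    print(encoded_forms(ch))
```

  The code point is the same in every row of the output, while the byte sequence differs per encoding.  That difference is exactly what I need to pin down.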
  I have to laugh at myself.  Wikipedia is probably not the best source for my "education" on character sets and character encodings.
  I had my fingers crossed that your answer to my original question would be "Ah, yes.  Use the UTF-8 encoding."  That's what we use internally in our product.  Oh how convenient that would be!
  I hope I am effectively communicating my question.  Please help set me straight.

Received on Friday, 23 May 2008 19:37:40 UTC
