- From: David Woolley <david@djwhome.demon.co.uk>
- Date: Tue, 20 Aug 2002 21:48:41 +0100 (BST)
- To: w3c-wai-ig@w3.org
> Could somebody please explain the difference between UTF8 and UTF16 to me
> and why you would want to use UTF16 over UTF8?

UTF16 uses two bytes per Unicode character (excluding the extension
areas, which use 4 bytes, but these shouldn't appear often).

UTF8 uses a variable number of bytes, such that American English can be
represented in one byte per character, British English occasionally
requires two bytes (e.g. for the £ sign), Western European languages
require two bytes a lot of the time, and the rest of the world's
languages need three or four bytes most of the time.  It codes for the
same set of characters as UTF16.  (There's a rough sketch of these byte
counts at the end of this message.)

UTF16 is much easier for software writers to handle and is more
space-efficient for many of the world's languages (those whose
characters need three bytes in UTF8).  Generally, software that is
aware of the world's languages will use UTF16 internally.

UTF8 encodes all the characters needed for the markup structure of HTML
as single 8 bit characters, which are the same as those in ASCII.

For HTML, you can only legally use UTF16 if you include the charset
parameter in the real HTTP headers, as meta elements can't be read
unless the character set is ASCII compatible.  I'm not sure about XML;
it might recognise the Unicode byte order marks used to signal UTF16
(there's a small BOM-sniffing sketch at the end too).  Some browsers
may sniff out UTF16 even when the HTTP headers don't identify it.

> _________________________________________________________
> This email is confidential and intended solely for the use of the

Bogus confidentiality notice deleted.
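
As a rough illustration of those byte counts, a small Python sketch
(the sample strings are just ones I've picked, not anything from the
thread):

    # Compare the encoded size of the same text in UTF-8 and UTF-16.
    # "utf-16-le" is used so the 2-byte byte order mark isn't counted.
    samples = {
        "American English": "fish and chips",
        "British English":  "fish and chips for £5",  # the £ sign takes 2 bytes in UTF-8
        "Western European": "déjà vu",                # accented letters take 2 bytes in UTF-8
        "Japanese":         "日本語",                  # each character takes 3 bytes in UTF-8
    }
    for label, text in samples.items():
        utf8  = text.encode("utf-8")
        utf16 = text.encode("utf-16-le")  # 2 bytes per character here (no surrogate pairs)
        print(f"{label:16}  chars={len(text):2}  UTF-8={len(utf8):2} bytes  UTF-16={len(utf16):2} bytes")

UTF-16 comes out smaller only for the Japanese sample; for the
ASCII-heavy samples UTF-8 is half the size.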
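
And a similarly rough sketch of how a byte order mark lets software
sniff UTF16 (again Python, my own illustration, not anything a browser
is required to do):

    # Look for a UTF-16 byte order mark at the start of a byte stream.
    def sniff_bom(data: bytes) -> str:
        if data.startswith(b"\xff\xfe"):
            return "utf-16-le"
        if data.startswith(b"\xfe\xff"):
            return "utf-16-be"
        return "no BOM (assume an ASCII-compatible encoding was declared)"

    # Python's plain "utf-16" codec writes a BOM, so this prints the
    # byte order of the machine it runs on, e.g. "utf-16-le".
    print(sniff_bom("hello".encode("utf-16")))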
Received on Tuesday, 20 August 2002 17:04:01 UTC