- From: Asmus Freytag <asmusf@ix.netcom.com>
- Date: Mon, 05 Jun 2000 12:54:36 -0700
- To: "Michael (michka) Kaplan" <michka@trigeminal.com>, "Martin J. Duerst" <duerst@w3.org>, "Saba Sundaramurthy" <ssundaramurthy@verisign.com>, "Chris Lilley" <chris@w3.org>
- Cc: <mozilla-i18n@mozilla.org>, <www-international@w3.org>, <i18n-prog@acoin.com>
At 07:54 AM 6/5/00 -0700, Michael (michka) Kaplan wrote:

>There has long been controversy over the fact that MS products use "Unicode"
>to mean UCS-2

In the new, more precise terminology you would say that "MS products use 'Unicode' to mean UTF-16". Since plain text files are prefixed with a BOM, the encoding is UTF-16 (internally tagged; endianness can be determined from the BOM) rather than UTF-16LE (little endian, externally tagged, and no BOM allowed). There is, incidentally, no shorthand to describe "UTF-16 with a BOM that I know (from other information) to be little endian".

>and consider UTF-8 to be a multibyte encoding.

There is nothing wrong with this. UTF-8 is a very proper multibyte encoding. Its smallest interpretable element is a byte and, like all multibyte encodings, each character is encoded by a byte sequence that may have one of several lengths, in this case 1, 2, 3 or 4 bytes. The two distinguishing facts about UTF-8 are that it is self-synchronizing, which is a nice feature for a multibyte encoding, and that it can express all Unicode characters (identical subset).

A./
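
To make the two technical points above concrete, here is a minimal Python sketch, not anything taken from the message itself. The function names (`sniff_utf16_bom`, `utf8_sequence_length`, `resync`) are hypothetical, and the lead-byte ranges assume the modern restriction of UTF-8 to sequences of 1 to 4 bytes. It shows how a BOM reveals byte order in internally tagged UTF-16, how the first byte of a UTF-8 sequence determines its length, and how self-synchronization lets a reader find a character boundary from an arbitrary offset.

```python
import codecs


def sniff_utf16_bom(data: bytes) -> str:
    """Return the byte order implied by a leading UTF-16 BOM, else 'unknown'."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    # No BOM: the byte order has to come from external information.
    return "unknown"


def utf8_sequence_length(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with `lead`.

    Returns 0 for a continuation byte or a byte that cannot start a sequence.
    Assumption: modern UTF-8, where valid lead bytes are 00-7F, C2-DF,
    E0-EF and F0-F4.
    """
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def resync(data: bytes, pos: int) -> int:
    """Back up from `pos` to the first byte of the character containing it.

    Continuation bytes always match the bit pattern 10xxxxxx, so they can be
    skipped without decoding anything before them.
    """
    while pos > 0 and (data[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos
```

For example, `resync` applied to the bytes of "a€b" at offset 3 (the last byte of the euro sign) backs up to offset 1, where the three-byte sequence begins; no state from earlier in the stream is needed, which is what self-synchronizing means here.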
Received on Monday, 5 June 2000 15:46:22 UTC