At 07:54 AM 6/5/00 -0700, Michael (michka) Kaplan wrote:
>There has long been controversy over the fact that MS products use "Unicode"
>to mean UCS-2

In the new, more precise terminology you would say that "MS products use 'Unicode' to mean UTF-16". Since plain text files are prefixed with a BOM, the encoding is UTF-16 (internally tagged; endianness can be determined from the BOM) rather than UTF-16LE (little endian, externally tagged, and no BOM allowed).

There is, incidentally, no shorthand to describe "UTF-16 with a BOM that I know (from other information) to be little endian".

>and consider UTF-8 to be a multibyte encoding.

There is nothing wrong with this. UTF-8 is a very proper multibyte encoding. Its smallest interpretable element is a byte, and, as in all multibyte encodings, each character is encoded by a byte sequence that may have one of several lengths, in this case 1, 2, 3, or 4 bytes. The two distinguishing facts about UTF-8 are that it is self-synchronizing, which is a nice feature for a multibyte encoding, and that it can express all Unicode characters (its repertoire is identical to that of Unicode).

A./

Received on Monday, 5 June 2000 15:46:22 GMT
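The two encoding points made in the message above can be illustrated with a short sketch. It assumes Python 3 and its standard `codecs` module, and the function names (`detect_utf16_byte_order`, `utf8_sequence_length`, `is_continuation`, `resync`) are hypothetical. The first function reads the byte order of internally tagged UTF-16 from its BOM. The others show that the length of a UTF-8 sequence (1 to 4 bytes) follows from its lead byte, and that UTF-8 is self-synchronizing: continuation bytes always match the bit pattern `10xxxxxx`, so a reader starting at an arbitrary offset skips at most three bytes to reach the next character boundary.

```python
import codecs


def detect_utf16_byte_order(data: bytes) -> str:
    """Return 'big' or 'little' from a leading UTF-16 BOM."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little"
    # Without a BOM the byte order must come from outside information
    # (i.e. the data is externally tagged, as with UTF-16BE / UTF-16LE).
    raise ValueError("no UTF-16 BOM found")


def utf8_sequence_length(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx (C0 and C1 would be overlong)
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx, limited to U+10FFFF
    raise ValueError(f"0x{lead:02X} is not a valid UTF-8 lead byte")


def is_continuation(byte: int) -> bool:
    # Continuation bytes always have the bit pattern 10xxxxxx.
    return byte & 0xC0 == 0x80


def resync(data: bytes, pos: int) -> int:
    """Advance from an arbitrary offset to the next character boundary."""
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


if __name__ == "__main__":
    text = "A\u00e9\u20ac\U0001F600"  # 1-, 2-, 3- and 4-byte characters
    data = text.encode("utf-8")
    pos, lengths = 0, []
    while pos < len(data):
        n = utf8_sequence_length(data[pos])
        lengths.append(n)
        pos += n
    print(lengths)           # [1, 2, 3, 4]
    print(resync(data, 2))   # 3: skips the continuation byte of U+00E9
    tagged = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
    print(detect_utf16_byte_order(tagged))  # little
```

Scanning forward to the first non-continuation byte is all that recovery requires, which is why a UTF-8 reader can pick up a stream mid-character without decoding from the start.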