- From: Addison Phillips [FCOM] <AddisonP@flashcom.net>
- Date: Tue, 06 Jun 2000 14:09:45 +0900
- To: www-international@w3.org
> At 07:54 AM 6/5/00 -0700, Michael (michka) Kaplan wrote:
> >There has long been controversy over the fact that MS products use "Unicode"
> >to mean UCS-2
>
> In the new, more precise terminology you would say that "MS products use
> 'Unicode' to mean UTF-16". Since plain text files are prefixed with a BOM,
> the encoding is UTF-16, (internally tagged, endianness can be determined
> from BOM) instead of UTF-16LE (little endian, externally tagged and no BOM
> allowed). There is, incidentally, no shorthand to describe "UTF-16 with BOM
> that I know (from other information) to be little endian".

Actually, in Win2000 and later, MS products mean UTF-16LE. Older products really mean UCS-2 (as in, they don't understand surrogates, and converting UTF-8 values beyond 0xFFFF will result in undefined behavior or data loss). Of course, support for UTF-8 was spotty or nonexistent in those products anyway, so I guess it works out to be the same.

thanks

Addison

Addison Phillips
mailto:AddisonP@flashcom.net
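
For illustration, the distinctions discussed in this message can be sketched in Python. This is only a rough sketch, not how any MS product does it: the helper names `describe_utf16` and `has_surrogates` are hypothetical, and the check ignores the fact that a UTF-32 little-endian BOM also begins with the bytes FF FE.

```python
import codecs

def describe_utf16(data: bytes) -> str:
    """Classify a byte string by its leading UTF-16 BOM, if any (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16 with BOM, little endian"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 with BOM, big endian"
    # Without a BOM, the byte order has to come from an external tag
    # such as a "UTF-16LE" or "UTF-16BE" label.
    return "no BOM, byte order must be tagged externally"

def has_surrogates(text: str) -> bool:
    """True if the text needs surrogate pairs in UTF-16 (hypothetical helper).

    Such text cannot be represented in plain UCS-2.
    """
    return any(ord(ch) > 0xFFFF for ch in text)

# U+10400 lies beyond 0xFFFF, so UTF-16 encodes it as a surrogate pair.
sample = "A\U00010400"
untagged = sample.encode("utf-16-le")       # UTF-16LE: no BOM is written
tagged = codecs.BOM_UTF16_LE + untagged     # "UTF-16" with a little-endian BOM

print(describe_utf16(tagged))
print(describe_utf16(untagged))
print(has_surrogates(sample))
```

A UCS-2-only consumer would see the two surrogate code units of U+10400 as two separate, meaningless 16-bit values, which is the kind of data loss described above.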
Received on Tuesday, 6 June 2000 01:26:47 UTC