
[Moderator Action] Re: BOM & Unicode editors

From: Addison Phillips [FCOM] <AddisonP@flashcom.net>
Date: Tue, 06 Jun 2000 14:09:45 +0900
To: www-international@w3.org
 > At 07:54 AM 6/5/00 -0700, Michael (michka) Kaplan wrote:
 > >There has long been controversy over the fact that MS products use
 > >'Unicode' to mean UCS-2
 > In the new, more precise terminology you would say that "MS products use
 > 'Unicode' to mean UTF-16". Since plain text files are prefixed with a BOM,
 > the encoding is UTF-16 (internally tagged, endianness can be determined
 > from BOM) instead of UTF-16LE (little endian, externally tagged and no BOM
 > allowed). There is, incidentally, no shorthand to describe "UTF-16 with
 > that I know (from other information) to be little endian".
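The distinction drawn above between the BOM-prefixed form and the BOM-less little-endian form can be seen with a small Python sketch. It assumes Python's standard codec names ("utf-16", "utf-16-le", "utf-16-be") as stand-ins for the labels discussed here; the variable names are illustrative only.

```python
import codecs

text = "A"

# "utf-16-le" is externally tagged: the encoded bytes carry no BOM.
le_bytes = text.encode("utf-16-le")

# Prefixing a BOM gives the internally tagged form, in either byte order.
bom_le_bytes = codecs.BOM_UTF16_LE + le_bytes
bom_be_bytes = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# The generic "utf-16" decoder reads the BOM to pick the byte order,
# then strips it from the decoded text.
decoded_le = bom_le_bytes.decode("utf-16")
decoded_be = bom_be_bytes.decode("utf-16")
```

Both decoded values come back as the same text, which is the sense in which the BOM-prefixed form needs no outside information about byte order.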

Actually, in Win2000 and later, MS products mean UTF-16LE. Older products
really mean UCS-2 (as in, they don't understand surrogates, so converting
UTF-8 sequences for code points beyond U+FFFF will result in undefined
behavior or data loss). Of course, support for UTF-8 was spotty or
nonexistent in those products anyway, so I guess it works out to be the same.
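To make the surrogate point concrete, here is a rough Python sketch of what happens to a code point beyond U+FFFF. The helpers code_units and fits_in_ucs2 are hypothetical names made up for this illustration, not part of any MS or Python API.

```python
import struct

def code_units(text):
    """Return the 16-bit code units of text in UTF-16LE, as integers."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def fits_in_ucs2(text):
    """True if every code point fits in one 16-bit unit (no surrogates needed)."""
    return all(ord(ch) <= 0xFFFF for ch in text)

# U+10400 lies beyond U+FFFF, so UTF-8 needs four bytes for it.
supplementary = "\U00010400"
utf8_bytes = supplementary.encode("utf-8")

# UTF-16 represents it as a surrogate pair: two 16-bit code units.
units = code_units(supplementary)
```

Here units is the pair 0xD801, 0xDC00. Software that only understands UCS-2 would treat those as two separate 16-bit characters, or drop the value during conversion, which is the data loss described above.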



Addison Phillips
Received on Tuesday, 6 June 2000 01:26:47 UTC
