[Moderator Action] Re: BOM & Unicode editors

From: Addison Phillips [FCOM] <AddisonP@flashcom.net>
Date: Tue, 06 Jun 2000 14:09:45 +0900
Message-Id: <4.2.0.58.J.20000606140935.009e6530@sh.w3.mag.keio.ac.jp>
To: www-international@w3.org
 > At 07:54 AM 6/5/00 -0700, Michael (michka) Kaplan wrote:
 > >There has long been controversy over the fact that MS products use
 > >"Unicode" to mean UCS-2
 >
 > In the new, more precise terminology you would say that "MS products use
 > 'Unicode' to mean UTF-16". Since plain text files are prefixed with a BOM,
 > the encoding is UTF-16 (internally tagged; endianness can be determined
 > from the BOM) instead of UTF-16LE (little endian, externally tagged, and no
 > BOM allowed). There is, incidentally, no shorthand to describe "UTF-16 with
 > a BOM that I know (from other information) to be little endian".
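To make the UTF-16 versus UTF-16LE distinction above concrete, here is a minimal Python sketch of reading the byte order from a leading BOM. The function names `sniff_utf16` and `decode_utf16` are hypothetical. It assumes that a stream with no BOM is big-endian, which is the Unicode Standard's default for the UTF-16 encoding scheme absent a higher-level protocol; a Windows-oriented reader might reasonably guess little-endian instead.

```python
from typing import Tuple

BOM_BE = b"\xfe\xff"  # U+FEFF serialized big-endian
BOM_LE = b"\xff\xfe"  # U+FEFF serialized little-endian

def sniff_utf16(data: bytes) -> Tuple[str, bytes]:
    """Return (explicit codec name, payload with any BOM stripped).

    Assumption: no BOM means big-endian (the Unicode default for "UTF-16").
    """
    if data.startswith(BOM_LE):
        return "utf-16-le", data[2:]
    if data.startswith(BOM_BE):
        return "utf-16-be", data[2:]
    return "utf-16-be", data

def decode_utf16(data: bytes) -> str:
    """Decode "internally tagged" UTF-16 by consulting the BOM first."""
    codec, payload = sniff_utf16(data)
    return payload.decode(codec)
```

Python's own `utf-16-le` codec behaves like the externally tagged form: it does not strip a BOM, so `b"\xff\xfeA\x00".decode("utf-16-le")` yields `"\ufeffA"`, with the BOM surviving as a U+FEFF character in the text.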

Actually, in Win2000 and later, MS products mean UTF-16LE. Older products
really mean UCS-2 (as in, they don't understand surrogates, and converting
UTF-8 sequences for code points beyond U+FFFF will result in undefined
behavior or data loss). Of course, support for UTF-8 was spotty or
non-existent in those products anyway, so I guess it works out to be the same.
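The surrogate problem with UCS-2 can be illustrated with a small Python sketch. The helper names `to_utf16_units`, `to_ucs2_units`, and `utf8_to_units` are hypothetical. UTF-16 encodes a code point above U+FFFF as a high/low surrogate pair; UCS-2 has no such mechanism. Here the UCS-2 path raises an error so the loss is visible, whereas the older products described above might instead drop or mangle the character silently.

```python
from typing import Callable, List

def to_utf16_units(cp: int) -> List[int]:
    """Encode one Unicode scalar value as UTF-16 code units."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]  # BMP: a single 16-bit unit
    v = cp - 0x10000  # 20 bits remain, split 10/10
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]  # high, low surrogate

def to_ucs2_units(cp: int) -> List[int]:
    """UCS-2 has no surrogate mechanism: only BMP code points fit.

    Assumption: fail loudly rather than mimic undefined legacy behavior.
    """
    if cp > 0xFFFF:
        raise ValueError(f"U+{cp:04X} cannot be represented in UCS-2")
    return [cp]

def utf8_to_units(data: bytes, encode_cp: Callable[[int], List[int]]) -> List[int]:
    """Decode UTF-8, then re-encode each code point with encode_cp."""
    units: List[int] = []
    for ch in data.decode("utf-8"):
        units.extend(encode_cp(ord(ch)))
    return units
```

For example, the four UTF-8 bytes `F0 90 90 80` (U+10400) convert to the surrogate pair `[0xD801, 0xDC00]` with `to_utf16_units`, while the same input passed with `to_ucs2_units` raises `ValueError`.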

thanks

Addison

Addison Phillips
mailto:AddisonP@flashcom.net
Received on Tuesday, 6 June 2000 01:26:47 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 2 June 2009 19:16:55 GMT