- From: Asmus Freytag <asmusf@ix.netcom.com>
- Date: Mon, 05 Jun 2000 23:49:58 -0700
- To: "Addison Phillips [FCOM]" <AddisonP@flashcom.net> (by way of "Martin J. Duerst" <duerst@w3.org>), www-international@w3.org
At 02:09 PM 6/6/00 +0900, Addison Phillips [FCOM] wrote:
>Actually, in Win2000 and later, MS products mean UTF-16LE.

No. The designation UTF-16LE is reserved for the case that you label the data stream externally with the byte order. MS products (at least for plain text files) tag the data with a BOM character, making the data UTF-16 (albeit in the 'little-endian' flavor). As I wrote, there is no shortcut designation for this.

>Older products
>really mean UCS-2 (as in, they don't understand surrogates and converting
>UTF-8 values beyond 0xFFFF will result in undefined behavior or data loss).
>Of course, support for UTF-8 was spotty or non-existent in those products
>anyway, so I guess it works out to be the same.

Actually, since most of these older products don't interpret surrogate values, you can expect a fair amount of blind pass-through, although I'm sure that you can easily find instances of bugs that can cause (or allow the user the chance of) splitting or truncating surrogate pairs.

In the long run, it matters more how soon programs provide full support, whether via UTF-8 or UTF-16.

A./
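To illustrate the two points above (BOM-tagged UTF-16 versus externally labelled UTF-16LE, and what happens when a surrogate pair is truncated), here is a minimal Python sketch. The helper names `sniff_byte_order` and `is_well_formed_utf16le` are hypothetical and chosen only for this example; the sketch relies on Python's standard `codecs` module and says nothing about how any particular MS product is implemented.

```python
import codecs


def sniff_byte_order(data: bytes):
    """Return (byte_order, payload) for a UTF-16 byte stream.

    A leading BOM means the stream describes its own byte order, so the
    label "UTF-16" fits. Without a BOM, the byte order has to come from
    an external label such as "UTF-16LE"; this helper (hypothetical)
    then returns None for the byte order.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little-endian", data[len(codecs.BOM_UTF16_LE):]
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big-endian", data[len(codecs.BOM_UTF16_BE):]
    return None, data


def is_well_formed_utf16le(data: bytes) -> bool:
    """True if data decodes as UTF-16LE with no lone or cut-off surrogates."""
    try:
        data.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # "a" followed by U+10400, which needs the surrogate pair D801 DC00.
    text = "a\U00010400"
    payload = text.encode("utf-16-le")
    tagged = codecs.BOM_UTF16_LE + payload  # BOM-tagged plain text file

    print(sniff_byte_order(tagged)[0])
    print(sniff_byte_order(payload)[0])

    # Cutting after the high surrogate splits the pair.
    truncated = payload[:4]
    print(is_well_formed_utf16le(truncated))

    # "Blind pass-through": the lone surrogate is carried along unexamined.
    print(ascii(truncated.decode("utf-16-le", errors="surrogatepass")))
```

Note that decoding the BOM-tagged bytes with the `utf-16` codec consumes the BOM, whereas decoding them with `utf-16-le` leaves U+FEFF in the text as an ordinary character, which mirrors the labelling distinction: the LE label asserts that the byte order is already known from outside the data.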
Received on Tuesday, 6 June 2000 02:41:39 UTC