- From: Michael \(michka\) Kaplan <michka@trigeminal.com>
- Date: Tue, 06 Jun 2000 15:01:59 +0900
- To: www-international@w3.org
I think that the move is a very good thing. We need standards like this. :-)

Unfortunately, since Microsoft does not currently support any endian system
other than Little Endian, you probably need to know the Microsoft one if you
want to work with Windows 2000....

>ANSI (actually it means MBCS using the system default code page)
>Unicode (Little Endian, actually it means UCS-2)
>Unicode Big Endian (also means UCS-2, I believe? At least for RISC processors, etc.)
>UTF-8

There has long been controversy over the fact that MS products use "Unicode"
to mean UCS-2 and consider UTF-8 to be a multibyte encoding. I think it mostly
stems from the fact that the Unicode APIs of NT have always supported UCS-2.

The discrepancy is compounded by several problems:

1) Most other (non-MS) products and OSes usually mean UTF-8 when they refer to Unicode.
2) Many Microsoft products like IIS and FrontPage do not handle Unicode files at all.
3) Products like IIS handle UTF-8 only in the 5.0 version.

michka

----- Original Message -----
From: "Asmus Freytag" <asmusf@ix.netcom.com>
To: "Michael (michka) Kaplan" <michka@trigeminal.com>; "Martin J. Duerst" <duerst@w3.org>
Sent: Thursday, May 11, 2000 4:13 PM
Subject: Re: BOM & Unicode editors

> We are now moving (at least within Unicode) to a consistent terminology
>
> UTF-8
> UTF-16 (Endianness dependent, usually uses BOM)
> UTF-16BE (known to be big endian, no BOM)
> UTF-16LE (known to be little endian, no BOM)
> UTF-32 (restricted to codes 0000-10FFFF)
>
> For the generic UTF-16 there is one logical designation, two physical
> manifestations of opposite byte order.
> Unfortunately there is no term
> for the actual physical representation, since the two other terms not
> only designate a specific byte order, but also imply the absence of a
> BOM character - furthermore, when you are actually processing
> the data, the endianness of interest is not so much whether it's little
> endian or big endian, but rather whether it's same endian or opposite
> endian.
>
> At 02:24 PM 5/11/00 +0900, you wrote:
> >In Windows 2000 notepad, the option to save your files as any of the
> >following four formats exists:
> >
> >ANSI (actually it means MBCS using the system default code page)
> >Unicode Little Endian (actually it means UCS-2)
> >Unicode Big Endian (also means UCS-2, I believe? At least for RISC
> >processors, etc.)
> >UTF-8
> >
> >The latter three do indeed contain byte order marks, if for no other reason
> >than reopening the file allows notepad to read it properly and not guess
> >about the encoding.
> >
> >FrontPage 2000 does not support the middle two, but it supports any
> >supported MBCS code page on the system and UTF-8... with no byte mark
> >required. But they mark encoding with other means.
> >
> >But for a program like notepad, the ability to open the file, save it, and
> >re-open it pretty much requires the byte mark.
> >
> >michka
> >
> >
> >----- Original Message -----
> >From: "Chris Lilley" <chris@w3.org>
> >To: "Asmus Freytag" <asmusf@ix.netcom.com>
> >Cc: "Saba Sundaramurthy" <ssundaramurthy@verisign.com>;
> ><mozilla-i18n@mozilla.org>; <www-international@w3.org>;
> ><i18n-prog@acoin.com>
> >Sent: Wednesday, May 10, 2000 1:43 AM
> >Subject: Re: BOM & Unicode editors
> >
> >
> > > Asmus Freytag wrote:
> > > >
> > > > At 04:55 PM 5/9/00 -0700, Saba Sundaramurthy wrote:
> > > > > Is this something all editors that save files in Unicode or UTF-8 are
> > > > > required to do? Can I depend on the presence of this marker in my code?
> > > >
> > > > No, it's not a requirement, but it's a convention followed by quite a few
> > > > tools, because otherwise it's harder to use the same .txt extension for
> > > > both ASCII and Unicode (and also it helps to mark the byte order, of course).
> > >
> > > This is all fine and well for UTF-16, but what about UTF-8? why does the
> > > byte order matter?
> > >
> > > > I would recommend that you look for it in your code, if you plan to read
> > > > UTF-16 files.
> > >
> > > And for UTF-8 files?
> > >
> > > --
> > > Chris
> > > /* the i18n-prog homepage is at: */
> > > /* http://www.acoin.com/i18n/i18n-prog.htm */
> > > /* See the page for removal instructions, etc. */
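
As a rough illustration of the advice in the thread (look for the BOM when
reading UTF-16, but do not depend on it being there), here is a minimal Python
sketch. The names sniff_bom and decode_text are hypothetical and are not taken
from any product discussed above. Falling back to UTF-8 when no BOM is found is
an assumption made for the example; a real tool might fall back to the system
code page or to out-of-band metadata instead.

```python
import codecs

# Check longer signatures first: the UTF-32LE BOM (FF FE 00 00) begins with
# the same two bytes as the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes, default: str = "utf-8"):
    """Return (encoding, bom_length); bom_length is 0 when no BOM is present.

    `default` is an assumption of this sketch, used only when no BOM is found.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0


def decode_text(data: bytes, default: str = "utf-8") -> str:
    """Decode bytes using the BOM if there is one, else the default encoding."""
    encoding, skip = sniff_bom(data, default)
    # Strip the BOM ourselves: the explicit -le/-be codecs do not remove it.
    return data[skip:].decode(encoding)
```

Note the ambiguity the signature order cannot resolve: a UTF-16LE file whose
first character after the BOM is U+0000 is byte-for-byte indistinguishable from
a UTF-32LE BOM, so this sketch would treat it as UTF-32LE.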
Received on Tuesday, 6 June 2000 01:55:00 UTC