- From: Michael (michka) Kaplan <michka@trigeminal.com>
- Date: Mon, 11 Sep 2000 16:16:41 +0900
- To: www-international@w3.org
MS FrontPage 2000 support of UTF-8 is what keeps my site up and running! How
else to support Tamil/Hindi/Georgian/Armenian? :-)

Don't forget Microsoft Jet, whose text IISAM will support UTF-7, UTF-8,
UTF-16LE, and UTF-16BE. :-)

michka

(who is a huge fan of Jet and its text IISAM, which is very useful in his
work!)

----- Original Message -----
From: "Chris Pratley" <chrispr@MICROSOFT.com>
To: "'Michael (michka) Kaplan'" <michka@trigeminal.com>;
<www-international@w3.org>
Sent: Tuesday, June 06, 2000 6:50 PM
Subject: RE: [Moderator Action] Re: BOM & Unicode editors

> Michael is a little overzealous in dismissing our support of various forms
> of Unicode:
> 1. Both Notepad on Win2000 and Word2000 on any system support input/output
> as Big-Endian UTF-16 plain text (with BOM).
> 2. FrontPage (2000 and perhaps 98) allow open/save of HTML files as UTF-8.
>
> Chris Pratley
> Group Program Manager
> Microsoft Word
>
> -----Original Message-----
> From: Michael (michka) Kaplan [mailto:michka@trigeminal.com]
> Sent: June 5, 2000 11:02 PM
> To: www-international@w3.org
> Subject: [Moderator Action] Re: BOM & Unicode editors
>
> I think that the move is a very good thing. We need standards like this. :-)
> >
> > Unfortunately, since Microsoft does not currently support any endian system
> > other than Little Endian, you probably need to know the Microsoft one if you
> > want to work with Windows 2000....
> >
> > >ANSI (actually it means MBCS using the system default code page)
> > >Unicode (Little Endian, actually it means UCS-2)
> > >Unicode Big Endian (also means UCS-2, I believe? At least for RISC
> > >processors, etc.)
> > >UTF-8
> >
> > There has long been controversy over the fact that MS products use "Unicode"
> > to mean UCS-2 and consider UTF-8 to be a multibyte encoding. I think it
> > mostly stems from the fact that the Unicode APIs of NT have always supported
> > UCS-2.
> > The discrepancy is compounded by several problems:
> >
> > 1) Most other (non-MS) products and OSes usually mean to use UTF-8 when
> > referring to Unicode.
> > 2) Many Microsoft products like IIS and FrontPage do not handle Unicode
> > files at all
> > 3) Products like IIS handle UTF-8 only in the 5.0 version
> >
> >
> > michka
> >
> >
> > ----- Original Message -----
> > From: "Asmus Freytag" <asmusf@ix.netcom.com>
> > To: "Michael (michka) Kaplan" <michka@trigeminal.com>; "Martin J. Duerst"
> > <duerst@w3.org>
> > Sent: Thursday, May 11, 2000 4:13 PM
> > Subject: Re: BOM & Unicode editors
> >
> >
> > > We are now moving (at least within Unicode) to a consistent terminology
> > >
> > > UTF-8
> > > UTF-16 (Endianness dependent, usually uses BOM)
> > > UTF-16BE (known to be big endian, no BOM)
> > > UTF-16LE (known to be little endian, no BOM)
> > > UTF-32 (restricted to codes 0000-10FFFF)
> > >
> > > For the generic UTF-16 there is one logical designation, two physical
> > > manifestations of opposite byte order. Unfortunately there is no term
> > > for the actual physical representation, since the two other terms not
> > > only designate a specific byte order, but also imply the absence of a
> > > BOM character - furthermore, when you are actually processing
> > > the data, the endianness of interest is not so much whether it's little
> > > endian or big endian, but rather whether it's same endian or opposite
> > > endian.
> > >
> > > At 02:24 PM 5/11/00 +0900, you wrote:
> > > >In Windows 2000 notepad, the option to save your files as any of the
> > > >following four formats exists:
> > > >
> > > >ANSI (actually it means MBCS using the system default code page)
> > > >Unicode Little Endian (actually it means UCS-2)
> > > >Unicode Big Endian (also means UCS-2, I believe? At least for RISC
> > > >processors, etc.)
> > > >UTF-8
> > > >
> > > >The latter three do indeed contain byte order marks, if for no other
> > > >reason than reopening the file allows notepad to read it properly and
> > > >not guess about the encoding.
> > > >
> > > >FrontPage 2000 does not support the middle two, but it supports any
> > > >supported MBCS code page on the system and UTF-8... with no byte mark
> > > >required. But they mark encoding with other means.
> > > >
> > > >But for a program like notepad, the ability to open the file, save it,
> > > >and re-open it pretty much requires the byte mark.
> > > >
> > > >michka
> > > >
> > > >
> > > >----- Original Message -----
> > > >From: "Chris Lilley" <chris@w3.org>
> > > >To: "Asmus Freytag" <asmusf@ix.netcom.com>
> > > >Cc: "Saba Sundaramurthy" <ssundaramurthy@verisign.com>;
> > > ><mozilla-i18n@mozilla.org>; <www-international@w3.org>;
> > > ><i18n-prog@acoin.com>
> > > >Sent: Wednesday, May 10, 2000 1:43 AM
> > > >Subject: Re: BOM & Unicode editors
> > > >
> > > >
> > > > >
> > > > >
> > > > > Asmus Freytag wrote:
> > > > > >
> > > > > > At 04:55 PM 5/9/00 -0700, Saba Sundaramurthy wrote:
> > > > > > > Is this something all editors that save files in Unicode or
> > > > > > > UTF-8 are required to do? Can I depend on the presence of this
> > > > > > > marker in my code?
> > > > > >
> > > > > > No, it's not a requirement, but it's a convention followed by
> > > > > > quite a few tools,
> > > > > > because otherwise it's harder to use the same .txt extension for
> > > > > > both ASCII and
> > > > > > Unicode (and also it helps to mark the byte order, of course).
> > > > >
> > > > > This is all fine and well for UTF-16, but what about UTF-8 ? why
> > > > > does the byte order matter?
> > > > >
> > > > > > I would recommend that you look for it in your code, if you plan
> > > > > > to read UTF-16 files.
> > > > >
> > > > > And for UTF-8 files?
> > > > >
> > > > > --
> > > > > Chris
> > > > > /* the i18n-prog homepage is at: */
> > > > > /* http://www.acoin.com/i18n/i18n-prog.htm */
> > > > > /* See the page for removal instructions, etc. */
> > > > >
> > > > >
Received on Monday, 11 September 2000 03:55:29 UTC
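
The question that opened the thread, whether code can depend on the marker,
can be illustrated with a small Python sketch. It checks for the three byte
order marks that Notepad is described as writing (UTF-8, UTF-16 little endian,
UTF-16 big endian) and otherwise falls back to a caller-supplied encoding,
since tools such as FrontPage are described as writing UTF-8 with no mark at
all. The names sniff_bom, decode_text and the fallback parameter are
hypothetical and chosen only for illustration; the default fallback of UTF-8
is an assumption, not something the thread prescribes.

```python
import codecs

# The three byte order marks discussed in the thread, paired with the
# BOM-less codec name that decodes the bytes following each mark.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes):
    """Return (codec_name, bom_length), or (None, 0) if no mark is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_text(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM if present; otherwise use the fallback encoding.

    The fallback is an assumption made by the caller: a file without a
    mark could equally be in a legacy code page, which is why the thread
    calls the mark a convention rather than a requirement.
    """
    name, skip = sniff_bom(data)
    return data[skip:].decode(name or fallback)


if __name__ == "__main__":
    # Python's generic "utf-16" codec writes a BOM in native byte order,
    # while "utf-16-le" and "utf-16-be" write none.
    for label in ("utf-16", "utf-16-le", "utf-16-be", "utf-8-sig", "utf-8"):
        raw = "abc".encode(label)
        print(label, raw.hex(" "), sniff_bom(raw))
```

Running the script prints the encoded bytes for each label, which makes the
distinction in the terminology list visible: only the generic UTF-16 form (and
Python's utf-8-sig variant for UTF-8) carries a mark that sniff_bom can find.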