Re: BOM & Unicode editors

In Windows 2000 Notepad, you can save files in any of the following four
formats:

ANSI (actually it means MBCS using the system default code page)
Unicode Little Endian (actually it means UCS-2)
Unicode Big Endian (also UCS-2, I believe, in big-endian byte order, as
used by many RISC processors)
UTF-8

The latter three do indeed write a byte order mark, if only so that Notepad
can read the file correctly when it is reopened instead of guessing at the
encoding.
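A minimal sketch of the kind of BOM check a reader like Notepad might do when reopening a file, assuming the mapping of Notepad's formats given in the list above. The function names `sniff_bom` and `decode_with_bom` are hypothetical, and `cp1252` is only a stand-in for "the system default code page"; the real ANSI code page depends on the machine. "Unicode" files are decoded as UTF-16, which handles UCS-2 text as a subset.

```python
import codecs
from typing import Optional, Tuple

# BOMs written by the three Notepad formats that have one.
# None of these byte sequences is a prefix of another, so order is not critical.
# "ANSI" files carry no BOM and fall through to the fallback code page.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF -> "UTF-8"
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE    -> "Unicode"
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF    -> "Unicode big endian"
)


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (codec name, BOM length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: str = "cp1252") -> str:
    """Decode using the BOM if there is one; otherwise assume `fallback`."""
    name, skip = sniff_bom(data)
    return data[skip:].decode(name or fallback)
```

For example, `decode_with_bom("\ufeffhi".encode("utf-16-be"))` returns `"hi"`, while BOM-less bytes such as `b"caf\xe9"` are read with the fallback code page.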

FrontPage 2000 does not support the middle two, but it supports any MBCS
code page the system supports, as well as UTF-8 with no byte order mark
required. It marks the encoding by other means instead.
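A tool that, like FrontPage, accepts UTF-8 with or without a BOM can treat the mark as optional. In UTF-8 the mark does not indicate byte order (U+FEFF always encodes as EF BB BF); it only acts as a signature. A small sketch using Python's `utf-8-sig` codec, which skips a leading BOM if present and otherwise behaves like plain UTF-8; `read_utf8_text` is a hypothetical helper name:

```python
import codecs


def read_utf8_text(data: bytes) -> str:
    """Decode UTF-8 whether or not it starts with the EF BB BF signature."""
    # "utf-8-sig" drops one leading BOM on decode; without one it is plain UTF-8.
    return data.decode("utf-8-sig")


with_bom = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")
without_bom = "caf\u00e9".encode("utf-8")

# Both forms yield the same text.
assert read_utf8_text(with_bom) == read_utf8_text(without_bom) == "caf\u00e9"

# Plain "utf-8" keeps the mark as a U+FEFF character at the start of the text.
assert with_bom.decode("utf-8") == "\ufeffcaf\u00e9"
```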

But for a program like Notepad, being able to open a file, save it, and
reopen it pretty much requires the byte order mark.
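To illustrate why reopening is hard without the mark: the same BOM-less bytes are often valid in several encodings, so the reader is left guessing. The sketch below assumes a hypothetical candidate list, with `cp1252` standing in for the ANSI code page, and decodes "Hi" saved as big-endian UTF-16 with no BOM:

```python
# Hypothetical encodings a Notepad-like editor might consider when no BOM
# is present; "cp1252" stands in for the system's ANSI code page.
CANDIDATES = ("cp1252", "utf-8", "utf-16-le", "utf-16-be")


def candidate_readings(raw: bytes) -> dict:
    """Map each candidate encoding to its decoding of `raw`, skipping failures."""
    readings = {}
    for enc in CANDIDATES:
        try:
            readings[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            pass  # not valid in this encoding, so not a candidate
    return readings


# "Hi" saved as Unicode big endian but *without* a BOM: b"\x00H\x00i".
raw = "Hi".encode("utf-16-be")
for enc, text in candidate_readings(raw).items():
    print(enc, repr(text))
```

All four candidates decode these bytes without error, but only the big-endian UTF-16 reading gives back "Hi"; with a leading FE FF the choice would be unambiguous.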

michka


----- Original Message -----
From: "Chris Lilley" <chris@w3.org>
To: "Asmus Freytag" <asmusf@ix.netcom.com>
Cc: "Saba Sundaramurthy" <ssundaramurthy@verisign.com>;
<mozilla-i18n@mozilla.org>; <www-international@w3.org>;
<i18n-prog@acoin.com>
Sent: Wednesday, May 10, 2000 1:43 AM
Subject: Re: BOM & Unicode editors


 >
 >
 > Asmus Freytag wrote:
 > >
 > > At 04:55 PM 5/9/00 -0700, Saba Sundaramurthy wrote:
 > > >     Is this something all editors that save files in Unicode or UTF-8 are
 > > >required to do? Can I depend on the presence of this marker in my code?
 > >
 > > No, it's not a requirement, but it's a convention followed by quite a
 > > few tools, because otherwise it's harder to use the same .txt extension
 > > for both ASCII and Unicode (and also it helps to mark the byte order,
 > > of course).
 >
 > This is all fine and well for UTF-16, but what about UTF-8? Why does the
 > byte order matter?
 >
 > > I would recommend that you look for it in your code, if you plan to read
 > > UTF-16 files.
 >
 > And for UTF-8 files?
 >
 > --
 > Chris
 >

Received on Thursday, 11 May 2000 02:00:55 UTC