
Re: BOM & Unicode editors

From: Martin J. Duerst <duerst@w3.org>
Date: Fri, 12 May 2000 18:35:54 +0900
To: Saba Sundaramurthy <ssundaramurthy@verisign.com>, mozilla-i18n@mozilla.org, www-international@w3.org, i18n-prog@acoin.com

Hello Saba,

For some more information on UTF-8, please see

There are some errors in the slide on page 5, but
they are not very relevant here.

The paper in particular shows how easy it is to automatically
detect UTF-8 based on its specific byte patterns. This can
mostly be done on the fly: a decoder starts by assuming that
it is reading only ASCII, and decides whether the data is in the
local legacy encoding or in UTF-8 once it sees the first bytes
with the 8th bit set.
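A minimal Python sketch of the on-the-fly idea described above, not the paper's own code. The function names `sniff` and `_utf8_continuations` are hypothetical, and the default `legacy="latin-1"` is only an assumed stand-in for "the local legacy encoding". It stays in ASCII mode until the first byte with the 8th bit set, then checks whether that byte and its followers form a UTF-8 sequence. The lead-byte ranges are simplified (overlong 3-/4-byte forms and encoded surrogates are not rejected).

```python
def _utf8_continuations(lead: int) -> int:
    """Continuation bytes announced by a UTF-8 lead byte, or -1 if it cannot start a sequence.

    Simplified: does not reject overlong 3-/4-byte forms or encoded surrogates.
    """
    if 0xC2 <= lead <= 0xDF:
        return 1
    if 0xE0 <= lead <= 0xEF:
        return 2
    if 0xF0 <= lead <= 0xF4:
        return 3
    return -1


def sniff(data: bytes, legacy: str = "latin-1") -> str:
    """Guess 'ascii', 'utf-8', or the (assumed) legacy encoding from the first 8-bit bytes."""
    for i, b in enumerate(data):
        if b < 0x80:
            continue  # still consistent with plain ASCII
        # First byte with the 8th bit set: decide now.
        need = _utf8_continuations(b)
        tail = data[i + 1 : i + 1 + need]
        if need > 0 and len(tail) == need and all(0x80 <= t <= 0xBF for t in tail):
            return "utf-8"
        # Truncated input also lands here; a streaming decoder would wait for more bytes.
        return legacy
    return "ascii"


print(sniff(b"plain text"))             # ascii
print(sniff("café".encode("utf-8")))    # utf-8
print(sniff("café".encode("latin-1")))  # latin-1
```

Because UTF-8 multi-byte sequences follow a rigid lead/continuation pattern, text in a legacy 8-bit encoding rarely matches it by accident. A stricter detector would validate the whole input instead of only the first sequence.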

One big problem with using the BOM as a 'magic number' for UTF-8
should also not go unmentioned here:

UTF-8 without a BOM has the very important property that it
encodes ASCII as ASCII, and everything else as something else.
An ASCII file is therefore automatically UTF-8. All the nice
things that you can do with text files can be done with UTF-8,
too. However, once there is a BOM on a file, an ASCII file is
no longer ASCII, and very simple operations such as a Unix
'cat' fail: concatenating two such files leaves a stray BOM
in the middle of the result.
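A small Python illustration of the 'cat' problem, simulating `cat a b > out` by plain byte concatenation (the file contents are made up for the example). Without a BOM, the joined bytes are still ASCII and therefore valid UTF-8; with a BOM on each part, the result is no longer ASCII, and even the `utf-8-sig` codec, which strips only a leading BOM, leaves the second U+FEFF inside the text.

```python
BOM = "\ufeff".encode("utf-8")  # the UTF-8 BOM is the three bytes EF BB BF


def is_ascii(data: bytes) -> bool:
    """True if every byte is 7-bit ASCII."""
    return all(b < 0x80 for b in data)


plain = [b"Hello, ", b"world\n"]
with_bom = [BOM + p for p in plain]

# Byte-for-byte what `cat a b > out` would write.
cat_plain = b"".join(plain)
cat_bom = b"".join(with_bom)

print(is_ascii(cat_plain))  # True: still ASCII, hence also valid UTF-8
print(is_ascii(cat_bom))    # False: the BOM bytes are not ASCII
# utf-8-sig strips only a leading BOM; the second one survives mid-text.
print(repr(cat_bom.decode("utf-8-sig")))  # 'Hello, \ufeffworld\n'
```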

Regards,   Martin.

At 00/05/09 16:55 -0700, Saba Sundaramurthy wrote:
>1)    Playing with text editors (FrontPage 2000 and Notepad) in Windows NT
>and Windows 2000, I noticed that whenever the contents are saved as Unicode or
>UTF-8, a marker FEFF is placed at the beginning of the file. Inspecting
>this marker can give information about the byte ordering of the machine and
>also whether the following bytes are Unicode or UTF-8.
>     Is this something all editors that save files in Unicode or UTF-8 are
>required to do? Can I depend on the presence of this marker in my code?
>2)      Are there any editors available on Unix that allow you to save text in
>Unicode or UTF-8?
>Thanks in advance,
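For reference, a minimal Python sketch of the BOM inspection described in the question, using the BOM constants from Python's `codecs` module; the function name `sniff_bom` is hypothetical. The serialized form of U+FEFF differs per encoding form, so it identifies the byte order of the file's encoding (not of the machine). Since the Unicode Standard treats the BOM as optional, a `(None, 0)` result says nothing about the encoding, and code should fall back to other detection rather than depend on the marker.

```python
import codecs

# Check longer signatures first: the UTF-32LE BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (codec name, BOM length) if data starts with a BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


data = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
name, skip = sniff_bom(data)
print(name, skip)                # utf-16-be 2
print(data[skip:].decode(name))  # hi
```

One known ambiguity of this simple check: a UTF-16LE file whose first character after the BOM is U+0000 looks like UTF-32LE.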
Received on Friday, 12 May 2000 05:31:29 UTC
