W3C home > Mailing lists > Public > www-international@w3.org > April to June 2000

Re: BOM & Unicode editors

From: Yung-Fong Tang <ftang@netscape.com>
Date: Sat, 13 May 2000 13:36:39 -0700
Message-ID: <391DBCD6.322F0AD1@netscape.com>
To: Saba Sundaramurthy <ssundaramurthy@verisign.com>
CC: "'Robert A. Rosenberg'" <rarpsl@flashcom.net>, mozilla-i18n@mozilla.org, www-international@w3.org, i18n-prog@acoin.com


Saba Sundaramurthy wrote:

>         UTF-8 characters may expand to several bytes (up to 6 for
> UCS-4), but I don't think byte order is important, since the sequence is
> written out one byte at a time in the correct order.
>
>     As confirmed by Michka, the BOM is placed in UTF-8 files only as a
> 'magic cookie'.
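Saba's point can be illustrated with a minimal Python sketch of the original (RFC 2279) UTF-8 scheme, which covers UCS-4 values up to 0x7FFFFFFF in at most six bytes. The helper name `utf8_encode_ucs4` is hypothetical. The lead byte always carries the most significant bits and the continuation bytes follow in a fixed order, so the output does not depend on the host's endianness. Later, RFC 3629 restricted UTF-8 to U+10FFFF, which needs at most four bytes.

```python
def utf8_encode_ucs4(cp: int) -> bytes:
    """Hypothetical helper: encode one UCS-4 value with RFC 2279 UTF-8
    (1 to 6 bytes, values 0 .. 0x7FFFFFFF)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit UCS-4 range")
    if cp < 0x80:
        return bytes([cp])  # ASCII range: a single byte, unchanged
    # (exclusive upper limit, total byte count, lead-byte marker bits)
    for limit, n, lead in ((0x800, 2, 0xC0), (0x10000, 3, 0xE0),
                           (0x200000, 4, 0xF0), (0x4000000, 5, 0xF8),
                           (0x80000000, 6, 0xFC)):
        if cp < limit:
            break
    out = []
    # Each continuation byte is 10xxxxxx and carries 6 bits; peel them off
    # starting from the least significant end...
    for _ in range(n - 1):
        out.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # ...the remaining high bits go into the lead byte.
    out.append(lead | cp)
    # Reverse so the lead byte (most significant bits) is always written first.
    return bytes(reversed(out))
```

For example, `utf8_encode_ucs4(0x20AC)` (EURO SIGN) gives `b'\xe2\x82\xac'` on any machine, and the largest 31-bit value, 0x7FFFFFFF, takes the full six bytes `FD BF BF BF BF BF`.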

So 0xEF 0xBB 0xBF as the first three bytes of a text file marks it as a UTF-8
file on Win2K, right?
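The BOM character U+FEFF encoded in UTF-8 is always the three bytes EF BB BF, so a reader that treats it as a "magic cookie" only needs a prefix check. A minimal sketch using Python's standard `codecs.BOM_UTF8` constant follows; the helper names `has_utf8_bom` and `strip_utf8_bom` are hypothetical. How any particular Win2K application reacts to the signature is not something this code establishes.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Hypothetical helper: True if data begins with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: drop a leading UTF-8 signature, if there is one."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data
```

Python's built-in `utf-8-sig` codec does the same job when decoding: `b"\xef\xbb\xbfhi".decode("utf-8-sig")` returns `"hi"`, whereas the plain `utf-8` codec keeps the signature as a leading `"\ufeff"` character.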

>
>
> Saba
>
> > -----Original Message-----
> > From: Robert A. Rosenberg [mailto:rarpsl@flashcom.net]
> > At 10:43 AM 05/10/2000 +0200, Chris Lilley wrote:
> > >This is all fine and well for UTF-16, but what about UTF-8? Why does the
> > >byte order matter?
> >
> > The byte-order is still important since it controls what UTF-8 codes get
> > emitted for the same input codepoint. Just as you need to know which order
> > to save the two bytes of a UTF-16 character, you need to know what order to
> > assemble the two bytes that get created by expanding a UTF-8 sequence.
> >
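The UTF-16 half of this analogy can be checked with Python's built-in codecs. In the sketch below (the helper `compare_encodings` is hypothetical), one character yields two different UTF-16 byte sequences, one per byte order, and reading the bytes in the wrong order produces a different character. UTF-8 has a single byte sequence, because its code units are individual bytes written lead byte first.

```python
def compare_encodings(ch: str) -> dict[str, bytes]:
    """Hypothetical helper: encode one character in both UTF-16 byte orders
    and in UTF-8."""
    return {
        # A UTF-16 code unit is a 16-bit integer, so its two bytes can be
        # serialized high byte first (BE) or low byte first (LE).
        "utf-16-be": ch.encode("utf-16-be"),
        "utf-16-le": ch.encode("utf-16-le"),
        # A UTF-8 code unit is a single byte; the sequence order is fixed.
        "utf-8": ch.encode("utf-8"),
    }

euro = compare_encodings("\u20ac")  # U+20AC EURO SIGN
# Big-endian bytes read as little-endian give U+AC20, a different character.
misread = euro["utf-16-be"].decode("utf-16-le")
```

Here `euro` holds `b'\x20\xac'` and `b'\xac\x20'` for the two UTF-16 forms and `b'\xe2\x82\xac'` for UTF-8.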
Received on Saturday, 13 May 2000 16:36:48 GMT
