Re: For review: The byte-order mark (BOM) in HTML

From: Richard Ishida <ishida@w3.org>
Date: Thu, 20 Dec 2012 11:08:52 +0000
Message-ID: <50D2F1C4.3020905@w3.org>
To: www-international@w3.org

Thanks, Albert. Addressed those in the newly updated version.

Richard Ishida
Internationalization Activity Lead
W3C (World Wide Web Consortium)


On 18/12/2012 18:39, Albert Lunde wrote:
> "To communicate which byte order was in use, U+FEFF (the byte-order
> mark) was used at the start of the stream as magic number that is not
> logically part of the text the stream represents."
> I'd say "as a magic number".
>
> "You should also be aware that, although ASCII is a subset of UTF-8, a
> file that starts with a BOM is no longer ASCII-compatible."
> As I think was remarked on the list, the intended meaning of the phrase
> "ASCII-compatible" is not too obvious.
> I _think_ this refers to the (often desirable) property of UTF-8 that
> characters from the US-ASCII range are encoded in UTF-8 in a way that is
> byte-for-byte identical to US-ASCII encoding. I think it would be better
> to say that directly, somehow.
> For example:
> "UTF-8 without a BOM has the property that characters from the US-ASCII
> range are encoded byte-for-byte the same way as by the US-ASCII
> encoding. Adding a BOM inserts additional bytes, so this is no longer
> true."
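
The point in the quoted text can be checked at the byte level. This is only an illustrative sketch, not part of the article under review: the sample string is made up, and it assumes Python's standard `utf-8-sig` codec as a convenient way to produce UTF-8 with a leading BOM.

```python
import codecs

# Hypothetical sample text containing only US-ASCII characters.
text = "plain ASCII text"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")          # UTF-8 without a BOM
utf8_bom_bytes = text.encode("utf-8-sig")  # UTF-8 with a leading BOM

# Without a BOM, the UTF-8 bytes are identical to the US-ASCII bytes.
same_without_bom = utf8_bytes == ascii_bytes

# With a BOM, three extra bytes (EF BB BF) precede the ASCII-identical part.
starts_with_bom = utf8_bom_bytes[:3] == codecs.BOM_UTF8
rest_is_ascii = utf8_bom_bytes[3:] == ascii_bytes

# A strict US-ASCII decoder rejects the BOM bytes, since they are above 0x7F.
try:
    utf8_bom_bytes.decode("ascii")
    decodes_as_ascii = True
except UnicodeDecodeError:
    decodes_as_ascii = False

# For comparison, U+FEFF in UTF-16 reveals the byte order of the stream.
bom_be = "\ufeff".encode("utf-16-be")  # FE FF
bom_le = "\ufeff".encode("utf-16-le")  # FF FE

print(same_without_bom, starts_with_bom, rest_is_ascii, decodes_as_ascii)
print(bom_be.hex(), bom_le.hex())
```

The last two lines also illustrate the first quoted sentence: the same code point, U+FEFF, serializes to different byte sequences depending on byte order, which is why it can serve as a magic number at the start of a stream.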
Received on Thursday, 20 December 2012 11:09:20 UTC
