Re: For review: The byte-order mark (BOM) in HTML

From: Albert Lunde <atlunde@panix.com>
Date: Tue, 18 Dec 2012 12:39:43 -0600
Message-ID: <50D0B86F.7090107@panix.com>
To: www International <www-international@w3.org>

"To communicate which byte order was in use, U+FEFF (the byte-order 
mark) was used at the start of the stream as magic number that is not 
logically part of the text the stream represents."

I'd say "...as a magic number...".
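
To illustrate the mechanism the quoted sentence describes, here is a minimal sketch in Python (my choice of language, not something from the article). The helper name `sniff_utf16_byte_order` is hypothetical; `codecs.BOM_UTF16_BE` and `codecs.BOM_UTF16_LE` are standard-library constants. It only covers UTF-16 and ignores UTF-32, whose little-endian BOM begins with the same two bytes.

```python
import codecs


def sniff_utf16_byte_order(data: bytes) -> str:
    """Guess the UTF-16 byte order from a leading U+FEFF (hypothetical helper)."""
    # U+FEFF serializes as FE FF in big-endian order ...
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big-endian"
    # ... and as FF FE in little-endian order.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little-endian"
    # No BOM: the byte order has to be known some other way.
    return "unknown"


text = "hi"
big_endian_stream = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
little_endian_stream = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(sniff_utf16_byte_order(big_endian_stream))
print(sniff_utf16_byte_order(little_endian_stream))
```

In both streams the BOM bytes come before the text and are not part of "hi" itself, which is the sense of "magic number" here.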

"You should also be aware that, although ASCII is a subset of UTF-8, a 
file that starts with a BOM is no longer ASCII-compatible."

As I think was remarked on the list, the intended meaning of the phrase 
"ASCII-compatible" is not too obvious.

I _think_ this refers to the (often desirable) property of UTF-8 that 
characters from the US-ASCII range are encoded in UTF-8 in a way that is 
byte-for-byte identical to US-ASCII encoding. I think it would be better 
to say that directly, somehow.

For example:

"UTF-8 without a BOM has the property that characters from the US-ASCII 
range are encoded byte-for-byte the same way as by the US-ASCII 
encoding. Adding a BOM inserts additional bytes, so this is no longer true."
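
The property can be checked with a short sketch, again assuming Python. The sample string is arbitrary, and `utf-8-sig` is Python's name for the codec that writes a UTF-8 BOM before the text.

```python
import codecs

text = "Hello"  # arbitrary sample using only US-ASCII characters

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")          # UTF-8 without a BOM
utf8_bom_bytes = text.encode("utf-8-sig")  # UTF-8 with a leading BOM

# Without a BOM the two encodings agree byte for byte.
print(ascii_bytes == utf8_bytes)

# With a BOM, three extra bytes (EF BB BF) precede the text.
print(utf8_bom_bytes[: len(codecs.BOM_UTF8)] == codecs.BOM_UTF8)

# Those bytes are outside the US-ASCII range, so a strict ASCII
# decoder rejects the file even though the text itself is pure ASCII.
try:
    utf8_bom_bytes.decode("ascii")
    decodes_as_ascii = True
except UnicodeDecodeError:
    decodes_as_ascii = False
print(decodes_as_ascii)
```

The text after the BOM is still byte-for-byte identical to its US-ASCII form; only the file as a whole stops being readable as US-ASCII.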
Received on Tuesday, 18 December 2012 18:40:02 UTC