Re: For review: The byte-order mark (BOM) in HTML from Richard Ishida on 2012-12-20 (www-international@w3.org from October to December 2012)

From: Richard Ishida <ishida@w3.org>
Date: Thu, 20 Dec 2012 11:08:52 +0000
To: www-international@w3.org
Message-ID: <50D2F1C4.3020905@w3.org>

Thanks, Albert. I've addressed those points in the newly updated version.
RI


Richard Ishida
Internationalization Activity Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/

On 18/12/2012 18:39, Albert Lunde wrote:
>
> "To communicate which byte order was in use, U+FEFF (the byte-order
> mark) was used at the start of the stream as magic number that is not
> logically part of the text the stream represents."
>
> I'd say .."as a magic number"..
>
>
>   "You should also be aware that, although ASCII is a subset of UTF-8, a
> file that starts with a BOM is no longer ASCII-compatible."
>
> As I think was remarked on the list, the intended meaning of the phrase
> "ASCII-compatible" is not too obvious.
>
> I _think_ this refers to the (often desirable) property of UTF-8 that
> characters from the US-ASCII range are encoded in UTF-8 in a way that is
> byte-for-byte identical to US-ASCII encoding. I think it would be better
> to say that directly, somehow.
>
> For example:
>
> "UTF-8 without a BOM has the property that characters from the US-ASCII
> range are encoded byte-for-byte the same way as by the US-ASCII
> encoding. Adding a BOM inserts additional bytes, so this is no longer
> true."
>
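
A minimal Python sketch of the two technical points in the quoted text. First, U+FEFF works as a byte-order magic number because it serialises as FE FF in big-endian UTF-16 and as FF FE in little-endian UTF-16. If it is read with the wrong byte order it becomes U+FFFE, which is a noncharacter. Second, UTF-8 without a BOM encodes US-ASCII text byte-for-byte identically to US-ASCII, while a BOM prepends EF BB BF. The helper names `ascii_bytes_match` and `detect_bom` are hypothetical and exist only for illustration. The detector also assumes that only UTF-8 and UTF-16 BOMs matter, so UTF-32 is deliberately ignored.

```python
import codecs
from typing import Optional


def ascii_bytes_match(text: str, encoding: str) -> bool:
    """True if `text` encoded with `encoding` equals its US-ASCII bytes."""
    return text.encode(encoding) == text.encode("ascii")


def detect_bom(data: bytes) -> Optional[str]:
    """Hypothetical helper: name the encoding signalled by a leading BOM.

    Assumption: only UTF-8 and UTF-16 BOMs are checked (UTF-32 is ignored).
    """
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    return None


text = "<!DOCTYPE html>"
# UTF-8 without a BOM: identical bytes to US-ASCII.
print(ascii_bytes_match(text, "utf-8"))      # True
# "utf-8-sig" prepends the BOM, so the bytes no longer match.
print(ascii_bytes_match(text, "utf-8-sig"))  # False
# The same code point U+FEFF in each UTF-16 byte order.
print("\ufeff".encode("utf-16-be").hex())    # feff
print("\ufeff".encode("utf-16-le").hex())    # fffe
print(detect_bom(text.encode("utf-8-sig")))  # utf-8
```

The byte comparison uses plain `bytes` equality. That matches the wording of the suggested rephrasing ("byte-for-byte the same way") more directly than the looser term "ASCII-compatible".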

Received on Thursday, 20 December 2012 11:09:20 UTC