Re: For review: The byte-order mark (BOM) in HTML from Leif Halvard Silli on 2012-12-19 (www-international@w3.org from October to December 2012)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Wed, 19 Dec 2012 22:24:45 +0100
To: Albert Lunde <atlunde@panix.com>
Cc: www International <www-international@w3.org>
Message-id: <20121219222445303220.6bbeac45@xn--mlform-iua.no>

Albert Lunde, Tue, 18 Dec 2012 12:39:43 -0600:
>  "You should also be aware that, although ASCII is a subset of UTF-8, 
> a file that starts with a BOM is no longer ASCII-compatible."
> 
> As I think was remarked on the list, the intended meaning of the 
> phrase "ASCII-compatible" is not too obvious.

+1

> I _think_ this refers to the (often desirable) property of UTF-8 that 
> characters from the US-ASCII range are encoded in UTF-8 in a way that 
> is byte-for-byte identical to US-ASCII encoding. I think it would be 
> better to say that directly, somehow.
> 
> For example:
> 
> "UTF-8 without a BOM has the property that characters from the 
> US-ASCII range are encoded byte-for-byte the same way as by the 
> US-ASCII encoding. Adding a BOM inserts additional bytes, so this is 
> no longer true."

-1

Characters from the US-ASCII range are still "encoded byte-for-byte
the same way" even if you add the BOM. So this doesn’t sound like an
improvement.
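To make the distinction concrete, here is a small sketch in Python.
Python is an assumption, since nothing in this thread uses code, and
the sample string is an arbitrary choice. It shows that the
ASCII-range characters get the same bytes whether or not a BOM is
present. What the BOM changes is only the three-byte prefix
(EF BB BF). That prefix makes the byte stream as a whole fail a
strict ASCII check.

```python
import codecs

# Arbitrary ASCII-only sample text (an assumption for illustration).
text = "<!DOCTYPE html>"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")          # UTF-8 without a BOM
utf8_bom_bytes = text.encode("utf-8-sig")  # Python codec that prepends a BOM

# Without a BOM, UTF-8 output is byte-for-byte identical to ASCII output.
print(utf8_bytes == ascii_bytes)  # True

# With a BOM, the ASCII-range characters are still encoded identically;
# the only difference is the three-byte BOM prefix (EF BB BF).
print(utf8_bom_bytes == codecs.BOM_UTF8 + ascii_bytes)  # True

# The byte stream as a whole, however, is no longer pure ASCII,
# because 0xEF lies outside the 0x00-0x7F range.
print(utf8_bytes.isascii(), utf8_bom_bytes.isascii())  # True False

try:
    utf8_bom_bytes.decode("ascii")
except UnicodeDecodeError as err:
    print("not ASCII:", err.reason)  # not ASCII: ordinal not in range(128)
```

In other words, the BOM does not alter how any ASCII character is
encoded. It only adds non-ASCII bytes in front of them.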

It seems impossible to improve the text unless Richard clarifies which
use case the text has in mind. When and why would it matter that a
text that appears to be ASCII fails to be ASCII because of the BOM?
-- 
leif halvard silli

Received on Wednesday, 19 December 2012 21:25:10 UTC