Re: For review: The byte-order mark (BOM) in HTML from Leif Halvard Silli on 2012-12-19 (www-international@w3.org from October to December 2012)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Wed, 19 Dec 2012 23:48:24 +0100
To: Albert Lunde <atlunde@panix.com>
Cc: www International <www-international@w3.org>
Message-id: <20121219234824437694.202ec5f7@xn--mlform-iua.no>

Albert Lunde, Wed, 19 Dec 2012 16:33:44 -0600:
> On 12/19/2012 3:24 PM, Leif Halvard Silli wrote:
>> The "characters from the US-ASCII range are encoded byte-for-byte the
>> same way" even if you add the BOM. So this doesn’t sound like any
>> improvement.
> 
> The individual characters are encoded the same, but the whole encoded 
> byte sequence is different. I'd agree it's hard to say this clearly, 
> especially for an audience that's new to these ideas.
> 
> This is one of the things that breaks old tools expecting US-ASCII.

Those tools are not broken by ÆØÅ or äöë too, right? So, if the purpose 
of the text is to explain that the BOM breaks old tools that don’t 
expect the BOM, then it should say that rather than mixing ASCII into 
the argument.
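
To make the byte-level point concrete, here is a minimal Python sketch. It 
is only an illustration: the `starts_with_tag` check is a hypothetical 
stand-in for an old tool that inspects the first byte of its input, and 
is not taken from any real tool or from the text under review.

```python
import codecs

text = "<html>"
plain = text.encode("utf-8")         # no BOM
with_bom = text.encode("utf-8-sig")  # same text, BOM (EF BB BF) prepended

# The ASCII characters themselves are encoded byte-for-byte the same way.
assert plain == text.encode("ascii")
# The only difference is the three-byte prefix.
assert with_bom == codecs.BOM_UTF8 + plain


def starts_with_tag(data: bytes) -> bool:
    """Hypothetical old-tool check: does the input begin with '<'?"""
    return data.startswith(b"<")


def non_ascii_bytes(data: bytes) -> list:
    """Return the bytes that fall outside the US-ASCII range."""
    return [b for b in data if b >= 0x80]


print(starts_with_tag(plain))     # True
print(starts_with_tag(with_bom))  # False: the BOM comes first

# The BOM bytes are all outside US-ASCII, as are the bytes of ÆØÅ in UTF-8.
print(len(non_ascii_bytes(with_bom)))                # 3
print(len(non_ascii_bytes("ÆØÅ".encode("utf-8"))))   # 6
```

The sketch shows only what the bytes look like; whether a given tool 
trips on the prefix, on any non-ASCII byte, or on neither depends on 
the tool.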

Btw, I think that some use "ASCII" as a synonym for "pure/raw/plain 
text". Seen from that angle, we are exactly in the problem you 
describe, I think, since the BOM is a kind of "non-text signature" 
at the beginning of the text.
-- 
leif halvard silli

Received on Wednesday, 19 December 2012 22:48:54 UTC