Re: For review: The byte-order mark (BOM) in HTML from Martin J. Dürst on 2012-12-20 (www-international@w3.org from October to December 2012)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Thu, 20 Dec 2012 20:11:35 +0900
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
CC: Albert Lunde <atlunde@panix.com>, www International <www-international@w3.org>
Message-ID: <50D2F267.3030908@it.aoyama.ac.jp>

On 2012/12/20 7:48, Leif Halvard Silli wrote:
> Albert Lunde, Wed, 19 Dec 2012 16:33:44 -0600:
>> On 12/19/2012 3:24 PM, Leif Halvard Silli wrote:
>>> The "characters from the US-ASCII range are encoded byte-for-byte the
>>> same way" even if you add the BOM. So this doesn’t sound like any
>>> improvement.
>>
>> The individual characters are encoded the same, but the whole encoded
>> byte sequence is different. I'd agree it's hard to say this clearly,
>> especially for an audience that's new to these ideas.
>>
>> This is one of the things that breaks old tools expecting US-ASCII.
>
> Those tools are not broken by ÆØÅ or äöë either, right?

Not necessarily. In a programming language, for example, there are
places such as string constants where 8-bit data is okay (the compiler
or interpreter may not really do much with it). But that's not the case
at the start of a file, which is exactly where a BOM appears.
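
To make the distinction concrete, here is a minimal Python sketch. It is
an illustration only, not something from the thread: the helper name
starts_with_shebang is hypothetical, and the "#!" check merely stands in
for any old tool that inspects the first bytes of a file and expects
US-ASCII there.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

ascii_text = "#!/bin/sh\necho hello\n"
plain = ascii_text.encode("utf-8")         # no BOM
with_bom = ascii_text.encode("utf-8-sig")  # same text, BOM prepended

# Each US-ASCII character is still encoded byte-for-byte the same way...
assert with_bom[len(BOM):] == plain
# ...but the encoded byte sequence as a whole is different.
assert with_bom != plain


def starts_with_shebang(data: bytes) -> bool:
    """Hypothetical stand-in for a tool that checks the start of a file."""
    return data.startswith(b"#!")


# Non-ASCII bytes inside a string constant do not disturb the check,
# because the tool never looks at them.
script_with_literal = '#!/bin/sh\necho "ÆØÅ"\n'.encode("utf-8")

print(starts_with_shebang(plain))                # ASCII only
print(starts_with_shebang(script_with_literal))  # 8-bit data mid-file
print(starts_with_shebang(with_bom))             # BOM at the very start
```

In this sketch only the BOM-prefixed input fails the check; the input
with ÆØÅ in a string literal passes, since its first bytes are unchanged.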

Regards,    Martin.

Received on Thursday, 20 December 2012 11:12:07 UTC