Re: Comments on "The byte-order mark (BOM) in HTML"

John Cowan, Wed, 5 Dec 2012 11:31:25 -0500:
> Norbert Lindenberg scripsit:
> 
>> - "no longer ASCII-compatible": What does this mean? Usually when UTF-8
>> is described as ASCII-compatible it means that all byte values that
>> look like ASCII actually are ASCII, and the BOM doesn't break this rule.
> 
> I take it to mean that UTF-8-encoded text containing only characters from
> the ASCII repertoire will be byte-for-byte the same as if it were
> ASCII-encoded text.  This is true iff the UTF-8 data doesn't have a BOM.

Usually the opposite argument is made, namely that the ASCII repertoire 
is fully UTF-8-compatible. It would be nice if the text clarified when 
it is actually a problem that ASCII + BOM is no longer ASCII. Perhaps 
it relates to Unix tools? The 'UTF-8 and Unicode FAQ for Unix/Linux' [1] 
says that the BOM "would break far too many existing ASCII syntax 
conventions (such as scripts starting with #!)".
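
To make the byte-level point concrete, here is a rough Python sketch. 
The sample script text and the helper name starts_with_shebang are my 
own illustration, not taken from the article or the FAQ, and the helper 
only imitates a tool that checks the first two bytes of a file; it is 
not how any particular kernel is implemented.

```python
import codecs

# Hypothetical sample: a script that uses only ASCII characters.
text = "#!/bin/sh\necho hello\n"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")          # UTF-8 without a BOM
utf8_bom_bytes = text.encode("utf-8-sig")  # UTF-8 with a leading BOM


def starts_with_shebang(data: bytes) -> bool:
    """Imitate a tool that looks for '#!' in the first two bytes."""
    return data[:2] == b"#!"


# Without a BOM, the UTF-8 bytes are identical to the ASCII bytes.
same_without_bom = utf8_bytes == ascii_bytes

# With a BOM, three extra bytes (EF BB BF) precede the ASCII content.
bom_prefix = utf8_bom_bytes[: len(codecs.BOM_UTF8)]

shebang_without_bom = starts_with_shebang(utf8_bytes)
shebang_with_bom = starts_with_shebang(utf8_bom_bytes)
```

If I read it right, same_without_bom and shebang_without_bom come out 
true while shebang_with_bom comes out false, which would match both 
John's "iff" and the FAQ's remark about "#!".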

[1] http://www.cl.cam.ac.uk/~mgk25/unicode.html#linux
-- 
leif halvard silli

Received on Wednesday, 5 December 2012 16:56:43 UTC