- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 05 Dec 2012 17:56:15 +0100
- To: John Cowan <cowan@mercury.ccil.org>
- Cc: Norbert Lindenberg <w3@norbertlindenberg.com>, www-international <www-international@w3.org>
John Cowan, Wed, 5 Dec 2012 11:31:25 -0500:
> Norbert Lindenberg scripsit:
>
>> - "no longer ASCII-compatible": What does this mean? Usually when UTF-8
>> is described as ASCII-compatible it means that all byte values that
>> look like ASCII actually are ASCII, and the BOM doesn't break this rule.
>
> I take it to mean that UTF-8-encoded text containing only characters from
> the ASCII repertoire will be byte-for-byte the same as if it were
> ASCII-encoded text. This is true iff the UTF-8 data doesn't have a BOM.

Usually the opposite argument is made, namely that the ASCII repertoire
is fully UTF-8-compatible. It would be nice if the text clarified when
it is a problem that ASCII + BOM is no longer ASCII. Perhaps it relates
to Unix tools? The 'UTF-8 and Unicode FAQ for Unix/Linux' [1] says that
the BOM:

  "would break far too many existing ASCII syntax conventions
  (such as scripts starting with #!)"

[1] http://www.cl.cam.ac.uk/~mgk25/unicode.html#linux
-- 
leif halvard silli
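
To make the byte-level point concrete, here is a minimal sketch in Python. It is only an illustration, not part of the text under discussion. The sample script content is made up, and Python's "utf-8-sig" codec is used as a stand-in for any tool that writes UTF-8 with a BOM.

```python
import codecs

# Hypothetical sample: a pure-ASCII shell script.
text = "#!/bin/sh\necho hello\n"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")          # UTF-8 without a BOM
utf8_bom_bytes = text.encode("utf-8-sig")  # UTF-8 with a leading BOM

# Without a BOM, ASCII-only text is byte-for-byte identical in both encodings.
same_without_bom = utf8_bytes == ascii_bytes

# With a BOM, three extra bytes (EF BB BF) precede the text, so the data is
# no longer identical to its ASCII encoding.
same_with_bom = utf8_bom_bytes == ascii_bytes
bom_prefix = utf8_bom_bytes[:3]

# A byte-oriented check for "#!" at offset 0 no longer matches.
starts_with_shebang = utf8_bom_bytes.startswith(b"#!")

print(same_without_bom, same_with_bom, bom_prefix == codecs.BOM_UTF8, starts_with_shebang)
```

The comparison is done on raw bytes deliberately, since that is the level at which a byte-oriented check for a leading "#!" would operate.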
Received on Wednesday, 5 December 2012 16:56:43 UTC