- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 19 Dec 2012 22:24:45 +0100
- To: Albert Lunde <atlunde@panix.com>
- Cc: www International <www-international@w3.org>

Albert Lunde, Tue, 18 Dec 2012 12:39:43 -0600:

> "You should also be aware that, although ASCII is a subset of UTF-8,
> a file that starts with a BOM is no longer ASCII-compatible."
>
> As I think was remarked on the list, the intended meaning of the
> phrase "ASCII-compatible" is not too obvious.

+1

> I _think_ this refers to the (often desirable) property of UTF-8 that
> characters from the US-ASCII range are encoded in UTF-8 in a way that
> is byte-for-byte identical to US-ASCII encoding. I think it would be
> better to say that directly, somehow.
>
> For example:
>
> "UTF-8 without a BOM has the property that characters from the
> US-ASCII range are encoded byte-for-byte the same way as by the
> US-ASCII encoding. Adding a BOM inserts additional bytes, so this is
> no longer true."

-1

The "characters from the US-ASCII range are encoded byte-for-byte the
same way" even if you add the BOM. So this doesn't sound like any
improvement.

It seems impossible to improve the text unless Richard clarifies what
use the text has in mind. When and why would it matter that a text
that appears to be ASCII fails to be ASCII due to the BOM?

-- 
leif halvard silli
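
The distinction under discussion can be illustrated with a minimal Python sketch. It assumes Python 3 and its standard "utf-8-sig" codec, which prepends the UTF-8 BOM when encoding; the sample string and variable names are illustrative only and do not come from the thread.

```python
import codecs

text = "plain ASCII text"  # illustrative sample, US-ASCII characters only

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")          # UTF-8 without a BOM
utf8_bom_bytes = text.encode("utf-8-sig")  # UTF-8 with a leading BOM (EF BB BF)

# Without a BOM, the UTF-8 file is byte-for-byte identical to the ASCII file.
same_without_bom = utf8_bytes == ascii_bytes

# With a BOM, three extra bytes come first, but every US-ASCII character
# after them is still encoded exactly as in US-ASCII.
bom_is_prefix = utf8_bom_bytes.startswith(codecs.BOM_UTF8)
rest_matches_ascii = utf8_bom_bytes[len(codecs.BOM_UTF8):] == ascii_bytes

# What changes is the file as a whole: a strict ASCII decoder rejects it,
# because the BOM bytes are outside the 7-bit range.
try:
    utf8_bom_bytes.decode("ascii")
    decodes_as_ascii = True
except UnicodeDecodeError:
    decodes_as_ascii = False

print(same_without_bom, bom_is_prefix, rest_matches_ascii, decodes_as_ascii)
```

In this sketch the per-character encoding is unchanged by the BOM, while the file taken as a whole stops being valid ASCII. Those are the two readings of "ASCII-compatible" that the thread is trying to tell apart.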
Received on Wednesday, 19 December 2012 21:25:10 UTC