Comments on "The byte-order mark (BOM) in HTML" from Norbert Lindenberg on 2012-12-05 (www-international@w3.org from October to December 2012)

From: Norbert Lindenberg <w3@norbertlindenberg.com>
Date: Wed, 5 Dec 2012 08:10:47 -0800
To: www-international <www-international@w3.org>
Cc: Norbert Lindenberg <w3@norbertlindenberg.com>
Message-Id: <051BAC25-4495-4C4C-9F78-8970A60BF7D5@norbertlindenberg.com>

I've looked over:
http://www.w3.org/International/questions/new/qa-byte-order-mark-new

- The link "Skip to the answer" seems unnecessary since the answer follows immediately.

- The formal Unicode name of U+FEFF is ZERO WIDTH NO-BREAK SPACE.

- The paragraph discussing UTF-16 says that characters can take 2 or 4 bytes, but the graphic that follows shows only 2-byte characters.

- "works in XML and HTML": As stated further down, the new rules that give the BOM precedence apply only to HTML5 served as HTML. For HTML5 served as XML, the XML rules still apply, meaning that a charset parameter in the HTTP Content-Type header overrides the BOM.

- "either the browser will continue to treat your content as UTF-8": Don't transcoders replace the BOM with a different byte sequence, namely either the equivalent character in the target encoding or a replacement character in that encoding?

- "no longer ASCII-compatible": What does this mean? Usually when UTF-8 is described as ASCII-compatible it means that all byte values that look like ASCII actually are ASCII, and the BOM doesn't break this rule.

- "The transcoder will typically not remove the byte-order mark": Again, is that really true?

- Does anybody still care about Internet Explorer 5.5?
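
For reference, here is a minimal Python sketch of the encoding facts behind the UTF-16, ASCII-compatibility, and transcoder points above. The helper names are hypothetical, and the "transcoder" is only a naive stand-in (decode UTF-8, re-encode as ISO-8859-1 with replacement); it is an assumption for illustration and says nothing about what real transcoders do.

```python
import unicodedata

BOM = "\ufeff"  # U+FEFF


def utf16_byte_length(ch: str) -> int:
    """Bytes one character occupies in UTF-16 (big-endian, no BOM added)."""
    return len(ch.encode("utf-16-be"))


def ascii_bytes_are_ascii(text: str) -> bool:
    """Check that, with a UTF-8 BOM prepended, every byte below 0x80
    still corresponds to an ASCII character of the text."""
    data = (BOM + text).encode("utf-8")
    low_bytes = bytes(b for b in data if b < 0x80)
    ascii_chars = "".join(c for c in text if ord(c) < 0x80).encode("ascii")
    return low_bytes == ascii_chars


def naive_transcode_to_latin1(data: bytes) -> bytes:
    """Hypothetical transcoder: decode UTF-8 (keeping any BOM as U+FEFF),
    then encode as ISO-8859-1, substituting '?' for unmappable characters."""
    return data.decode("utf-8").encode("latin-1", errors="replace")


if __name__ == "__main__":
    print(unicodedata.name(BOM))                      # Unicode character name
    print(utf16_byte_length("A"))                     # BMP character
    print(utf16_byte_length("\U0001F600"))            # supplementary character
    print(BOM.encode("utf-8").hex())                  # UTF-8 form of the BOM
    print(ascii_bytes_are_ascii("héllo"))
    print(naive_transcode_to_latin1(b"\xef\xbb\xbfhi"))
```

In this sketch the UTF-8 BOM consists only of bytes at or above 0x80, so it cannot be mistaken for ASCII, and the naive transcoder turns U+FEFF into a replacement character because ISO-8859-1 has no equivalent.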

Norbert

Received on Wednesday, 5 December 2012 16:11:29 UTC