
Re: Comments on "The byte-order mark (BOM) in HTML"

From: Richard Ishida <ishida@w3.org>
Date: Thu, 20 Dec 2012 12:43:55 +0000
Message-ID: <50D3080B.3080905@w3.org>
To: Norbert Lindenberg <w3@norbertlindenberg.com>
CC: www-international <www-international@w3.org>
Hmm. The reply to your email seems not to have made it out of my drafts
folder. Trying again...

RI

On 05/12/2012 16:10, Norbert Lindenberg wrote:
> I've looked over:
> http://www.w3.org/International/questions/new/qa-byte-order-mark-new
>
> - The link "Skip to the answer" seems unnecessary since the answer follows immediately.

Removed.

>
> - The legal name of U+FEFF is ZERO WIDTH NO-BREAK SPACE.

Now replaced by BYTE ORDER MARK.

>
> - The paragraph discussing UTF-16 mentions that characters can have 2 or 4 bytes, but the following graphic shows only 2-byte characters.

Correct. I edited the text a little, so that you no longer expect to see
4-byte characters in the graphic.
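
For anyone following along, here is a rough Python sketch of the 2-byte
versus 4-byte distinction in UTF-16. It is only an illustration, not
something taken from the article; the helper name utf16_byte_length is
made up for this example.

```python
def utf16_byte_length(ch: str) -> int:
    """Return how many bytes one character occupies in UTF-16.

    Encodes as big-endian without a BOM, so only the bytes of the
    character itself are counted.
    """
    return len(ch.encode("utf-16-be"))


# A character in the Basic Multilingual Plane is one 16-bit code unit.
bmp_length = utf16_byte_length("\u00e9")                # LATIN SMALL LETTER E WITH ACUTE

# A supplementary character needs a surrogate pair, i.e. two code units.
supplementary_length = utf16_byte_length("\U00010000")  # LINEAR B SYLLABLE B008 A

print(bmp_length, supplementary_length)
```

Characters at or below U+FFFF take 2 bytes; anything above that takes 4.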

>
> - "works in XML and HTML": As stated further down, the new rules requiring to use the BOM first apply only to HTML5 served as HTML. For HTML5 served as XML the XML rules still apply, meaning that an HTTP charset attribute overrides the BOM.

Sure. But this paragraph doesn't state anything about precedence.

Having said that, my tests suggest that browsers also give the BOM
precedence over the HTTP charset for pages served as XML. See
http://www.w3.org/International/tests/html-css/character-encoding-xhtml/results-xhtml-encoding#precedence

Actually this seems to be consistent with expectations in the XML spec, 
where it says "If an XML entity is in a file, the Byte-Order Mark and 
encoding declaration are used (if present) to determine the character 
encoding." in the section "Priorities in the Presence of External 
Encoding Information" 
(http://www.w3.org/TR/REC-xml/Overview.html#sec-guessing-with-ext-info).
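
To make the precedence being discussed concrete, here is a minimal Python
sketch of "BOM first, then HTTP charset". It is an assumption-laden
illustration, not what any browser actually implements: the names
sniff_bom and choose_encoding are hypothetical, only the UTF-8 and UTF-16
BOMs are checked, and the windows-1252 fallback is just a placeholder
default.

```python
import codecs

# Byte-order marks checked in this sketch, with the encoding each signals.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return the encoding signalled by a leading BOM, or None if absent."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def choose_encoding(data: bytes, http_charset=None, default="windows-1252"):
    """Pick an encoding: the BOM wins, then the HTTP charset, then a default.

    The default value is an assumption made for this example only.
    """
    return sniff_bom(data) or http_charset or default


# A UTF-8 BOM overrides a conflicting HTTP charset parameter.
print(choose_encoding(b"\xef\xbb\xbf<p>test</p>", http_charset="iso-8859-1"))
# Without a BOM, the HTTP charset is used.
print(choose_encoding(b"<p>test</p>", http_charset="iso-8859-1"))
```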

>
> - "either the browser will continue to treat your content as UTF-8": Don't transcoders replace the BOM with a different byte sequence, either with the equivalent character in the target encoding or a replacement character in that encoding?
>
> - "no longer ASCII-compatible": What does this mean? Usually when UTF-8 is described as ASCII-compatible it means that all byte values that look like ASCII actually are ASCII, and the BOM doesn't break this rule.
>
> - "The transcoder will typically not remove the byte-order mark": Again, is that really true?
>
> - Does anybody still care about Internet Explorer 5.5?

All now moot, given recent changes.

Thanks,
RI

>
> Norbert
>
>
>
Received on Thursday, 20 December 2012 12:44:28 GMT
