- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Thu, 22 Nov 2012 00:16:45 +0100
- To: John Cowan <cowan@mercury.ccil.org>
- Cc: Anne van Kesteren <annevk@annevk.nl>, www-international@w3.org
John Cowan, Wed, 21 Nov 2012 16:27:27 -0500:

> Per Unicode, in UTF-16LE and UTF-16BE documents, there is no such
> thing as a BOM. If a UTF-16LE document begins FF FE, that means the
> first character is U+FEFF, ZERO WIDTH NO-BREAK SPACE; likewise if
> a UTF-16BE document begins FE FF.
>
> In the UTF-16 encoding, a leading FF FE or FE FF is a BOM rather than a
> character, and all following pairs of bytes are interpreted little-endian
> or big-endian respectively. If the first two bytes are neither of these,
> a higher-level protocol must decide whether to interpret the pairs of
> bytes as big- or little-endian. If no higher-level protocol exists,
> the interpretation is big-endian by default.

The theoretical ability of UTF-16LE and UTF-16BE to let a leading FF FE or FE FF represent a ZERO WIDTH NO-BREAK SPACE rather than a BOM seems to be of no value for markup languages. The only exception I can think of would be a markup language in which the role that the '<' character plays in XML was taken over by the ZERO WIDTH NO-BREAK SPACE character itself.

Hence, it doesn't seem important that, e.g., XML editors or XML parsers handle UTF-16LE or UTF-16BE correctly with regard to whether FF FE or FE FF, as the first two bytes, represents a BOM or a ZERO WIDTH NO-BREAK SPACE. In fact, it seems better if they do not treat those bytes as a character at all, as this removes at least one opportunity for a (fatal) error.

For that reason, it seems entirely OK that Firefox, when version 19 is released, will treat a leading FF FE or FE FF as a BOM, even in XML documents. [1] (This can be tested, e.g., in Firefox Nightly.) Thus Firefox aligns its XML and HTML parsing in this detail, as other browsers, at least WebKit, did long ago. I should add, though, that Firefox 19 and WebKit also treat plain text the same way.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=716579#c14

-- leif halvard silli
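The following minimal Python sketch contrasts the two readings discussed in this thread. `decode_per_unicode` follows the Unicode rules quoted from John Cowan. `decode_with_bom_sniffing` approximates the "BOM always wins" behaviour described for Firefox 19 and WebKit. Both function names are hypothetical. The second function is loosely modelled on BOM sniffing as described in the WHATWG Encoding Standard; it is not Firefox's or WebKit's actual code.

```python
UTF8_BOM = b"\xef\xbb\xbf"
UTF16_BE_BOM = b"\xfe\xff"
UTF16_LE_BOM = b"\xff\xfe"


def decode_per_unicode(data: bytes, label: str) -> str:
    """Decode according to the Unicode rules quoted above (hypothetical helper).

    UTF-16LE / UTF-16BE: there is no BOM, so a leading FF FE / FE FF
    decodes to the character U+FEFF (ZERO WIDTH NO-BREAK SPACE).
    UTF-16: a leading FF FE / FE FF is a BOM; it is stripped and selects
    the byte order. Without one, this sketch assumes no higher-level
    protocol applies and defaults to big-endian.
    """
    label = label.upper()
    if label == "UTF-16LE":
        return data.decode("utf-16-le")  # a leading U+FEFF is kept as content
    if label == "UTF-16BE":
        return data.decode("utf-16-be")
    if label == "UTF-16":
        if data.startswith(UTF16_LE_BOM):
            return data[2:].decode("utf-16-le")
        if data.startswith(UTF16_BE_BOM):
            return data[2:].decode("utf-16-be")
        return data.decode("utf-16-be")  # no BOM: big-endian by default
    raise ValueError(f"unsupported label in this sketch: {label}")


def decode_with_bom_sniffing(data: bytes, label: str) -> str:
    """Browser-style decoding (hypothetical sketch).

    A leading BOM overrides the declared label and is stripped, even when
    the label is UTF-16LE or UTF-16BE. Without a BOM, fall back to the
    Unicode rules above.
    """
    if data.startswith(UTF8_BOM):
        return data[3:].decode("utf-8")
    if data.startswith(UTF16_BE_BOM):
        return data[2:].decode("utf-16-be")
    if data.startswith(UTF16_LE_BOM):
        return data[2:].decode("utf-16-le")
    return decode_per_unicode(data, label)


if __name__ == "__main__":
    # FF FE followed by "<a/>" encoded as UTF-16LE, served with label UTF-16LE.
    doc = b"\xff\xfe<\x00a\x00/\x00>\x00"
    print(repr(decode_per_unicode(doc, "UTF-16LE")))        # '\ufeff<a/>'
    print(repr(decode_with_bom_sniffing(doc, "UTF-16LE")))  # '<a/>'
```

Under the Unicode reading, the document starts with U+FEFF as content, so an XML parser would see a stray character before `<a/>`. Under BOM sniffing, the check runs before the label is consulted. A document can therefore never begin with U+FEFF as content, which removes the error opportunity mentioned above.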
Received on Wednesday, 21 November 2012 23:17:16 UTC