- From: <bugzilla@jessica.w3.org>
- Date: Mon, 12 Jul 2010 13:35:58 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=9980

Henry Zongaro <zongaro@ca.ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zongaro@ca.ibm.com

--- Comment #1 from Henry Zongaro <zongaro@ca.ibm.com>  2010-07-12 13:35:57 ---
I have been reading what Section 3.10, "Unicode Encoding Schemes", of
Unicode 5.2 has to say about the UTF-16LE, UTF-16BE and UTF-16 encoding
schemes.[1]

It turns out that the UTF-16 encoding scheme is not equivalent to simply
choosing one of UTF-16LE or UTF-16BE. My understanding is that, according to
Unicode 5.2, the byte order mark is used only at the start of the encoded
byte sequence in the UTF-16 encoding scheme, not in either UTF-16LE or
UTF-16BE. The byte sequence FE FF at the start of a file or what-have-you
would be interpreted as a zero-width no-break space in something that was
known to be encoded in the UTF-16BE encoding scheme.

For UTF-32, Unicode 5.2 says the byte order mark is optional.

Changing the default byte-order-mark value to true could break existing
implementations. Changing the default to implementation-defined wouldn't
harm existing implementations, but I think it could have a slight impact on
interoperability if some implementations chose a default byte-order-mark
value of true for UTF-32.

As an aside, I know far more about the distinction between Unicode character
encoding schemes and Unicode character encoding forms than I did when I woke
up this morning. :)

[1] http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf#G7404
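
As a rough illustration of the UTF-16 versus UTF-16BE distinction described in the comment, here is a minimal Python sketch. It assumes that Python's "utf-16" and "utf-16-be" codecs behave like the Unicode encoding schemes of the same names; the variable names are made up for the example and are not taken from the bug report or from any specification.

```python
import codecs

# Bytes FE FF followed by "A" in big-endian UTF-16 code units.
payload = "A".encode("utf-16-be")        # no byte order mark is written
data = codecs.BOM_UTF16_BE + payload     # FE FF 00 41

# Decoded as the UTF-16 encoding scheme: the leading FE FF is treated
# as a byte order mark and is not part of the resulting text.
as_utf16 = data.decode("utf-16")

# Decoded as the UTF-16BE encoding scheme: the leading FE FF is kept
# as the character U+FEFF (zero-width no-break space).
as_utf16be = data.decode("utf-16-be")

print(repr(as_utf16))
print(repr(as_utf16be))
```

Under those assumptions, the same byte sequence yields different text depending on which encoding scheme the reader believes is in use, which seems to be why writing a byte order mark by default could matter for existing implementations.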
Received on Monday, 12 July 2010 13:36:00 UTC