- From: <bugzilla@jessica.w3.org>
- Date: Mon, 12 Jul 2010 13:35:58 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=9980
Henry Zongaro <zongaro@ca.ibm.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zongaro@ca.ibm.com
--- Comment #1 from Henry Zongaro <zongaro@ca.ibm.com> 2010-07-12 13:35:57 ---
I have been reading what Section 3.10 "Unicode Encoding Schemes" of Unicode 5.2
has to say about UTF-16LE, UTF-16BE and UTF-16 encoding schemes.[1] It turns
out that the UTF-16 encoding scheme is not equivalent to simply choosing one of
UTF-16LE or UTF-16BE. My understanding is that, according to Unicode 5.2, the
byte order mark is used only at the start of the encoded byte sequence in the
UTF-16 encoding scheme, not in either UTF-16LE or UTF-16BE. The byte
sequence FE FF at the start of a file or what-have-you would be interpreted as
a zero-width no-break space in something that was known to be encoded in the
UTF-16BE encoding scheme.
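To illustrate the distinction, here is a minimal sketch using Python's
built-in codecs. I am assuming the "utf-16" and "utf-16-be" codec names
correspond to the UTF-16 and UTF-16BE encoding schemes; the variable names are
mine and only for illustration.

```python
# The bytes FE FF followed by U+0041 'A' in big-endian order.
data = bytes([0xFE, 0xFF, 0x00, 0x41])

# UTF-16 encoding scheme: the leading FE FF is read as a byte order mark.
# It selects big-endian order and is not part of the decoded text.
as_utf16 = data.decode("utf-16")

# UTF-16BE encoding scheme: the byte order is already known, so FE FF
# decodes to U+FEFF (ZERO WIDTH NO-BREAK SPACE) as an ordinary character.
as_utf16be = data.decode("utf-16-be")

print(repr(as_utf16))
print(repr(as_utf16be))

# Encoding as UTF-16BE does not write a byte order mark.
encoded_be = "A".encode("utf-16-be")
print(encoded_be.hex())
```

The same four bytes yield one character under one scheme and two under the
other, which is why the choice of scheme matters for the byte order mark.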
For UTF-32, Unicode 5.2 says the byte order mark is optional. Changing the
default to true could break existing implementations. Changing the default to
implementation-defined wouldn't harm existing implementations, but I think it
could have a slight impact on interoperability if some implementations chose a
default byte-order-mark value of true for UTF-32.
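As a rough sketch of what a consumer has to do when the byte order mark is
optional, the following Python function handles both cases for the UTF-32
encoding scheme. The function name decode_utf32_scheme is hypothetical, and
the fallback to big-endian when no byte order mark is present assumes there is
no higher-level protocol saying otherwise.

```python
def decode_utf32_scheme(data: bytes) -> str:
    """Decode bytes in the UTF-32 encoding scheme (hypothetical helper)."""
    # A leading 00 00 FE FF is a big-endian byte order mark: skip it.
    if data[:4] == b"\x00\x00\xfe\xff":
        return data[4:].decode("utf-32-be")
    # A leading FF FE 00 00 is a little-endian byte order mark: skip it.
    if data[:4] == b"\xff\xfe\x00\x00":
        return data[4:].decode("utf-32-le")
    # No byte order mark: assume big-endian.
    return data.decode("utf-32-be")


# The same text serialized with and without a byte order mark.
with_bom = b"\x00\x00\xfe\xff" + "A".encode("utf-32-be")
without_bom = "A".encode("utf-32-be")
le_with_bom = b"\xff\xfe\x00\x00" + "A".encode("utf-32-le")

print(decode_utf32_scheme(with_bom))
print(decode_utf32_scheme(without_bom))
print(decode_utf32_scheme(le_with_bom))
```

A consumer written this way accepts either default, but one that assumes a
fixed byte order and no byte order mark would see an extra U+FEFF character
when a serializer defaults to true.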
As an aside, I know far more about the distinction between Unicode character
encoding schemes and Unicode character encoding forms than I did when I woke up
this morning. :)
[1] http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf#G7404
Received on Monday, 12 July 2010 13:36:00 UTC