[Bug 9980] [XSLT] Default value for byte-order-mark in xsl:output

http://www.w3.org/Bugs/Public/show_bug.cgi?id=9980


Henry Zongaro <zongaro@ca.ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zongaro@ca.ibm.com




--- Comment #1 from Henry Zongaro <zongaro@ca.ibm.com>  2010-07-12 13:35:57 ---
I have been reading what Section 3.10 "Unicode Encoding Schemes" of Unicode 5.2
has to say about UTF-16LE, UTF-16BE and UTF-16 encoding schemes.[1]  It turns
out that the UTF-16 encoding scheme is not equivalent to simply choosing one of
UTF-16LE or UTF-16BE.  My understanding is that, according to Unicode 5.2, the
byte order mark is used only at the start of the encoded byte sequence in the
UTF-16 encoding scheme, not in either UTF-16LE or UTF-16BE.  In something known
to be encoded in the UTF-16BE encoding scheme, the byte sequence FE FF at the
start of a file would be interpreted as a zero-width no-break space.
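
To make the distinction concrete, here is a minimal sketch in Python.  It
assumes that Python's "utf-16" and "utf-16-be" codecs are reasonable stand-ins
for the UTF-16 and UTF-16BE encoding schemes; it illustrates the Unicode rules
only and says nothing about how any XSLT serializer is implemented.

```python
import codecs

# The bytes FE FF 00 61: U+FEFF followed by "a", serialized big-endian.
data = b"\xfe\xff\x00\x61"

# UTF-16 encoding scheme: the leading FE FF is read as a byte order mark,
# selects big-endian decoding, and is not part of the decoded text.
as_utf16 = data.decode("utf-16")

# UTF-16BE encoding scheme: the byte order is already known, so the leading
# FE FF is ordinary content, the character U+FEFF (zero-width no-break space).
as_utf16be = data.decode("utf-16-be")

print(repr(as_utf16))    # 'a'
print(repr(as_utf16be))  # '\ufeffa'

# On the encoding side, the "utf-16-be" codec writes no byte order mark,
# while the "utf-16" codec writes one in the platform's native byte order.
assert "a".encode("utf-16-be") == b"\x00\x61"
assert "a".encode("utf-16")[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
```

The same four bytes decode to one character under one scheme and two under the
other, which is why a byte order mark in UTF-16BE output would change the
content rather than merely label it.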

For UTF-32, Unicode 5.2 says the byte order mark is optional.  Changing the
default to true could break existing implementations.  Changing the default to
implementation-defined wouldn't harm existing implementations, but I think it
could have a slight impact on interoperability if some implementations chose a
default byte-order-mark value of true for UTF-32.
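
As a rough sketch of what "optional" means for a consumer, the Python below
decodes the UTF-32 encoding scheme with or without a byte order mark.  The
helper name decode_utf32_scheme is hypothetical, and the fallback to
big-endian when no mark is present reflects my reading of Unicode 5.2 for the
case where no higher-level protocol specifies the byte order.

```python
import codecs


def decode_utf32_scheme(data: bytes) -> str:
    """Hypothetical helper: decode the UTF-32 encoding scheme, BOM optional."""
    if data.startswith(codecs.BOM_UTF32_BE):   # 00 00 FE FF
        return data[4:].decode("utf-32-be")
    if data.startswith(codecs.BOM_UTF32_LE):   # FF FE 00 00
        return data[4:].decode("utf-32-le")
    # No byte order mark: assume big-endian (assumption stated above).
    return data.decode("utf-32-be")


with_bom = codecs.BOM_UTF32_BE + "a".encode("utf-32-be")
without_bom = "a".encode("utf-32-be")
little_endian = codecs.BOM_UTF32_LE + "a".encode("utf-32-le")

print(decode_utf32_scheme(with_bom))       # a
print(decode_utf32_scheme(without_bom))    # a
print(decode_utf32_scheme(little_endian))  # a
```

A consumer written this way accepts output from serializers with either
default, but a consumer that insists on a fixed byte layout would see a
difference, which is the interoperability concern.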

As an aside, I know far more about the distinction between Unicode character
encoding schemes and Unicode character encoding forms than I did when I woke up
this morning.  :)

[1] http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf#G7404

Received on Monday, 12 July 2010 13:36:00 UTC