- From: Sujatha N. Marsden <smarsden@etranslate.com>
- Date: Wed, 12 Apr 2000 14:55:45 -0700
- To: w3c-i18n-ig@w3.org
- Cc: xml-editor@w3.org, w3c-xml-core-wg@w3.org
>For the record, and this will come as no surprise, I totally oppose this
>change, because I do *not* think 16LE and 16BE are appropriate for use with
>XML, as they fly in the face of XML's orientation towards interoperability
>across heterogeneous systems. I think XML entities encoded in any flavor
>of UTF-16 should always have a BOM; exactly what the current spec [correctly
>IMHO] says.

Should it be considered an error if an entity doesn't contain a BOM? IMHO, in the absence of a BOM, UTF-16BE should be assumed. If the charset declaration and the BOM disagree, it is a fatal error. In the case of a UTF-16 declaration, the BOM determines whether it is UTF-16LE or UTF-16BE.

Adding these names (UTF-16LE and UTF-16BE) to the set of possible charset names just adds more wrinkles, probably more confusion, and definitely more errors. I thoroughly disapprove of the LE and BE suffixes.

RFC 2781 makes it an error to have a BOM with a UTF-16LE or UTF-16BE charset declaration. Why should that be so, especially when there is no contradiction? RFC 2781 also says:

   "Text labelled "UTF-16LE" can always be interpreted as being
   little-endian. The detection of an initial BOM does not affect
   de-serialization of text labelled as UTF-16LE. Finding 0xFE followed
   by 0xFF is an error since there is no Unicode character 0xFFFE, which
   would be the interpretation of those octets under little-endian
   order."

Well, FEFF is not being interpreted as a character but as a mark, which is very different. Interestingly enough, though, FEFF is allowed when UTF-16 is the charset declaration.

Sujatha.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sujatha N. Marsden
Chief Scientist
eTranslate, Inc.
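The byte-order rules proposed in this message (assume big-endian when there is no BOM, treat a declaration/BOM disagreement as fatal, let the BOM decide under a plain UTF-16 declaration) can be sketched in Python. This is a hypothetical illustration only, not the algorithm of the XML spec or of RFC 2781. In particular, it tolerates an agreeing BOM under a UTF-16LE/UTF-16BE label, which is the position argued here and a deliberate departure from RFC 2781. The names `EncodingMismatch`, `detect_byte_order` and `decode_utf16` are invented for the example.

```python
from typing import Optional, Tuple

# Hypothetical sketch of the BOM rules argued for in this message.
# NOT RFC 2781: that RFC forbids a BOM under a UTF-16LE/UTF-16BE label,
# whereas this sketch accepts one as long as it agrees with the label.

BOM_BE = b"\xfe\xff"
BOM_LE = b"\xff\xfe"

# Map the charset labels to Python's built-in codec names.
_CODECS = {"UTF-16BE": "utf-16-be", "UTF-16LE": "utf-16-le"}


class EncodingMismatch(ValueError):
    """Hypothetical fatal error: charset declaration and BOM disagree."""


def detect_byte_order(data: bytes, declared: str) -> Tuple[str, int]:
    """Return (byte-order label, number of BOM bytes to skip)."""
    label = declared.upper()
    if label not in ("UTF-16", "UTF-16BE", "UTF-16LE"):
        raise ValueError(f"not a UTF-16 label: {declared!r}")

    bom: Optional[str]
    if data.startswith(BOM_BE):
        bom = "UTF-16BE"
    elif data.startswith(BOM_LE):
        bom = "UTF-16LE"
    else:
        bom = None

    if bom is None:
        # No BOM: plain "UTF-16" defaults to big-endian;
        # an explicit LE/BE label is taken at its word.
        return ("UTF-16BE" if label == "UTF-16" else label), 0
    if label != "UTF-16" and label != bom:
        # The declaration and the BOM contradict each other: fatal error.
        raise EncodingMismatch(f"declared {label} but BOM indicates {bom}")
    # Plain "UTF-16": the BOM decides. LE/BE label: the BOM agrees,
    # so it is read as a mark and skipped, not decoded as a character.
    return bom, 2


def decode_utf16(data: bytes, declared: str) -> str:
    """Decode data according to the proposed declaration/BOM rules."""
    order, skip = detect_byte_order(data, declared)
    return data[skip:].decode(_CODECS[order])
```

For example, `decode_utf16(b"\x00H\x00i", "UTF-16")` returns `"Hi"` because big-endian is assumed without a BOM, whereas `decode_utf16(b"\xff\xfe\x00H\x00i", "UTF-16BE")` raises `EncodingMismatch` because the little-endian BOM contradicts the label.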
Received on Wednesday, 12 April 2000 17:52:35 UTC