Re: UTF-16 (was: Re: Charset reviewer appointed)

At 06:20 PM 7/29/98 +0900, Martin J. Duerst wrote:
>What XML is currently stating is that all UTF-16 documents must start
>with a BOM...

I suspect the XML people are a good indication of what the world
expects from UTF-16 with regard to byte ordering, and that
they would be happy if UTF-16 were defined like this:

"UTF-16 generators SHOULD send in big-endian byte order.
UTF-16 generators that send in big-endian byte order MAY begin 
with U+FEFF ZERO WIDTH NO-BREAK SPACE (also called the Byte Order Mark, or BOM).
UTF-16 generators that send in little-endian byte order MUST begin 
with the BOM."

which can be summed up as
"UTF-16 defaults to big-endian; an initial BOM can be used
to switch to little-endian."
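The proposed rules are small enough to sketch directly. The Python below is a minimal illustration that assumes the rules exactly as quoted above. The names `encode_utf16` and `decode_utf16` are hypothetical and do not come from any spec or library. Python's `utf-16-be` and `utf-16-le` codecs neither write nor strip a BOM, so the sketch handles the BOM bytes (FE FF for big-endian, FF FE for little-endian) explicitly.

```python
def encode_utf16(text: str, little_endian: bool = False, bom: bool = False) -> bytes:
    """Generator side of the proposed rules (hypothetical helper).

    Big-endian is the default and the BOM is optional there; little-endian
    output always begins with the BOM, so `bom` is ignored in that case.
    """
    if little_endian:
        # MUST begin with the BOM: U+FEFF serialized little-endian is FF FE.
        return b"\xff\xfe" + text.encode("utf-16-le")
    body = text.encode("utf-16-be")
    # MAY begin with the BOM: U+FEFF serialized big-endian is FE FF.
    return (b"\xfe\xff" + body) if bom else body


def decode_utf16(data: bytes) -> str:
    """Reader side (hypothetical helper): big-endian unless an initial BOM
    says otherwise. A leading BOM is consumed rather than returned as text."""
    if data[:2] == b"\xff\xfe":
        # Initial BOM in little-endian order switches the stream to little-endian.
        return data[2:].decode("utf-16-le")
    if data[:2] == b"\xfe\xff":
        # Optional BOM on a big-endian stream; strip it.
        return data[2:].decode("utf-16-be")
    # No BOM: fall back to the big-endian default.
    return data.decode("utf-16-be")
```

For example, `decode_utf16(encode_utf16("hi", little_endian=True))` gives back `"hi"`, and a reader never has to guess: either the first two bytes are a BOM, or the stream is big-endian.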

I also suspect they'd be willing to modify XML's definition to make the
BOM optional for big-endian streams.
- Dan


Received on Wednesday, 29 July 1998 08:55:27 UTC