- From: Harald Alvestrand <Harald.Alvestrand@maxware.no>
- Date: Sun, 24 May 1998 23:35:14 +0200
- To: Dan Kegel <dank@alumni.caltech.edu>, Chris Newman <Chris.Newman@INNOSOFT.COM>, "Martin J. Duerst" <duerst@w3.org>
- Cc: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>, ietf-charsets@ISI.EDU, murata@fxis.fujixerox.co.jp, Tatsuo_Kobayashi@justsystem.co.jp
At 13:56 24.05.98 -0700, Dan Kegel wrote:
>Perhaps a middle ground, here? How about this (suitably reworded):
> UTF-16 generators SHOULD [MUST?] NOT send in little-endian byte order, but
> if they do, they MUST prefix the stream with a little-endian BOM.
> UTF-16 consumers MUST assume the default byte-order is big-endian,
> but MUST also accept little-endian if prefixed with a little-endian BOM.
>
>That way, big-endian is preferred, yet interoperability is preserved.

Hmmm.... everyone MUST do A, but if they don't, they MUST....

Suggested alternative:

UTF-16 generators MUST send in big-endian byte order.

NOTE: Some implementations that do not conform to this specification have
occasionally sent data in little-endian byte order. When they do this, they
commonly precede the data with a zero width non breaking space (also called
Byte Order Mark or BOM) (0xFEFF). Thus, an UTF-16 parser encountering the
code 0xFFFE as the first character of a purported UTF-16 stream may safely
assume that he has encountered a nonconformant data source.

The info about what is right is there; the info about how to tell if you
encounter someone doing the Wrong Thing is there too.

Harald A

--
Harald Tveit Alvestrand, Maxware, Norway
Harald.Alvestrand@maxware.no
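
The detection rule in the NOTE above can be sketched in Python as follows. This is only an illustration, not part of the proposed wording: the function name sniff_utf16 is hypothetical, and both stripping the BOM and falling back to little-endian decoding (rather than rejecting the stream) are assumptions the message itself does not make.

```python
import struct
from typing import Tuple

BOM = 0xFEFF          # zero width no-break space, read in the correct byte order
SWAPPED_BOM = 0xFFFE  # what a big-endian reader sees if the sender wrote little-endian


def sniff_utf16(data: bytes) -> Tuple[str, bool]:
    """Decode a purported UTF-16 stream, assuming big-endian by default.

    Returns (text, conformant). conformant is False when the first code
    unit, read as big-endian, is 0xFFFE, i.e. the source sent
    little-endian data preceded by a BOM.
    """
    if len(data) >= 2:
        # Read the first 16-bit code unit in big-endian order.
        (first,) = struct.unpack(">H", data[:2])
        if first == SWAPPED_BOM:
            # Assumption: recover by decoding the rest as little-endian.
            return data[2:].decode("utf-16-le"), False
        if first == BOM:
            # Assumption: a leading big-endian BOM is dropped from the text.
            return data[2:].decode("utf-16-be"), True
    # No BOM at all: big-endian is the default.
    return data.decode("utf-16-be"), True
```

For example, sniff_utf16(b"\xff\xfeH\x00i\x00") would flag the source as nonconformant while still recovering the text, whereas the same text sent as b"\x00H\x00i" is treated as conformant.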
Received on Sunday, 24 May 1998 14:38:27 UTC