- From: Dan Kegel <dank@alumni.caltech.edu>
- Date: Sun, 24 May 1998 19:12:22 -0700
- To: Harald Alvestrand <Harald.Alvestrand@maxware.no>, Chris Newman <Chris.Newman@INNOSOFT.COM>, "Martin J. Duerst" <duerst@w3.org>
- Cc: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>, ietf-charsets@ISI.EDU, murata@fxis.fujixerox.co.jp, Tatsuo_Kobayashi@justsystem.co.jp
At 11:35 PM 5/24/98 +0200, Harald Alvestrand wrote:
>Hmmm.... everyone MUST do A, but if they don't, they MUST....
>Suggested alternative:
>
>  UTF-16 generators MUST send in big-endian byte order.
>
>  NOTE: Some implementations that do not conform to this specification
>  have occasionally sent data in little-endian byte order. When they do
>  this, they commonly precede the data with a zero width non breaking
>  space (also called Byte Order Mark or BOM) (0xFEFF).
>  Thus, an UTF-16 parser encountering the code 0xFFFE as the first
>  character of a purported UTF-16 stream may safely assume that he
>  has encountered a nonconformant data source.
>
>The info about what is right is there; the info about how to tell if
>you encounter someone doing the Wrong Thing is there too.

True, but it's a little wishy-washy, in that it doesn't try to lay
down the law about how the little-endian holdouts must behave in
order to get along peacefully with the rest of us. You need to tell
them they have to use a BOM if they're going to talk funny.

- Dan
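
For concreteness, here is a rough Python sketch of how a receiver might apply the rule discussed above: treat the stream as big-endian unless it opens with a byte-swapped BOM. The function name `decode_utf16_stream` and the wording of the returned notes are made up for illustration; none of this comes from the draft text, and a real parser would need to handle truncated or malformed input more carefully.

```python
import codecs
from typing import Tuple


def decode_utf16_stream(data: bytes) -> Tuple[str, str]:
    """Decode a purported UTF-16 byte stream; return (text, note).

    Hypothetical helper: big-endian is the default, and a leading
    FF FE pair (U+FEFF written little-endian, which a big-endian
    reader sees as 0xFFFE) marks a little-endian sender.
    """
    head = data[:2]
    if head == codecs.BOM_UTF16_BE:  # bytes FE FF
        # Conformant order with an explicit BOM; strip the BOM.
        return data[2:].decode("utf-16-be"), "big-endian, BOM present"
    if head == codecs.BOM_UTF16_LE:  # bytes FF FE
        # Byte-swapped BOM: the sender is using little-endian order.
        return data[2:].decode("utf-16-le"), "little-endian, BOM present"
    # No BOM at all: assume the mandated big-endian order.
    return data.decode("utf-16-be"), "big-endian assumed, no BOM"


if __name__ == "__main__":
    swapped = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
    print(decode_utf16_stream(swapped))
```

Note that a little-endian sender who omits the BOM falls into the last branch and is silently misread, which is why the holdouts need to be told to send one.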
Received on Monday, 25 May 1998 16:20:46 UTC