- From: Dan Kegel <dank@alumni.caltech.edu>
- Date: Sun, 31 May 1998 08:01:53 -0700
- To: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>, Harald Alvestrand <Harald.Alvestrand@maxware.no>, Chris Newman <Chris.Newman@INNOSOFT.COM>, "Martin J. Duerst" <duerst@w3.org>, ietf-charsets@ISI.EDU
- Cc: murata@fxis.fujixerox.co.jp, Tatsuo_Kobayashi@justsystem.co.jp
At 07:51 PM 5/31/98 +0900, MURATA Makoto wrote:
>I think we are converging but minor differences exist. Little endian:
>should not or must not? Is the BOM mandatory or recommended?
>...
>3. My proposal
>
>I would like to reduce useless options. Little endian is fine, but it
>should be used only in local environments. UTF-16 without the BOM is fine,
>but these should be used only in local environments.
>
>Here is my proposal.
>
> UTF-16 generators MUST send in big-endian byte order and must begin with the
> zero width non breaking space (also called Byte Order Mark or BOM) (0xFEFF).
>
> NOTE: Some implementations that do not conform to this specification
> have occasionally sent data in little-endian byte order. When they do
> this, they commonly precede the data with the BOM.
> Thus, a UTF-16 parser encountering the code 0xFFFE as the first
> character of a purported UTF-16 stream may safely assume that it
> has encountered a nonconformant data source. If the BOM is absent,
> there is no way to 100% reliably detect little-endian data that does not
> use the BOM.

I like this language!

There was one other issue raised: for protocols that send many
small text messages, should the BOM be sent in each string?
Examples given were HTTP headers and database protocols.

In the case of HTTP headers, we can probably consider the entire
HTTP header stream as a single message, and only require the BOM
at the beginning of the stream. For example, the client and server
would each send the BOM as the first two bytes after opening the socket.

In the case of database protocols, which send many short strings,
we might want to leave it up to the protocol spec to say whether
the byte order is specified globally or included in each text string.

Examples and suggestions like the two above should probably be
included in the proposal.

- Dan
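To make the proposed wording concrete, here is a minimal Python sketch of how a generator and a parser might behave under it. This is only an illustration, not part of the proposal text: the function names (encode_utf16_stream, classify_utf16_stream, decode_utf16_stream, decode_short_strings) and the classification labels are hypothetical. The last function sketches the "byte order specified globally" option for protocols that send many short strings, assuming the BOM is sent once when the connection opens.

```python
# Hypothetical helper names; a sketch of the proposed rules, not a standard API.
BOM_BE = b"\xfe\xff"  # U+FEFF serialized big-endian (what the proposal requires)
BOM_LE = b"\xff\xfe"  # U+FEFF serialized little-endian (read big-endian, this is 0xFFFE)


def encode_utf16_stream(text: str) -> bytes:
    """Generator side: big-endian byte order, always preceded by the BOM."""
    return BOM_BE + text.encode("utf-16-be")


def classify_utf16_stream(data: bytes) -> str:
    """Parser side: look only at the first two bytes of the stream."""
    if data[:2] == BOM_BE:
        return "conformant"
    if data[:2] == BOM_LE:
        return "nonconformant-little-endian"
    return "no-bom"


def decode_utf16_stream(data: bytes) -> str:
    """Decode a stream, tolerating little-endian senders that include a BOM."""
    kind = classify_utf16_stream(data)
    if kind == "conformant":
        return data[2:].decode("utf-16-be")
    if kind == "nonconformant-little-endian":
        return data[2:].decode("utf-16-le")
    # Without a BOM the byte order cannot be detected reliably, so refuse to guess.
    raise ValueError("no BOM: byte order cannot be reliably determined")


def decode_short_strings(connection_prefix: bytes, strings: list[bytes]) -> list[str]:
    """Assumed 'global byte order' variant: the BOM arrives once per connection
    and the byte order it announces applies to every later BOM-less string."""
    kind = classify_utf16_stream(connection_prefix)
    if kind == "no-bom":
        raise ValueError("connection did not start with a BOM")
    codec = "utf-16-be" if kind == "conformant" else "utf-16-le"
    return [s.decode(codec) for s in strings]
```

A protocol that instead puts the byte order in each text string would simply call decode_utf16_stream on every string, at the cost of two extra bytes per string.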
Received on Sunday, 31 May 1998 08:07:12 UTC