- From: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
- Date: Thu, 23 Jul 1998 15:57:45 +0900
- To: ietf-charsets@iana.org
- Cc: Tatsuo_Kobayashi@justsystem.co.jp, murata@apsdc.ksp.fujixerox.co.jp, hiura@bakabon.Eng.Sun.COM
Could you tell me if UTF-16 is accepted or not?  RFC2376 (text/xml and
application/xml) already mentions UTF-16 and the BOM.  I am afraid that
confusion and incompatibility problems will arise unless we register
UTF-16 in IANA very soon.

MURATA Makoto wrote:
>
> --------------------------------------------------------------------
> We propose to register UTF-16 as a charset in IANA.
>
> UTF-16 generators MUST send in big-endian byte order and MUST
> begin with the zero width non breaking space (also called Byte
> Order Mark or BOM) (0xFEFF).
>
> NOTE: Some implementations that do not conform to this
> specification have occasionally sent data in little-endian byte
> order. When they do this, they commonly precede the data with the
> BOM. Thus, a UTF-16 parser encountering the code 0xFFFE as the
> first character of a purported UTF-16 stream may safely assume
> that he has encountered a nonconformant data source. There is no
> way to 100% reliably detect little-endian data that does not use
> the BOM.
>
> This character set is not permitted for use with MIME text/* media
> types. However, the MIME-like mechanism of HTTP may use this
> character set for text/*, since this mechanism is exempt from the
> restrictions on the text top-level type (see section 19.4.1 of
> HTTP 1.1 [RFC-2068]).
>
> [RFC-2068] R. Fielding, J. Gettys, J. Mogul, H. Frystyk,
> T. Berners-Lee. "Hypertext Transfer Protocol -- HTTP/1.1"
> UC Irvine, DEC, MIT/LCS. RFC 2068. January, 1997.
>
> Charset name(s): UTF-16
>
> Published specification(s):
>
> UTF-16 as a Character Encoding Scheme is defined in Appendix C.3
> of [UNICODE] and Amendment 1 of [ISO-10646].
>
> The Coded Character Set that UTF-16 refers to is the same version
> of ISO/IEC 10646-1 and Unicode that the charset "UTF-8" refers to.
>
> [ISO-10646] ISO/IEC, Information Technology - Universal
> Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture
> and Basic Multilingual Plane, May 1993.
>
> [UNICODE] The Unicode Consortium, "The Unicode Standard -- Version 2.0",
> Addison-Wesley, 1996.
>
> [RFC-2279] F. Yergeau, "UTF-8, a transformation format of ISO 10646",
> January 1998.
>
> Person & email address to contact for further information:
>
> Tatsuo L. Kobayashi
> Digital Culture Research Center, JUSTSYSTEM Corp.
> Email: Tatsuo_Kobayashi@justsystem.co.jp
>
> Murata Makoto (Family Given)
> Fuji Xerox Information Systems,
> KSP 9A7, 2-1 Sakado 3-chome,
> Takatsu-ku, Kawasaki-shi,
> 213 Japan
> Email: murata@fxis.fujixerox.co.jp

Makoto

Fuji Xerox Information Systems
Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
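For illustration only, the generator rule and the NOTE in the quoted proposal could be sketched roughly as follows. This is a minimal Python sketch, not part of the registration text; the names encode_utf16 and classify_utf16 and the three result labels are hypothetical, chosen here only to show the byte-level behaviour the proposal describes.

```python
BOM_BE = b"\xfe\xff"  # U+FEFF serialized big-endian, as the proposal requires
BOM_LE = b"\xff\xfe"  # U+FEFF serialized little-endian; a big-endian reader sees 0xFFFE


def encode_utf16(text: str) -> bytes:
    """Hypothetical generator: BOM first, then big-endian code units."""
    return BOM_BE + text.encode("utf-16-be")


def classify_utf16(data: bytes) -> str:
    """Hypothetical receiver-side check based only on the first two octets."""
    if data.startswith(BOM_BE):
        return "conformant"
    if data.startswith(BOM_LE):
        # Per the NOTE: 0xFFFE up front indicates a nonconformant sender.
        return "nonconformant-little-endian"
    # Without a BOM, byte order cannot be detected with full reliability.
    return "no-bom"


if __name__ == "__main__":
    payload = encode_utf16("A")
    print(payload.hex())            # feff0041
    print(classify_utf16(payload))  # conformant
```

A receiver that gets "nonconformant-little-endian" could still choose to decode the data as little-endian, but that is a recovery policy outside what the proposal specifies.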
Received on Wednesday, 22 July 1998 23:56:33 UTC