Re: draft-hoffman-utf16-01.txt available

From: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
Date: Wed, 03 Feb 1999 14:25:42 +0900
To: ietf-charsets@iana.org
Message-id: <199902030525.AA03382@murata.apsdc.ksp.fujixerox.co.jp>

Francois Yergeau wrote:
> And further, I happen to think that all XML entities (in UTF-16) having a
> BOM is a Good Thing.  The XML spec is designed such that one can always
> determine the character encoding without external info, let's keep it that
> way.

Actually, the charset parameter of text/xml or application/xml, if present,
is authoritative.  In the case of text/xml, the default is US-ASCII (Jim and
I were instructed to choose US-ASCII by the IESG, which is aware of the
inconsistency with HTTP 1.1).  For more on this, see RFC 2376.
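
The precedence described above can be sketched in a few lines of Python.
This is only an illustration, not text from RFC 2376: the function name
declared_charset is hypothetical, and returning None for application/xml
without a charset parameter reflects my reading that the receiver then has
to inspect the entity itself (BOM or encoding declaration).

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset implied by a Content-Type value, or None when
    the receiver has to look inside the XML entity instead."""
    msg = Message()
    msg["Content-Type"] = content_type

    charset = msg.get_param("charset")
    if isinstance(charset, tuple):
        # RFC 2231 encoded parameter: (charset, language, value)
        charset = charset[2]
    if charset is not None:
        # An explicit charset parameter is authoritative for both types.
        return charset.lower()

    if msg.get_content_type() == "text/xml":
        # text/xml without a charset parameter defaults to US-ASCII.
        return "us-ascii"

    # Assumption: application/xml without charset, fall back to the entity.
    return None
```

For example, declared_charset('text/xml; charset="utf-16"') gives "utf-16",
and declared_charset("text/xml") gives "us-ascii".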

medavis2@us.ibm.com wrote:
> *** Even if XML did not require a BOM, it would not be unambiguous! Look at
> Appendix F in
> http://www.xml.com/axml/target.html#sec-guessing. The file would just have
> to have the initial '<?xml' like all other encodings. To quote:
> "Because each XML entity not in UTF-8 or UTF-16 format must begin with an
> XML encoding declaration, in which the first characters must be '<?xml',
> any conforming processor can detect, after two to four octets of input,
> which of the following cases apply. In reading this list, it may help to
> know that in UCS-4, '<' is "#x0000003C" and '?' is "#x0000003F", and the
> Byte Order Mark required of UTF-16 data streams is "#xFEFF".
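
For readers following the quoted passage, here is a rough Python sketch of
the octet-sniffing it describes.  The byte patterns are the ones listed in
Appendix F of the XML 1.0 Recommendation as I recall them; the function
name sniff_xml_encoding and the returned labels are hypothetical, chosen
only for this illustration.

```python
# '<' in UCS-4 is 0000003C; the four patterns are the possible octet orders.
_UCS4 = {
    b"\x00\x00\x00\x3c": "ucs-4-be",
    b"\x3c\x00\x00\x00": "ucs-4-le",
    b"\x00\x00\x3c\x00": "ucs-4-2143",
    b"\x00\x3c\x00\x00": "ucs-4-3412",
}


def sniff_xml_encoding(data):
    """Guess the encoding family from the first two to four octets."""
    head = data[:4]
    if head in _UCS4:
        return _UCS4[head]
    if head[:2] == b"\xfe\xff":
        return "utf-16-be-bom"
    if head[:2] == b"\xff\xfe":
        return "utf-16-le-bom"
    if head == b"\x00\x3c\x00\x3f":      # '<?' in UTF-16, big-endian, no BOM
        return "utf-16-be-no-bom"
    if head == b"\x3c\x00\x3f\x00":      # '<?' in UTF-16, little-endian, no BOM
        return "utf-16-le-no-bom"
    if head == b"\x3c\x3f\x78\x6d":      # '<?xm' in an ASCII-compatible encoding
        return "ascii-compatible"        # read the encoding declaration next
    if head == b"\x4c\x6f\xa7\x94":      # '<?xm' in some flavor of EBCDIC
        return "ebcdic"
    # Nothing matched: treated as UTF-8 without an encoding declaration.
    return "utf-8-assumed"
```

Note that the two BOM-less UTF-16 branches only fire when the entity
starts with '<?'.  An entity such as "<doc/>".encode("utf-16-be") begins
with 00 3C 00 64, matches none of the patterns, and falls through to the
UTF-8 default.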

UTF-16 XML entities do *not* have to begin with '<?xml'.  Thus, if the BOM 
is made optional, we have a problem when the charset parameter is not 


Fuji Xerox Information Systems
Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
Received on Wednesday, 3 February 1999 00:27:38 UTC
