- From: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
- Date: Wed, 03 Feb 1999 14:25:42 +0900
- To: ietf-charsets@iana.org
Francois Yergeau wrote:
>
> And further, I happen to think that all XML entities (in UTF-16) having a
> BOM is a Good Thing. The XML spec is designed such that one can always
> determine the character encoding without external info, let's keep it that
> way.

Actually, the charset parameter of text/xml or application/xml, if it exists, is authoritative. In the case of text/xml, the default is US-ASCII (Jim and I were instructed to choose US-ASCII by the IESG, which is aware of the inconsistency with HTTP 1.1). For more about this, see RFC 2376.

medavis2@us.ibm.com wrote:
> *** Even if XML did not require a BOM, it would not be unambiguous! Look at
> Appendix F in
> http://www.xml.com/axml/target.html#sec-guessing. The file would just have
> to have the initial '<?xml' like all other encodings. To quote:
>
> "Because each XML entity not in UTF-8 or UTF-16 format must begin with an
> XML encoding declaration, in which the first characters must be '<?xml',
> any conforming processor can detect, after two to four octets of input,
> which of the following cases apply. In reading this list, it may help to
> know that in UCS-4, '<' is "#x0000003C" and '?' is "#x0000003F", and the
> Byte Order Mark required of UTF-16 data streams is "#xFEFF".

UTF-16 XML entities do *not* have to begin with '<?xml'. Thus, if the BOM is made optional, we have a problem when the charset parameter is not available.

Cheers,

Makoto

Fuji Xerox Information Systems

Tel: +81-44-812-7230  Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
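The Appendix F detection rule quoted in the thread, and the gap Makoto points out, can be illustrated with a minimal Python sketch. It assumes the octet table from the 1998 XML 1.0 Recommendation (BOM for UTF-16, otherwise the encoded form of '<?xml'); the function name `guess_xml_encoding` and its return labels are hypothetical, made up for this illustration, and it ignores any charset parameter supplied by the transport.

```python
# Hypothetical sketch of the XML 1.0 (1998) Appendix F autodetection table.
# It looks only at the first four octets; a real processor would then read
# the encoding declaration (or honor an authoritative charset parameter).

# Four-octet signatures of '<' or '<?' in various encodings (no BOM).
_SIGNATURES = [
    (b"\x00\x00\x00\x3c", "UCS-4BE"),           # 1234 order
    (b"\x3c\x00\x00\x00", "UCS-4LE"),           # 4321 order
    (b"\x00\x00\x3c\x00", "UCS-4-2143"),        # unusual octet order
    (b"\x00\x3c\x00\x00", "UCS-4-3412"),        # unusual octet order
    (b"\x00\x3c\x00\x3f", "UTF-16BE-noBOM"),    # strictly speaking in error
    (b"\x3c\x00\x3f\x00", "UTF-16LE-noBOM"),    # strictly speaking in error
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible"),  # '<?xm' in an ASCII superset
    (b"\x4c\x6f\xa7\x94", "EBCDIC"),            # '<?xm' in EBCDIC
]


def guess_xml_encoding(head: bytes) -> str:
    """Classify the start of an XML entity per the Appendix F table."""
    # A UTF-16 Byte Order Mark decides the case after only two octets.
    if head.startswith(b"\xfe\xff"):
        return "UTF-16BE"
    if head.startswith(b"\xff\xfe"):
        return "UTF-16LE"
    first4 = head[:4]
    for signature, label in _SIGNATURES:
        if first4 == signature:
            return label
    # Catch-all: UTF-8 without a declaration, or a corrupt/wrapped stream.
    return "UTF-8-or-unknown"


if __name__ == "__main__":
    # With a BOM, UTF-16 is detected no matter how the entity begins.
    print(guess_xml_encoding("\ufeffhello".encode("utf-16-be")))
    # Without a BOM, a UTF-16 entity that does not start with '<?xml'
    # (e.g. an external parsed entity) falls through to the catch-all.
    print(guess_xml_encoding("hello".encode("utf-16-be")))
```

The second call is the case Makoto raises: because a UTF-16 entity need not begin with '<?xml', dropping the BOM leaves nothing in the first octets to distinguish it, so detection has to fall back on external information such as the charset parameter.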
Received on Wednesday, 3 February 1999 00:27:38 UTC