- From: Tim Bray <tbray@textuality.com>
- Date: Wed, 05 Apr 2000 13:31:44 -0700
- To: John Cowan <jcowan@reutershealth.com>, Rick Jelliffe <ricko@gate.sinica.edu.tw>
- Cc: MURATA Makoto <muraw3c@attglobal.net>, xml-editor@w3.org, w3c-i18n-ig@w3.org, w3c-xml-core-wg@w3.org
At 04:19 PM 4/5/00 -0400, John Cowan wrote:
>It all depends on the interpretation of the term "UTF-16" in clause 4.3.3:
>
># Entities encoded in UTF-16 must begin with the Byte Order Mark [...].
>
>The issue is whether "UTF-16" means only the charset so named in RFC 2781,
>or whether in the XML Rec context it is a generic term covering all three
>charsets named there.

Exactly... the truth, of course, is that at the time XML 1.0 was drafted, there was only one UTF-16. It seems to me that the only sensible reading of 4.3.3 covers all members of the UTF-16 family. But I acknowledge that others differ, and, as John points out, there are people using BOM-less UTF-16, presumably in highly constrained environments where they control both ends of the pipe.

>I myself agree with you: UTF-16BE and UTF-16LE should be supported if the
>appropriate encoding declaration is present.

I disagree. I think that unless you're working in the kind of highly constrained environment described above, it is rather irresponsible to create an XML document in UTF-16 without a BOM; the cost is very low and the interoperability benefits are substantial. XML's design is wholly oriented toward successful interoperation in heterogeneous environments. Thus, data formats that forbid the use of proven, low-cost interoperability aids simply should not be considered for use by responsible creators of XML, and we should not do anything in our specs to encourage such behavior.

-Tim
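A minimal sketch, assuming Python and the autodetection table from Appendix F of the XML 1.0 Recommendation, of why the BOM is such a low-cost interoperability aid: with the two BOM octets present a receiver can identify the byte order immediately, while a BOM-less stream forces it to guess from the first characters. The function name sniff_utf16 and the exact return strings are illustrative assumptions, not code from the Rec or this thread.

```python
def sniff_utf16(first_bytes: bytes) -> str:
    """Best-guess encoding label for the first octets of an XML entity."""
    if first_bytes.startswith(b"\xfe\xff"):
        return "UTF-16, big-endian (BOM present)"
    if first_bytes.startswith(b"\xff\xfe"):
        return "UTF-16, little-endian (BOM present)"
    # No BOM: fall back on the pattern of the first characters, e.g. "<?"
    # encoded as 00 3C 00 3F (big-endian) or 3C 00 3F 00 (little-endian),
    # following the autodetection table in Appendix F of XML 1.0.
    if first_bytes.startswith(b"\x00<\x00?"):
        return "UTF-16BE, guessed from '<?' (no BOM)"
    if first_bytes.startswith(b"<\x00?\x00"):
        return "UTF-16LE, guessed from '<?' (no BOM)"
    return "unknown: receiver needs out-of-band knowledge of the encoding"


if __name__ == "__main__":
    # With a BOM, the byte order is unambiguous ...
    print(sniff_utf16(b"\xff\xfe" + "<?xml version='1.0'?>".encode("utf-16-le")))
    # ... without one, the receiver is reduced to guessing.
    print(sniff_utf16("<?xml version='1.0'?>".encode("utf-16-be")))
```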
Received on Wednesday, 5 April 2000 16:34:18 UTC