Re: I18N issues with the XML Specification

At 04:19 PM 4/5/00 -0400, John Cowan wrote:
>It all depends on the interpretation of the term "UTF-16" in clause 4.3.3:
>
># Entities encoded in UTF-16 must begin with the Byte Order Mark [...].
>
>The issue is whether "UTF-16" means only the charset so named in RFC 2781,
>or in the XML Rec context it is a generic term covering all three charsets
>named there.

Exactly... the truth of course is that at the time of drafting of XML 1.0,
there was only one UTF-16.  It seems to me that the only sensible reading 
of 4.3.3 comprises all members of the UTF-16 family.  But I acknowledge 
there are others who differ, and as John points out, there are people using 
BOM-less UTF-16, presumably in a highly constrained environment where they 
control both ends of the pipe.

>I myself agree with you: UTF-16BE and UTF-16LE should be supported if the
>appropriate encoding declaration is present.

I disagree.  I think that unless you're working in the type of highly
constrained environment I describe above, it is rather irresponsible to
create an XML document in UTF-16 without a BOM; the cost is very low
and the interoperability benefits quite substantial.  XML's design
is totally oriented to successful interoperation in heterogeneous
environments.  Thus, data formats that forbid the use of proven
low-cost interoperability aids simply should not be considered for use 
by responsible creators of XML, and we should not do anything in our
specs to encourage such behavior. -Tim
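
As a rough illustration of how small that cost is, here is a minimal Python
sketch, not taken from the XML Rec or from any particular parser. The
function name sniff_utf16_byte_order and the sample document are
hypothetical; the only library pieces assumed are the standard codecs
module constants and Python's built-in "utf-16" and "utf-16-be" codecs.
It shows that the BOM adds two bytes and lets a receiver determine byte
order without any out-of-band agreement.

```python
import codecs
from typing import Optional


def sniff_utf16_byte_order(data: bytes) -> Optional[str]:
    """Return 'big' or 'little' if data starts with a UTF-16 BOM, else None.

    Illustrative only: this ignores UTF-32, whose little-endian BOM
    (FF FE 00 00) also begins with the bytes FF FE.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little"
    return None


# Hypothetical sample document.
doc = '<?xml version="1.0" encoding="UTF-16"?><greeting>hi</greeting>'

# Python's "utf-16" codec writes a BOM in the platform's native byte order.
with_bom = doc.encode("utf-16")

# The "utf-16-be" codec writes big-endian code units and no BOM.
without_bom = doc.encode("utf-16-be")

print(sniff_utf16_byte_order(with_bom))     # 'big' or 'little', per platform
print(sniff_utf16_byte_order(without_bom))  # None: receiver must be told
print(len(with_bom) - len(without_bom))     # the BOM costs exactly 2 bytes
```

Without the BOM, the receiver in this sketch has nothing to go on and must
rely on an encoding declaration or prior agreement, which is the
"both ends of the pipe" situation described above.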

Received on Wednesday, 5 April 2000 16:34:18 UTC