Re: I18N issues with the XML Specification

On Wed, 5 Apr 2000, Tim Bray wrote:

> At 04:19 PM 4/5/00 -0400, John Cowan wrote:
> >It all depends on the interpretation of the term "UTF-16" in clause 2.3.3:
> >
> ># Entities encoded in UTF-16 must begin with the Byte Order Mark [...].
> >
> >The issue is whether "UTF-16" means only the charset so named in RFC 2781,
> >or in the XML Rec context it is a generic term covering all three charsets
> >named there.
> 
> Exactly... the truth of course is that at the time of drafting of XML 1.0,
> there was only one UTF-16. 

I don't follow that: the BOM predates XML, and Unicode existed, in both
little-endian and big-endian forms, before XML. I never thought that
"UTF-16" in the XML spec specified a particular endianness; otherwise I
would have requested different terminology while the spec was being drafted.
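To make the endianness point concrete, here is a small sketch using only Python's standard codecs (the choice of Python and the sample text are mine, not anything from the spec). U+FEFF serializes as FE FF in big-endian order and FF FE in little-endian order, so a reader that sees it first can tell which byte order follows. The generic "utf-16" codec uses the BOM that way and strips it, while the explicitly labelled "utf-16-be" codec keeps it as an ordinary character, which is roughly the labelled-versus-detected distinction at issue in this thread.

```python
BOM = "\ufeff"
text = BOM + "<?xml"

# Same characters, two byte orders; the BOM is written explicitly here.
be = text.encode("utf-16-be")
le = text.encode("utf-16-le")

print(be[:2].hex())  # prints "feff"
print(le[:2].hex())  # prints "fffe"

# The generic "utf-16" decoder reads the BOM to pick the byte order,
# then drops it from the decoded text.
assert be.decode("utf-16") == "<?xml"
assert le.decode("utf-16") == "<?xml"

# A decoder labelled with a fixed byte order does no detection: U+FEFF
# survives as an ordinary character (ZERO WIDTH NO-BREAK SPACE).
assert be.decode("utf-16-be") == text
```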

(At the time of the spec there were more 68xxx computers than there are now,
so I think it is true that the need for both endiannesses has diminished.
I would be interested to know how Apple handled endianness on the
PowerPC, which I am told has selectable endianness.)

>  It seems to me that the only sensible reading 
> of 2.3.3 comprises all members of the UTF-16 family. 

I agree.

> >I myself agree with you: UTF-16BE and UTF-16LE should be supported if the
> >appropriate encoding declaration is present.
> 
> I disagree. 

Maybe we can pin down why that RFC does not apply to XML: in XML it is
used not purely for labelling but for detection.
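As a rough illustration of "detection" as opposed to "labelling", the sketch below sniffs the first bytes of an entity in the spirit of the XML Recommendation's non-normative autodetection appendix, restricted to the UTF-16 family. The helper names `sniff_utf16` and `decode_utf16_entity` are hypothetical. A real processor would also have to check for UTF-8, UCS-4/UTF-32 and EBCDIC, and then confirm the guess against the encoding declaration.

```python
from typing import Optional, Tuple


def sniff_utf16(data: bytes) -> Optional[Tuple[str, bool]]:
    """Guess (byte_order, has_bom) for a UTF-16 entity, or None.

    Hypothetical helper for illustration. It ignores UTF-32, so the
    UTF-32LE BOM (FF FE 00 00) would be misread as little-endian UTF-16.
    """
    head = data[:4]
    # With a BOM, the first two bytes settle the byte order.
    if head[:2] == b"\xfe\xff":
        return ("big", True)
    if head[:2] == b"\xff\xfe":
        return ("little", True)
    # Without a BOM, look for "<?" as it would appear in each byte order.
    if head == b"\x00<\x00?":
        return ("big", False)
    if head == b"<\x00?\x00":
        return ("little", False)
    return None


def decode_utf16_entity(data: bytes) -> str:
    """Decode using the detected byte order, dropping any BOM."""
    result = sniff_utf16(data)
    if result is None:
        raise ValueError("entity does not look like UTF-16")
    order, has_bom = result
    body = data[2:] if has_bom else data
    return body.decode("utf-16-be" if order == "big" else "utf-16-le")
```

For example, `sniff_utf16("\ufeff<?xml".encode("utf-16-be"))` gives `("big", True)`, and `decode_utf16_entity("\ufeff<?xml".encode("utf-16-le"))` gives `"<?xml"`. Here the byte order is inferred from the data itself rather than read from a charset label.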

In any case, surely it is not up to an RFC to tell us what we can put in
our entities! That would be ultra vires.

Rick Jelliffe

Received on Wednesday, 5 April 2000 19:26:31 UTC