Re: UTF-16BE/LE,... (was: Re: I18N issues with the XML Specification)

At 00/04/12 16:22 -0700, Tim Bray wrote:

>Pardon my lack of imagination, but I just cannot see how a person or
>committee can say that UTF-16BE stands on its own, and is "separated"
>from UTF-16, with a straight face.

Well, imagine an XML processor (actually, pretty much any XML processor
out there, I guess) that gets text/xml; charset='UTF-16BE'.
Which of these two error messages is that processor most likely to give?
1) Unknown character encoding 'UTF-16BE'
2) Missing BOM in UTF-16 encoding
The first one is straightforward. The second one would require
quite some intelligence, a feature that is neither present in nor
expected from XML processors.
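
As a rough sketch of why the first message is the likely one, assume a
processor that picks its decoder by looking the charset label up in a
fixed table. The names KNOWN_ENCODINGS, UnknownEncodingError,
select_decoder and error_for are hypothetical and not taken from any
real XML processor; the table only holds the two encodings XML 1.0
requires.

```python
# Hypothetical label table: only the encodings this processor was built
# to know. XML 1.0 requires support for UTF-8 and UTF-16, nothing more.
KNOWN_ENCODINGS = {"utf-8": "utf-8", "utf-16": "utf-16"}


class UnknownEncodingError(Exception):
    """Raised when a charset label is not in the table."""


def select_decoder(charset_label):
    """Return the codec name for a label, by exact (case-insensitive) match."""
    key = charset_label.strip().lower()
    if key not in KNOWN_ENCODINGS:
        # No reasoning about "this looks like UTF-16": the label is
        # simply not in the table, so it is unknown.
        raise UnknownEncodingError(
            f"Unknown character encoding '{charset_label}'"
        )
    return KNOWN_ENCODINGS[key]


def error_for(charset_label):
    """Return the error message for a label, or None if it is accepted."""
    try:
        select_decoder(charset_label)
    except UnknownEncodingError as exc:
        return str(exc)
    return None
```

With such a lookup, error_for("UTF-16BE") yields message 1) above, and
nothing in the code could ever arrive at message 2).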

Also, imagine that UTF-16BE allowed a BOM (an alternative that
was discussed, but rejected because of the double use of the BOM
as a ZWNBSP, U+FEFF). In such a case, what would you reasonably expect
from XML processors when getting something as text/xml; charset='UTF-16BE',
and with a (correct) BOM at the start:
1) That this is of course UTF-16, and therefore every XML processor
    out in the field has to accept and process it.
2) That this is an unknown encoding, and will be rejected.
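
To make the double use of U+FEFF concrete, here is a small sketch using
Python's standard codecs, purely as an illustration of how one existing
decoder library treats the two labels; it says nothing about what any
particular XML processor does. The sample payload is made up.

```python
import codecs

# A made-up big-endian payload that starts with the bytes FE FF.
payload = codecs.BOM_UTF16_BE + "<a/>".encode("utf-16-be")

# Under the label "utf-16", the leading FE FF is read as a byte order
# signature and is not part of the decoded text.
as_utf16 = payload.decode("utf-16")

# Under the label "utf-16-be", the byte order is already fixed, so the
# same two bytes are decoded as an ordinary character, U+FEFF.
as_utf16be = payload.decode("utf-16-be")
```

Here as_utf16 is the four characters of the element, while as_utf16be
carries an extra U+FEFF in front of them: same bytes, two labels, two
different results.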

I agree that it's difficult to imagine that a person or committee
can say that UTF-16BE is something completely different from UTF-16.
But what we are dealing with is not persons or committees but
mechanical software. XML processors out in the field will with
a straight face tell you that UTF-16BE is different from UTF-16.
And because we are writing our specs for mechanical software,
and not for intelligent persons and committees, we have to
make ourselves constantly aware of the cases where machines
'think' differently (or nothing at all).


Regards,    Martin.

Received on Wednesday, 12 April 2000 22:51:52 UTC