Re: UTF-16BE/LE,... (was: Re: I18N issues with the XML Specification)

At 05:21 PM 4/12/00 -0700, Tim Bray wrote:
>My frustration is caused by my own lack of penetration: people who are known
>to be smart and understand the issues keep saying things that (to me) seem
>unrealistic and advocating practices that (to me) seem hostile to
>interoperability, and no matter how many times they explain why this is
>good, I can't understand.  So objectively, the problem is likely on my side.

It might have helped your understanding if we had instead called them 
"BLORBMURF-BE" and "BLORBMURF-LE". The whole reason for creating them was 
to increase interoperability for the folks who couldn't, or didn't want to, 
use the BOM in the stuff they were moving around. If that doesn't apply to 
XML (and I don't see any reason why it should), then you can safely ignore 
these charsets. It seems like your problem is that you feel like you can't 
ignore them because their names have "UTF-16" in them. Really: feel free to 
ignore them.

>Maybe -BE and -LE really aren't UTF-16 at all.

They are similar to the "UTF-16" charset, but they have different rules. 
UTF-16 by itself is a transformation format (an encoding), not a charset. 
All three charsets start with the UTF-16 transformation format, then add 
rules about byte order and the BOM to make them charsets.
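
To make the difference in rules concrete, here is a rough sketch using Python's standard codecs. It assumes that Python's "utf-16", "utf-16-be" and "utf-16-le" codecs are a fair stand-in for the three charsets; that mapping is my assumption for illustration, not something the charset registrations say.

```python
import codecs

text = "A"  # U+0041

# -BE and -LE: the label fixes the byte order, and no BOM is written.
be = text.encode("utf-16-be")
le = text.encode("utf-16-le")
assert be == b"\x00A"
assert le == b"A\x00"

# Plain "utf-16": Python's encoder writes a BOM first (in the platform's
# native order), and the decoder uses it to pick the byte order.
tagged = text.encode("utf-16")
assert tagged[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
assert tagged.decode("utf-16") == text

# Under the -BE label a leading FE FF is not treated as a signature:
# it comes through as the character U+FEFF.
with_bom = codecs.BOM_UTF16_BE + be
assert with_bom.decode("utf-16-be") == "\ufeffA"
assert with_bom.decode("utf-16") == "A"
```

Same transformation format underneath in all three cases; only the rules about the BOM and who decides the byte order change.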

>   That I can sorta kinda
>believe, if I try really hard, on alternate days of the week.

Today could be one of those days!

>   Maybe there's
>some situation where it's a good idea to create XML in the natural 16-bit
>encoding of Unicode code points without a BOM.  That I can't believe at all.

Me neither. There is no reason to change the rules for untagged XML because 
these two charsets have been created. You wouldn't have considered changing 
the rules if we had named the new charsets "BLORBMURF-BE" and "BLORBMURF-LE".
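
For reference, the BOM check that untagged 16-bit XML relies on can be sketched roughly as follows. This is a simplification under stated assumptions: the function name sniff_utf16_bom is hypothetical, and the sketch looks only at the two UTF-16 byte order marks, ignoring every other encoding a real XML processor has to consider.

```python
from typing import Optional


def sniff_utf16_bom(data: bytes) -> Optional[str]:
    """Return the byte order implied by a leading UTF-16 BOM, or None.

    Hypothetical helper for illustration only; it checks nothing
    beyond the first two bytes.
    """
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"  # BOM in big-endian order
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"  # BOM in little-endian order
    return None  # no UTF-16 BOM present


# "<" (U+003C) preceded by a BOM, in each byte order.
print(sniff_utf16_bom(b"\xfe\xff\x00<"))
print(sniff_utf16_bom(b"\xff\xfe<\x00"))
# No BOM at all: the sketch declines to guess.
print(sniff_utf16_bom(b"<?xml"))
```

Nothing in this check needs to know that the -BE and -LE labels exist, which is the point: the new charsets leave it alone.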

--Paul Hoffman, Director
--Internet Mail Consortium

Received on Wednesday, 12 April 2000 20:36:45 UTC