Re: UTF-16BE/LE,... (was: Re: I18N issues with the XML Specification)

At 02:30 PM 4/12/00 -0700, Paul Hoffman / IMC wrote:

>As co-author of RFC 2781, I think that anything that says "any flavor 
>of UTF-16" is technically incorrect. The RFC very specifically separates 
>the definition of UTF-16 (section 2, which is a restatement of ISO 10646 
>and Unicode) from the labels "UTF-16", "UTF-16BE", and "UTF-16LE". Each 
>labelled type stands on its own and has a separate definition.

Pardon my lack of imagination, but I just cannot see how a person or 
committee can say that UTF-16BE stands on its own, and is "separated" 
from UTF-16, with a straight face.   

Consider an author creating an XML document in an editor that happens to
use UTF-16 and thus (correctly) inserts a BOM.  That document then cannot
be transmitted as -BE or -LE, even by software that knows its byte
ordering, because the BOM is forbidden in those variants.  Thus, as
Murata has long (and correctly) stated, the -BE and -LE variants are
simply not appropriate for XML documents. -Tim
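
To make the scenario above concrete, here is a minimal sketch that uses
Python's standard codecs as a stand-in for the editor and the transmitting
software. That choice, the sample document, and the helper names has_bom
and relabel_as_be are assumptions for illustration only; nothing here comes
from RFC 2781 itself. It shows that a "UTF-16" encoder emits a BOM, that
the -BE and -LE encoders do not, and that the saved bytes have to be
rewritten before they could carry a -BE label.

```python
import codecs

# Hypothetical sample document; any text would do.
text = "<doc/>"

def has_bom(data: bytes) -> bool:
    """True if the byte stream starts with a UTF-16 byte order mark."""
    return data[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

def relabel_as_be(data: bytes) -> bytes:
    """Hypothetical helper: turn BOM-marked UTF-16 into BOM-less big-endian bytes.

    Decoding with "utf-16" consumes the BOM and uses it to pick the byte
    order; re-encoding with "utf-16-be" writes no BOM.
    """
    return data.decode("utf-16").encode("utf-16-be")

# What the editor saves: plain "utf-16" prepends a BOM
# (the byte order that follows depends on the platform).
saved = text.encode("utf-16")

# What the -BE and -LE labels expect: no BOM at all.
be = text.encode("utf-16-be")
le = text.encode("utf-16-le")

# If the saved bytes were big-endian and simply relabelled as -BE, the BOM
# would be read as a U+FEFF character at the start of the document.
mislabelled = (codecs.BOM_UTF16_BE + be).decode("utf-16-be")

print(has_bom(saved), has_bom(be), has_bom(le))
print(repr(mislabelled[0]))
print(relabel_as_be(saved) == be)
```

The last line is the step the transmitting software would have to perform:
it cannot pass the author's bytes through unchanged under a -BE or -LE
label.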

Received on Wednesday, 12 April 2000 19:21:53 UTC