Re: UTF-16BL/LE,... (was: Re: I18N issues with the XML Specification) from Tim Bray on 2000-04-12 (xml-editor@w3.org from April to June 2000)

From: Tim Bray <tbray@textuality.com>
Date: Wed, 12 Apr 2000 16:22:11 -0700
To: Paul Hoffman / IMC <phoffman@imc.org>, John Cowan <cowan@locke.ccil.org>
Cc: duerst@w3.org, w3c-i18n-ig@w3.org, xml-editor@w3.org, w3c-xml-core-wg@w3.org
Message-Id: <3.0.32.20000412162037.014a3880@pop.intergate.ca>

At 02:30 PM 4/12/00 -0700, Paul Hoffman / IMC wrote:

>As co-author of RFC 2781, I think that anything that says "any flavor 
>of UTF-16" is technically incorrect. The RFC very specifically separates 
>the definition of UTF-16 (section 2, which is a restatement of ISO 10646 
>and Unicode) from the labels "UTF-16", "UTF-16BE" and "UTF-16LE". Each 
>labelled type stands on its own and has a separate definition.
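To make the distinction between the three labels concrete, here is a minimal Python sketch of how a receiver might interpret bytes under each one, based on our reading of RFC 2781. Under "UTF-16" a leading BOM is treated as a byte-order signature and dropped, with big-endian assumed when no BOM is present. Under "UTF-16BE" or "UTF-16LE" the byte order comes from the label, so a leading FE FF or FF FE is ordinary content (U+FEFF, ZERO WIDTH NO-BREAK SPACE). The helper `decode_by_label` is hypothetical; only the `utf-16-be` and `utf-16-le` codecs are real Python standard-library names.

```python
# Sketch of RFC 2781 label interpretation (as we read the RFC).
# decode_by_label is a hypothetical helper, not a standard library function.

BOM_BE = b"\xfe\xff"  # U+FEFF serialized big-endian
BOM_LE = b"\xff\xfe"  # U+FEFF serialized little-endian


def decode_by_label(data: bytes, label: str) -> str:
    label = label.upper()
    if label == "UTF-16":
        # A leading BOM is a signature: it selects the byte order and is dropped.
        if data[:2] == BOM_BE:
            return data[2:].decode("utf-16-be")
        if data[:2] == BOM_LE:
            return data[2:].decode("utf-16-le")
        # No BOM: RFC 2781 says the text should be taken as big-endian.
        return data.decode("utf-16-be")
    if label == "UTF-16BE":
        # Byte order is fixed by the label; a leading FE FF stays as U+FEFF content.
        return data.decode("utf-16-be")
    if label == "UTF-16LE":
        # Same for little-endian: nothing is stripped.
        return data.decode("utf-16-le")
    raise ValueError(f"not a UTF-16 label: {label}")


raw = BOM_BE + "<a/>".encode("utf-16-be")
print(repr(decode_by_label(raw, "UTF-16")))    # '<a/>'
print(repr(decode_by_label(raw, "UTF-16BE")))  # '\ufeff<a/>'
```

The same bytes decode differently depending on the label alone, which is the sense in which each labelled type has its own definition.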

Pardon my lack of imagination, but I just cannot see how a person or 
committee can say that UTF-16BE stands on its own, and is "separated" 
from UTF-16, with a straight face.   

Consider an author creating an XML document in an editor that happens to
use UTF-16 and thus (correctly) inserts a BOM.  That document then cannot
be transmitted as -BE or -LE, even by software that knows its byte
ordering, because the BOM is forbidden in those variants.  Thus, as
Murata has long (and correctly) stated, the -BE and -LE variants are
simply not appropriate for XML documents. -Tim
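The conflict described above can be sketched as a simple check. XML 1.0 (section 4.3.3 in the editions we know of) requires entities encoded in UTF-16 to begin with a BOM, while RFC 2781 forbids a BOM under the -BE and -LE labels, so a document carrying a BOM can legitimately be labelled only "UTF-16". The helper `can_label` below is hypothetical and illustrative only. It uses Python's `str.encode("utf-16")`, which always prepends a BOM, to stand in for the editor in the example.

```python
# Hypothetical check: can these bytes carry the given UTF-16 label as an XML entity?
# Assumes the XML 1.0 rule that UTF-16 entities begin with a BOM, and the
# RFC 2781 rule that UTF-16BE/UTF-16LE text must not begin with one.

BOM_BE = b"\xfe\xff"
BOM_LE = b"\xff\xfe"


def starts_with_bom(data: bytes) -> bool:
    return data[:2] in (BOM_BE, BOM_LE)


def can_label(data: bytes, label: str) -> bool:
    label = label.upper()
    if label == "UTF-16":
        return starts_with_bom(data)       # BOM required by XML 1.0
    if label in ("UTF-16BE", "UTF-16LE"):
        return not starts_with_bom(data)   # BOM forbidden by RFC 2781
    raise ValueError(f"not a UTF-16 label: {label}")


# An editor writing "UTF-16" output; Python's utf-16 codec inserts a BOM.
editor_output = '<?xml version="1.0" encoding="UTF-16"?><doc/>'.encode("utf-16")
print(can_label(editor_output, "UTF-16"))    # True
print(can_label(editor_output, "UTF-16BE"))  # False, even if the sender knows the byte order
```

Relabelling the editor's output as -BE or -LE would require rewriting the bytes to remove the BOM, which leaves it no longer a conforming UTF-16 XML entity under the rule above.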

Received on Wednesday, 12 April 2000 19:21:53 UTC