
Re: UTF-16BL/LE,... (was: Re: I18N issues with the XML Specification

From: Tim Bray <tbray@textuality.com>
Date: Wed, 12 Apr 2000 09:31:02 -0700
To: Dan Connolly <connolly@w3.org>
Cc: "Martin J. Duerst" <duerst@w3.org>, w3c-i18n-ig@w3.org, xml-editor@w3.org, w3c-xml-core-wg@w3.org

At 10:39 AM 4/12/00 -0500, Dan Connolly wrote:
>Is there any reason not to treat UTF-16BE and UTF-16LE just
>like other non-required encodings, ala ISO-8859-1
>and ISO-2022-JP and such? i.e. you can use it, but not
>without an explicit declaration (either in the XML entity
>or in the HTTP headers or filesystem metadata or ...), and beware
>that not all processors are required to read it; you may
>well get a 'sorry, I don't grok that encoding' error.

It all comes down to the interpretation of the term 'UTF-16' in the XML
spec.  If this is interpreted to subsume the LE and BE versions, then
an XML processor would be justified in declaring an error when a -BE or
-LE entity arrives without the BOM that the spec requires of UTF-16
entities.  Thus, Martin essentially wants to forbid a processor from
applying the spec's rules on UTF-16 to things that are in -BE and -LE.
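
To make the two readings concrete, here is a minimal sketch of the check a
processor might run.  The function names and the subsume_be_le flag are
hypothetical, invented for this illustration; nothing here is taken from the
XML spec or from any real processor.

```python
import codecs

# The two byte sequences that U+FEFF produces in UTF-16.
UTF16_BOMS = (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)


def bom_required(label: str, subsume_be_le: bool) -> bool:
    """Does the 'UTF-16 entities must begin with a BOM' rule apply to this label?

    subsume_be_le is a hypothetical switch between the two readings:
    True  -> 'UTF-16' in the spec also covers UTF-16BE and UTF-16LE
    False -> -BE and -LE are treated like any other non-required encoding
    """
    label = label.upper()
    if label == "UTF-16":
        return True
    if label in ("UTF-16BE", "UTF-16LE"):
        return subsume_be_le
    return False


def check_entity(data: bytes, label: str, subsume_be_le: bool) -> str:
    """Return 'error' if the BOM rule applies and the entity has no BOM."""
    has_bom = data.startswith(UTF16_BOMS)
    if bom_required(label, subsume_be_le) and not has_bom:
        return "error"
    return "ok"
```

Under the first reading a BOM-less entity labelled UTF-16BE is rejected;
under the second the same bytes pass, which is the outcome Martin is
asking for.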

Note that the RFCs go further, and *forbid* the use of the BOM in -LE and -BE.
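
As an illustration of what that separation looks like in practice, the
sketch below uses Python's standard codecs.  They are only one
implementation's choice, shown here as an example and not as a statement of
what the RFC text requires: the plain "utf-16" codec consumes a leading BOM,
while the "utf-16-be" codec leaves it in the text as U+FEFF and never writes
one.

```python
import codecs

# A small document serialized big-endian with a BOM in front.
data = codecs.BOM_UTF16_BE + "<a/>".encode("utf-16-be")

# Labelled plain UTF-16: the BOM selects the byte order and is consumed.
as_utf16 = data.decode("utf-16")

# Labelled UTF-16BE: the label fixes the byte order, so the leading
# U+FEFF is kept as an ordinary character of the text.
as_utf16be = data.decode("utf-16-be")

# Encoding under the -BE label emits no BOM at all.
encoded_be = "<a/>".encode("utf-16-be")

print(repr(as_utf16))
print(repr(as_utf16be))
print(encoded_be.startswith(codecs.BOM_UTF16_BE))
```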

It is my position that this is a mistake.  First, -LE and -BE really,
truly are UTF-16, and pretending they're not is just incorrect.
Second, it is actively harmful, in that it encourages people to create
documents in a format that *forbids* the application of a simple,
low-cost interoperability tool (the BOM) that demonstrably works well
across networks and implementations.  This is simply wrong.

 -Tim
Received on Wednesday, 12 April 2000 12:29:58 UTC
