Re: UTF-16BL/LE,... (was: Re: I18N issues with the XML Specification from Tim Bray on 2000-04-12 (xml-editor@w3.org from April to June 2000)

From: Tim Bray <tbray@textuality.com>
Date: Wed, 12 Apr 2000 09:31:02 -0700
To: Dan Connolly <connolly@w3.org>
Cc: "Martin J. Duerst" <duerst@w3.org>, w3c-i18n-ig@w3.org, xml-editor@w3.org, w3c-xml-core-wg@w3.org
Message-Id: <3.0.32.20000412093051.0200b540@pop.intergate.ca>

At 10:39 AM 4/12/00 -0500, Dan Connolly wrote:
>Is there any reason not to treat UTF-16BE and UTF-16LE just
>like other non-required encodings, ala ISO-8859-1
>and ISO-2022-JP and such? i.e. you can use it, but not
>without an explicit declaration (either in the XML entity
>or in the HTTP headers or filesystem metadata or ...), and beware
>that not all processors are required to read it; you may
>well get a 'sorry, I don't grok that encoding' error.

It all comes down to the interpretation of the term 'UTF-16' in the XML
spec.  If it is interpreted to subsume the LE and BE versions, then an
XML processor would be justified in declaring an error for a -BE or -LE
entity that lacks a BOM, since the spec requires UTF-16 entities to begin
with one.  Thus, Martin essentially wants to forbid a processor from
applying the spec's rules on UTF-16 to things that are in -BE and -LE.

Note that the RFCs go further, and *forbid* the use of the BOM in -LE and
-BE.
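
To make the BOM point concrete, here is a minimal Python sketch of how a
reader can recover byte order from the first two bytes of an entity, and
why it has to fall back on an external label when the BOM is absent.  It
is an illustration only, not taken from the RFCs or the XML spec: the
function names sniff_utf16 and decode_utf16 are hypothetical, and the
sketch ignores UTF-32, whose little-endian BOM also begins with FF FE.

```python
import codecs


def sniff_utf16(data: bytes) -> str:
    """Guess UTF-16 byte order from a leading BOM, if there is one."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    return "unknown"  # no BOM: the bytes alone do not settle the order


def decode_utf16(data: bytes, label: str = "") -> str:
    """Decode UTF-16 using the BOM when present, else an explicit label.

    `label` stands in for out-of-band information such as an encoding
    declaration or an HTTP charset parameter (an assumption of this sketch).
    """
    order = sniff_utf16(data)
    if order != "unknown":
        return data[2:].decode(order)  # drop the 2-byte BOM, then decode
    if label in ("utf-16-be", "utf-16-le"):
        return data.decode(label)
    raise ValueError("no BOM and no byte-order label")
```

With a BOM, decode_utf16 needs nothing but the bytes; without one, the
same bytes are only readable if the label arrives intact alongside them.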

It is my position that this is a mistake.  First, -LE and -BE really,
truly are UTF-16, and pretending they're not is just plain incorrect.
Second, it is actively harmful, in that it encourages people to create
documents in a format that *forbids* the use of the BOM, a simple,
low-cost interoperability tool that demonstrably works well across
networks and implementations.  This is simply wrong. -Tim

Received on Wednesday, 12 April 2000 12:29:58 UTC