RE: draft-hoffman-utf16-01.txt available from Francois Yergeau on 1999-02-02 (ietf-charsets@w3.org from January to March 1999)

From: Francois Yergeau <yergeau@alis.com>
Date: Tue, 02 Feb 1999 15:34:14 -0500
To: Larry Masinter <masinter@parc.xerox.com>
Cc: "Martin J. Duerst" <duerst@w3.org>, Paul Hoffman / IMC <phoffman@imc.org>, MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>, ietf-charsets@iana.org
Message-id: <3.0.5.32.19990202153414.009b0010@www.alis.com>

At 12:10 02/02/99 -0800, Larry Masinter wrote:
>I think this is the only position consistent with having
>three different charset registrations: "BOM should not
>be sent with UTF-16BE or UTF-16LE, only with UTF-16."

Labelling UTF-16BE (or LE) and then sending a BOM is not inconsistent; it's
only redundant.

And this redundancy can be useful.  The explicit label lets the recipient
of a MIME object know the endianness without looking inside, which is good.
But if the object is then moved elsewhere by a non-MIME protocol (FTP,
disk copy, etc.), there is a BOM that the recipient can look at.
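
As a rough sketch of what such a recipient would do once the MIME label is
gone (Python; the function name is made up for illustration, and the data is
assumed to be already known to be UTF-16 of some kind):

```python
import codecs

def sniff_utf16_byte_order(data: bytes):
    """Return 'big', 'little', or None when the data carries no BOM.

    Hypothetical helper: assumes the caller already knows the data is
    UTF-16, so the only open question is the byte order.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little"
    return None                                # no BOM, nothing to go on
```

Without the BOM the function has nothing to go on, which is exactly the
situation the redundant BOM avoids.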

Since the problem with BOMs is their ambiguity -- is it a real BOM or
an intended ZWNBSP? -- I currently lean toward a "SHOULD NOT put a BOM"
rule unless it's mandatory (such as in XML), in which case it is also
unambiguous.
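
To make the ambiguity concrete, here is a small sketch using Python's
built-in codecs (the function name is only illustrative).  The same bytes
yield different text depending on whether the leading U+FEFF is taken as a
BOM or as a character:

```python
def decode_both_ways(data: bytes):
    """Decode big-endian UTF-16 data under both readings of a leading FE FF."""
    # "utf-16-be": byte order is given by the label, so a leading U+FEFF
    # is kept as a character (ZWNBSP).
    as_character = data.decode("utf-16-be")
    # "utf-16": a leading U+FEFF is consumed as a byte order mark.
    as_bom = data.decode("utf-16")
    return as_character, as_bom

kept, stripped = decode_both_ways(b"\xfe\xff\x00A")
# kept is "\ufeffA" (two characters); stripped is "A" (one character)
```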

Martin Dürst:
>> We wouldn't have to change XML, only to add a clarification to
>> say that "UTF-16" in the XML spec means only the case
>> charset="UTF-16", and not the others.

That doesn't work.  The producer of an XML entity is not necessarily the
MIME processor that will tag it, and may not know whether the entity will
be tagged UTF-16 or UTF-16(BE|LE).  Does it put a BOM?

And further, I happen to think that all XML entities (in UTF-16) having a
BOM is a Good Thing.  The XML spec is designed such that one can always
determine the character encoding without external info; let's keep it that
way.
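
For illustration, a minimal sketch of that kind of self-detection for the
UTF-16 cases only (Python; the function name and return shape are my own
invention, the byte patterns follow the non-normative autodetection appendix
of XML 1.0 as I understand it, and UCS-4, UTF-8 and EBCDIC are deliberately
left out):

```python
def guess_xml_utf16(data: bytes):
    """Guess the UTF-16 flavour of an XML entity from its first bytes.

    Returns (encoding, has_bom); encoding is None when the data does not
    look like UTF-16.  Hypothetical helper, UTF-16 cases only.
    """
    if data[:2] == b"\xfe\xff":
        return ("UTF-16BE", True)      # BOM, big-endian
    if data[:2] == b"\xff\xfe":
        return ("UTF-16LE", True)      # BOM, little-endian
    if data[:4] == b"\x00<\x00?":
        return ("UTF-16BE", False)     # "<?" with no BOM, big-endian
    if data[:4] == b"<\x00?\x00":
        return ("UTF-16LE", False)     # "<?" with no BOM, little-endian
    return (None, False)
```

The no-BOM branches only work when the entity starts with "<?", i.e. with
an XML or text declaration; with a BOM always present, the first two
branches are enough.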


-- 
François

Received on Tuesday, 2 February 1999 15:39:08 UTC