W3C home > Mailing lists > Public > ietf-charsets@w3.org > April to June 2001

Re: Registration of new charsets UTF-32, UTF-32BE, UTF32LE

From: Mark Davis <markdavis34@home.com>
Date: Fri, 11 May 2001 10:06:10 -0700
To: ned.freed@mrochek.com
Cc: Harald Tveit Alvestrand <harald@alvestrand.no>, Mark Davis <mark.davis@us.ibm.com>, ned.freed@mrochek.com, ietf-charsets@iana.org
Message-id: <00ab01c0da3c$ad2eeb00$0c680b41@c1340594a>
The UTC does not consider it a botch. It is permissible for UTF-16 to be
either BE or LE if it does not have a BOM. For example, on Windows a
database field might contain BOM-less LE text. In such a case, a
higher-level protocol is establishing the byte orientation.

I personally agree that it would be far better to use the unambiguous term
UTF-16LE for BOM-less UTF-16 text for such serializations. I think the
problem is that the term "UTF-16" can also refer to the in-memory use of
UTF-16 as a sequence of 16-bit chunks, where byte-orientation is not an
issue (unless you are skanky and use unions to look at the bytes). So people
like to simply refer to it as UTF-16 whether in memory or serialized,
instead of the more appropriate term UTF-16LE.

However, if the IETF liaison wants to present a proposal to restrict UTF-16
and UTF-32 -- when used as a serialization into bytes, to being only BE if
there is no BOM, I believe that the UTC would certainly take that into
consideration. The next meeting is happening very soon...

Mark
----- Original Message -----
From: <ned.freed@mrochek.com>
To: "Mark Davis" <markdavis34@home.com>
Cc: "Harald Tveit Alvestrand" <harald@alvestrand.no>; "Mark Davis"
<mark.davis@us.ibm.com>; <ned.freed@mrochek.com>; <ietf-charsets@iana.org>
Sent: Friday, May 11, 2001 09:13
Subject: Re: Registration of new charsets UTF-32, UTF-32BE, UTF32LE


> > Thanks for your feedback. I will resubmit them.
>
> > Comments:
>
> > A. If each charset needs to be in a separate message, then you really
ought
> > to fix http://www.normos.org/ietf/bcp/bcp19.txt. It says:
>
> > "5.  Charset Registration Template
>
> >      To: ietf-charsets@iana.org
> >      Subject: Registration of new charset [names]"
>
> > with the word "names" in plural. This is misleading.
>
> The rest of the regisration form clearly talks about a single charset with
> multiple names, so I'm not sure I buy your reasoning here. However, since
we
> want to discourage the use of any aliases, I have no problem with changing
it
> to "name" singular.
>
> > B. UTF-32 in Unicode, as with UTF-16, could be BOM-less, with the
> > orientation being determined by a higher-level protocol. The IETF
> > registration (with good reason!) can impose a further restriction, as it
> > does with UTF-16, that BOM-less UTF-16 must be BE. I will put such a
clause
> > in the registration.
>
> Which means the UTC has apparently learned nothing from the UTF-16
disaster.
> If we push back on this is there any hope of getting this botch fixed?
>
> Ned
>
Received on Friday, 11 May 2001 13:06:25 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 5 June 2006 15:10:51 GMT