- From: Mark Davis <markdavis34@home.com>
- Date: Fri, 11 May 2001 10:06:10 -0700
- To: ned.freed@mrochek.com
- Cc: Harald Tveit Alvestrand <harald@alvestrand.no>, Mark Davis <mark.davis@us.ibm.com>, ned.freed@mrochek.com, ietf-charsets@iana.org
The UTC does not consider it a botch. It is permissible for UTF-16 to be either BE or LE if it does not have a BOM. For example, on Windows a database field might contain BOM-less LE text. In such a case, a higher-level protocol is establishing the byte orientation. I personally agree that it would be far better to use the unambiguous term UTF-16LE for BOM-less UTF-16 text for such serializations. I think the problem is that the term "UTF-16" can also refer to the in-memory use of UTF-16 as a sequence of 16-bit chunks, where byte-orientation is not an issue (unless you are skanky and use unions to look at the bytes). So people like to simply refer to it as UTF-16 whether in memory or serialized, instead of the more appropriate term UTF-16LE. However, if the IETF liaison wants to present a proposal to restrict UTF-16 and UTF-32 -- when used as a serialization into bytes, to being only BE if there is no BOM, I believe that the UTC would certainly take that into consideration. The next meeting is happening very soon... Mark ----- Original Message ----- From: <ned.freed@mrochek.com> To: "Mark Davis" <markdavis34@home.com> Cc: "Harald Tveit Alvestrand" <harald@alvestrand.no>; "Mark Davis" <mark.davis@us.ibm.com>; <ned.freed@mrochek.com>; <ietf-charsets@iana.org> Sent: Friday, May 11, 2001 09:13 Subject: Re: Registration of new charsets UTF-32, UTF-32BE, UTF32LE > > Thanks for your feedback. I will resubmit them. > > > Comments: > > > A. If each charset needs to be in a separate message, then you really ought > > to fix http://www.normos.org/ietf/bcp/bcp19.txt. It says: > > > "5. Charset Registration Template > > > To: ietf-charsets@iana.org > > Subject: Registration of new charset [names]" > > > with the word "names" in plural. This is misleading. > > The rest of the regisration form clearly talks about a single charset with > multiple names, so I'm not sure I buy your reasoning here. However, since we > want to discourage the use of any aliases, I have no problem with changing it > to "name" singular. > > > B. UTF-32 in Unicode, as with UTF-16, could be BOM-less, with the > > orientation being determined by a higher-level protocol. The IETF > > registration (with good reason!) can impose a further restriction, as it > > does with UTF-16, that BOM-less UTF-16 must be BE. I will put such a clause > > in the registration. > > Which means the UTC has apparently learned nothing from the UTF-16 disaster. > If we push back on this is there any hope of getting this botch fixed? > > Ned >
Received on Friday, 11 May 2001 13:06:25 UTC