- From: Mark Davis <markdavis34@home.com>
- Date: Fri, 11 May 2001 07:31:33 -0700
- To: Harald Tveit Alvestrand <harald@alvestrand.no>, Mark Davis <mark.davis@us.ibm.com>
- Cc: ned.freed@mrochek.com, ietf-charsets@iana.org
Thanks for your feedback. I will resubmit them. Comments: A. If each charset needs to be in a separate message, then you really ought to fix http://www.normos.org/ietf/bcp/bcp19.txt. It says: "5. Charset Registration Template To: ietf-charsets@iana.org Subject: Registration of new charset [names]" with the word "names" in plural. This is misleading. B. UTF-32 in Unicode, as with UTF-16, could be BOM-less, with the orientation being determined by a higher-level protocol. The IETF registration (with good reason!) can impose a further restriction, as it does with UTF-16, that BOM-less UTF-16 must be BE. I will put such a clause in the registration. Mark ----- Original Message ----- From: "Harald Tveit Alvestrand" <harald@alvestrand.no> To: "Mark Davis" <mark.davis@us.ibm.com> Cc: <ned.freed@mrochek.com>; <ietf-charsets@iana.org> Sent: Thursday, May 10, 2001 23:52 Subject: Re: Registration of new charsets UTF-32, UTF-32BE, UTF32LE > At 14:33 09.05.2001 -0700, Mark Davis wrote: > > >Sorry, I missed that. Do you want me to resubmit, or could you just make > >that change? > > Resubmit. > > note: each charset should have its own registration form. > > BTW, TR19 is technically broken in its definition of UTF-32: it specifies > that an UTF-8 character stream MAY OR MAY NOT begin with a Byte Order Mark, > and that octets can be in any order. > > >D36c > > (a) UTF-32 is the Unicode Transformation Format that serializes a > > Unicode code point as a sequence of four bytes, in either big-endian or > > little-endian format. An initial sequence corresponding to U+FEFF is > > interpreted as a byte order mark: it is used to distinguish between the > > two byte orders. The byte order mark is not considered part of the > > content of the text. A serialization of Unicode code points into > > UTF-32 may or may not begin with a byte order mark. > > This allows (when taking exquisite care - you only have 4.1 bits that are > valid in both upper and lower halves of the 32-bit word) the construction > of octet sequences that are ambiguous. > > If either the specification or the registration had said "A serialization > of Unicode code points into UTF-32 that does not begin with a byte order > mark MUST be in Big Endian", I would not have protested. But this is, IMHO, > just too broken to be registered as a charset. > > As written, I OPPOSE the registration of UTF-32. > (Apologies for having missed it at Unicode standardization time - we saw it > coming, and did not catch it in time) > > Harald >
Received on Friday, 11 May 2001 12:37:10 UTC