At 14:33 09.05.2001 -0700, Mark Davis wrote:
>Sorry, I missed that. Do you want me to resubmit, or could you just make
>that change?

Resubmit. Note: each charset should have its own registration form.

BTW, TR19 is technically broken in its definition of UTF-32: it specifies that a UTF-32 character stream MAY OR MAY NOT begin with a Byte Order Mark, and that the octets can be in either byte order.

>D36c
> (a) UTF-32 is the Unicode Transformation Format that serializes a
> Unicode code point as a sequence of four bytes, in either big-endian or
> little-endian format. An initial sequence corresponding to U+FEFF is
> interpreted as a byte order mark: it is used to distinguish between the
> two byte orders. The byte order mark is not considered part of the
> content of the text. A serialization of Unicode code points into
> UTF-32 may or may not begin with a byte order mark.

This allows (when taking exquisite care - you only have 4.1 bits that are valid in both upper and lower halves of the 32-bit word) the construction of octet sequences that are ambiguous.

If either the specification or the registration had said "A serialization of Unicode code points into UTF-32 that does not begin with a byte order mark MUST be in Big Endian", I would not have protested. But this is, IMHO, just too broken to be registered as a charset.

As written, I OPPOSE the registration of UTF-32.

(Apologies for having missed it at Unicode standardization time - we saw it coming, and did not catch it in time.)

Harald

Received on Friday, 11 May 2001 03:10:02 GMT
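
The ambiguity described in the message can be sketched in a few lines of Python. This is only an illustration, not part of TR19 or the registration: the helper names `decodes_as` and `ambiguous_words` are hypothetical, and the sketch assumes a decoder that accepts any code point up to U+10FFFF (as Python's built-in `utf-32-be` and `utf-32-le` codecs do). Without a byte order mark, a 4-octet word is valid in both byte orders only when its first and last octets are zero and the two middle octets are each at most 0x10, that is 17 values, or about 4.1 bits, per half.

```python
def decodes_as(data: bytes, codec: str):
    """Return the decoded text, or None if `data` is not valid in `codec`."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


def ambiguous_words():
    """List the BOM-less 4-octet words that are valid in both byte orders
    but decode to different code points."""
    found = []
    for hi in range(0x11):        # octet 2 of the word: 0x00..0x10
        for lo in range(0x11):    # octet 3 of the word: 0x00..0x10
            word = bytes([0x00, hi, lo, 0x00])
            be = decodes_as(word, "utf-32-be")
            le = decodes_as(word, "utf-32-le")
            if be is not None and le is not None and be != le:
                found.append(word)
    return found


# One concrete ambiguous word, with no byte order mark in front of it.
sample = bytes([0x00, 0x01, 0x02, 0x00])
be_text = sample.decode("utf-32-be")   # read as 0x00010200
le_text = sample.decode("utf-32-le")   # read as 0x00020100

print(f"big-endian reading:    U+{ord(be_text):04X}")
print(f"little-endian reading: U+{ord(le_text):04X}")
print(f"ambiguous 4-octet words: {len(ambiguous_words())}")
```

A rule such as the one proposed in the message (no byte order mark means big-endian) would remove the ambiguity, because a receiver would then never need to guess between the two readings.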
This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 5 June 2006 15:10:51 GMT