- From: Marcin Hanclik <mhanclik@poczta.onet.pl>
- Date: Fri, 22 Nov 2002 12:06:20 +0100
- To: ietf-charsets@iana.org
Dear Sirs,

I am writing to you as experts in internationalization and ISO-10646 issues. I would be very grateful if you could help me with the issue described below.

In general, the question concerns the MIME encoding of a text part, and in particular the following case:

    Content-Type: text/plain; charset="iso-10646-ucs-2"
    Content-Transfer-Encoding: ...

    Data

Data after decoding:

    0xFF 0xFE 0x66 0x00 0x65 0x00

Outlook Express decodes this to the string "fe". However, some people say that this is only robustness on the part of Outlook Express and that the string is not properly encoded, because at the time when charset="iso-10646-ucs-2" was registered with IANA the byte order mark (BOM) did not exist.

The details follow. My current understanding of character encoding is:

character set  | transport (charset=,MIBenum)
---------------+-----------------------------------------------
UCS-2          | ISO-10646-UCS-2,1000 (network byteorder)
(Unicode 1.1)  | (BOM does not exist)
---------------+-----------------------------------------------
UCS-2, UCS-4   | UTF-8,106 (endian independent)
               | (BOM is not necessary but U+FEFF is acceptable)
---------------+-----------------------------------------------
UCS-2, UCS-4   | UTF-16,1015 (BOM or big endian)
---------------+-----------------------------------------------
UCS-2, UCS-4   | UTF-16BE,1013 (big endian)
               | (BOM is not necessary but U+FEFF == 0xFE,0xFF is acceptable)
---------------+-----------------------------------------------
UCS-2, UCS-4   | UTF-16LE,1014 (little endian)
               | (BOM is not necessary but U+FEFF == 0xFF,0xFE is acceptable)
---------------+-----------------------------------------------

Annex H to ISO-10646:2000 specifies a signature of the UCS used to identify the data. As far as I know, charset=ISO-10646-UCS-2 (MIBenum 1000) was defined for ISO-10646-1:1993 (Unicode 1.1), where the BOM did not appear.

MY QUESTIONS:

1. Can I use the charset=ISO-10646-UCS-2 parameter to describe data in ISO-10646:2000 format with a BOM?

2. Is it now the case that charset=ISO-10646-UCS-2 specifies UCS-2 from both ISO-10646:2000 and ISO-10646:1993?

Thank you in advance for an answer.

Kind regards,
Marcin Hanclik
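
For illustration, the two readings of the decoded bytes in question can be sketched in Python. This is only a sketch under stated assumptions: the helper name decode_ucs2 and its honor_bom flag are hypothetical, and Python's "utf-16-be" and "utf-16-le" codecs are used as a stand-in for UCS-2, which is equivalent only as long as the data contains no surrogate code units (true for the bytes here). It does not claim to reproduce what Outlook Express actually does internally.

```python
import codecs

# The six bytes from the message, after the Content-Transfer-Encoding is undone.
data = bytes([0xFF, 0xFE, 0x66, 0x00, 0x65, 0x00])


def decode_ucs2(raw: bytes, honor_bom: bool) -> str:
    """Hypothetical helper: decode 16-bit code units.

    honor_bom=True  : a leading FF FE or FE FF is treated as a signature
                      that selects the byte order and is then dropped.
    honor_bom=False : the data is always read in network (big-endian)
                      byte order and no signature is recognized.
    """
    if honor_bom and raw.startswith(codecs.BOM_UTF16_LE):
        return raw[2:].decode("utf-16-le")
    if honor_bom and raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be")
    return raw.decode("utf-16-be")


lenient = decode_ucs2(data, honor_bom=True)
strict = decode_ucs2(data, honor_bom=False)

print(repr(lenient))                           # the BOM-aware reading
print([f"U+{ord(ch):04X}" for ch in strict])   # the strict big-endian reading
```

With the signature honored, the bytes read as "fe". Without it, the same bytes read in network byte order as the code points U+FFFE, U+6600, U+6500, which is why the answer to the two questions matters for interoperability.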
Received on Monday, 25 November 2002 13:30:50 UTC