- From: Marcin Hanclik <mhanclik@poczta.onet.pl>
- Date: Mon, 25 Nov 2002 21:09:21 +0100
- To: ned.freed@mrochek.com
- Cc: ietf-charsets@iana.org
Hi!

Your explanation means that you cannot send UTF-16-encoded text, because
it cannot preserve CRLF. You could then not send any Unicode characters
(other than as UTF-8) in MIME!!!

The media type you are writing about is to be used in the form:
Content-Type: text/utf-16...
and I mean:
Content-Type: text/plain; charset="UTF-16"

So I understand from your mail that BOM should be accepted when we have:
Content-Type: text/plain; charset="iso-10646-ucs-2"

RGDS/Marcin

> -----Original Message-----
> From: ned.freed@mrochek.com [mailto:ned.freed@mrochek.com]
> Sent: Monday, 25 November, 2002 19:10
> To: Marcin Hanclik
> Cc: ned.freed@mrochek.com
> Subject: RE: internationalization/ISO10646 question
>
> > Dear Ned,
> >
> > thank you very much for your answer.
> > However, I would like to discuss it.
> >
> > > -----Original Message-----
> > > From: ned.freed@mrochek.com [mailto:ned.freed@mrochek.com]
> > > Sent: Friday, 22 November, 2002 19:52
> > > To: Marcin Hanclik
> > > Cc: ietf-charsets@iana.org
> > > Subject: Re: internationalization/ISO10646 question
> > >
> > > > Dear Sirs,
> > > >
> > > > I am writing to you as the experts in internationalization and
> > > > ISO-10646 issues.
> > > >
> > > > I would be very grateful if you could help me with the issue
> > > > described below.
> > > >
> > > > Generally the question refers to MIME encoding of a text part,
> > > > particularly to the following case:
> > > > Content-Type: text/plain; charset="iso-10646-ucs-2"
> > > > Content-Transfer-Encoding: ...
> > >
> > > This, I'm afraid, is an illegal combination of elements.
> > > Specifically, any material with a top level media type of "text"
> > > has to represent carriage return/line feed as the literal sequence
> > > 0x0D 0x0A. iso-10646-ucs-2 clearly does not do this, and as such is
> > > a media type that's not suited for use with MIME text.
> > >
> > > This requirement is spelled out in RFC 2046 section 4.1.1.
> >
> > I think it is not the case. The Content-Transfer-Encoding header has
> > to take care of CRLF handling.
> > It is specified in RFC 2046, 4.1.2.
>
> On the contrary, section 4.1.2 in fact reiterates the CRLF requirement
> in that it discusses how the charsets can be used with other top level
> types "with the CRLF/line break restriction removed".
>
> > I left empty space for this parameter, but generally it is BASE64 in
> > this case.
>
> The restrictions on the text top level type are completely independent
> of what content-transfer-encoding is used. It is also true that the
> domains of various content-transfer-encodings are restricted in various
> ways, including but not limited to the use of CRLFs, but this has
> nothing to do with the restrictions on the text top level type.
>
> > > > Data
> > > >
> > > > Data after decoding: 0xFF 0xFE 0x66 0x00 0x65 0x00
> > > >
> > > > Outlook Express decodes it to the string "fe". But there are
> > > > people who say that this is robustness of Outlook Express and
> > > > that the string is not properly encoded, because at the time when
> > > > <charset="iso-10646-ucs-2"> was specified/assigned with IANA the
> > > > byte order mark (BOM) did not exist.
> > >
> > > I don't know if there are specific rules for handling revisions to
> > > iso-10646-ucs-2 or not. I suspect not. However, the general rule is
> > > that additions to a charset repertoire are expected and allowed.
> > > See RFC 2279 section 3. However, the BOM is something of a special
> > > case.
> >
> > This is a good argument to me.
> >
> > > But given the far more egregious violation going on here I really
> > > don't think this is particularly important in the overall scheme of
> > > things.
> >
> > The above violation is not the case here, I think.
>
> I'm sorry, but it most certainly is the case here. Indeed, there would
> be no point in having the labelling of whether or not a given charset
> was "suitable for use in MIME text" if it weren't for this restriction.
>
> This is a case where the standards are clear, the standards clearly
> reflect the intent of the group that developed them, and the
> registration requirements now reflect the restrictions put in place by
> the standards.
>
> You can even see this in action in the registration of things like
> UTF-16LE. RFC 2781 section A.2 contains the registration for this
> charset, and among other things it says "Suitable for use in MIME
> content types under the "text" top-level type: No"
>
> Unfortunately iso-10646-ucs-2 was registered before the rules were in
> place to call this out in registrations, but that doesn't mean it is
> suitable for use in MIME text.
>
> > So the question remains:
> > can I use <charset="iso-10646-ucs-2"> for the data containing BOM?
>
> And the answer remains that for material with a top level content type
> of text, which is what you said you were dealing with, you cannot use
> this charset at all. As such, any handling of it is possible, up to and
> including rejection of the message as invalid.
>
> For material that isn't labelled with a top level content type of text
> I don't think the situation is clear, but the intent has always been to
> allow additions to charsets subsequent to registration. So I think BOM
> should be supported in this context.
>
> Ned
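
The two technical points in this thread can be checked with a short sketch. It decodes the six octets quoted above (0xFF 0xFE 0x66 0x00 0x65 0x00) and shows how a line break looks in a 16-bit encoding. Python's built-in "utf-16" and "utf-16-le" codecs are used here as stand-ins for iso-10646-ucs-2; that substitution is an assumption of this sketch, not something stated in the thread.

```python
# Octets quoted in the thread as "Data after decoding".
data = bytes([0xFF, 0xFE, 0x66, 0x00, 0x65, 0x00])

# The generic "utf-16" codec reads the leading 0xFF 0xFE as a byte order
# mark, selects little-endian order, and drops the mark from the result.
with_bom_handling = data.decode("utf-16")

# A codec with a fixed byte order does not interpret the mark, so it
# survives as the character U+FEFF at the start of the string.
without_bom_handling = data.decode("utf-16-le")

# A line break in a 16-bit encoding is four octets, so the two-octet
# sequence 0x0D 0x0A never appears adjacent in the encoded form.
crlf_16bit = "\r\n".encode("utf-16-le")
has_literal_crlf = b"\r\n" in crlf_16bit

print(repr(with_bom_handling))
print(repr(without_bom_handling))
print(crlf_16bit.hex(" "), has_literal_crlf)
```

The first decode matches the "fe" that Outlook Express is reported to show. The last line illustrates why a 16-bit charset cannot carry CR LF as the literal octet pair 0x0D 0x0A.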
Received on Tuesday, 3 December 2002 22:43:49 UTC