- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 05 Dec 2002 02:00:09 +0900
- To: Marcin Hanclik <mhanclik@poczta.onet.pl>, ned.freed@mrochek.com
- Cc: ietf-charsets@iana.org
Hello Marcin,

I think Ned has said similar things before, but:

For email (SMTP, ...),

    Content-Type: text/plain; charset="UTF-16"

is simply illegal because of the CR/LF restrictions. The same applies to

    Content-Type: text/plain; charset="iso-10646-ucs-2"

For HTTP, the situation is a bit different. HTTP uses a 'variant' of MIME and does not enforce the CR/LF restrictions. So

    Content-Type: text/plain; charset="UTF-16"

or, more typically,

    Content-Type: text/html; charset="UTF-16"

is legal in HTTP. The same would apply to

    Content-Type: text/plain; charset="iso-10646-ucs-2"

or

    Content-Type: text/html; charset="iso-10646-ucs-2"

Whether a BOM should be accepted in that case depends on the registration of iso-10646-ucs-2.

Regards,
     Martin.

At 21:09 02/11/25 +0100, Marcin Hanclik wrote:
>Hi!
>
>Your explanation means that you cannot send UTF-16 encoding, because it
>cannot preserve CRLF.
>You could not send any Unicode characters (apart from UTF-8) in MIME then!!!
>
>The media type you are writing about is to be used in the form:
>Content-Type: text/utf-16...
>
>and I mean:
>Content-Type: text/plain; charset="UTF-16"
>
>So I understand from your mail that BOM should be accepted when we have:
>Content-Type: text/plain; charset="iso-10646-ucs-2"
>
>
>RGDS/Marcin
>
> > -----Original Message-----
> > From: ned.freed@mrochek.com [mailto:ned.freed@mrochek.com]
> > Sent: Monday, 25 November, 2002 19:10
> > To: Marcin Hanclik
> > Cc: ned.freed@mrochek.com
> > Subject: RE: internationalization/ISO10646 question
> >
> >
> > > Dear Ned,
> > >
> > > thank you very much for your answer.
> > > However, I would like to discuss it.
> > >
> > > > -----Original Message-----
> > > > From: ned.freed@mrochek.com [mailto:ned.freed@mrochek.com]
> > > > Sent: Friday, 22 November, 2002 19:52
> > > > To: Marcin Hanclik
> > > > Cc: ietf-charsets@iana.org
> > > > Subject: Re: internationalization/ISO10646 question
> > > >
> > > >
> > > > > Dear Sirs,
> > > > >
> > > > > I am writing to you as to the experts in internationalization and
> > > > > ISO-10646 issues.
> > > > >
> > > > > I would be very grateful if you could help me with the following issue
> > > > > described below.
> > > > >
> > > > > Generally the question refers to MIME encoding of a text part,
> > > > > particularly to the following case:
> > > > > Content-Type: text/plain; charset="iso-10646-ucs-2"
> > > > > Content-Transfer-Encoding: ...
> > > >
> > > > This, I'm afraid, is an illegal combination of elements. Specifically, any
> > > > material with a top level media type of "text" has to represent carriage
> > > > return/line feed as the literal sequence 0x0D 0x0A. iso-10646-ucs-2 clearly
> > > > does not do this, and as such is a charset that's not suited for use with
> > > > MIME text.
> > > >
> > > > This requirement is spelled out in RFC 2046 section 4.1.1.
> > > I think that is not the case. The Content-Transfer-Encoding header has to
> > > take care of CRLF handling.
> > > It is specified in RFC 2046, 4.1.2.
> >
> > On the contrary, section 4.1.2 in fact reiterates the CRLF requirement in that
> > it discusses how the charsets can be used with other top level types "with the
> > CRLF/line break restriction removed".
> >
> > > I left empty space for this parameter, but generally it is BASE64 in this
> > > case.
> >
> > The restrictions on the text top level type are completely independent of what
> > content-transfer-encoding is used.
> > It is also true that the domain of various
> > content-transfer-encodings is restricted in various ways, including but not
> > limited to the use of CRLFs, but this has nothing to do with the restrictions
> > on the text top level type.
> >
> > > > >
> > > > > Data
> > > > >
> > > > > Data after decoding: 0xFF 0xFE 0x66 0x00 0x65 0x00
> > > > >
> > > > > Outlook Express decodes it to the string "fe". But there are people who
> > > > > say that this is just robustness on the part of Outlook Express and that
> > > > > the string is not properly encoded, because at the time when
> > > > > <charset="iso-10646-ucs-2"> was specified/assigned with IANA the byte
> > > > > order mark (BOM) did not exist.
> > > >
> > > > I don't know if there are specific rules for handling revisions to
> > > > iso-10646-ucs-2 or not. I suspect not. However, the general rule is that
> > > > additions to a charset repertoire are expected and allowed. See RFC 2279
> > > > section 3. However, the BOM is something of a special case.
> > > >
> > > This is a good argument to me.
> > > > But given the far more egregious violation going on here I really don't
> > > > think this is particularly important in the overall scheme of things.
> > > >
> > > The above violation is not the case here, I think.
> >
> > I'm sorry, but it most certainly is the case here. Indeed, there would be no
> > point in having the labelling of whether or not a given charset was "suitable
> > for use in MIME text" if it weren't for this restriction.
> >
> > This is a case where the standards are clear, the standards clearly reflect the
> > intent of the group that developed them, and the registration requirements now
> > reflect the restrictions put in place by the standards.
> >
> > You can even see this in action in the registration of things like UTF-16LE.
> > RFC 2781 section A.2 contains the registration for this charset, and among
> > other things it says "Suitable for use in MIME content types under the "text"
> > top-level type: No".
> >
> > Unfortunately iso-10646-ucs-2 was registered before the rules were put in place
> > to call this out in registrations, but that doesn't mean it is suitable for use
> > in MIME text.
> >
> > > So the question remains:
> > > can I use <charset="iso-10646-ucs-2"> for the data containing BOM?
> >
> > And the answer remains that for material with a top level content type of text,
> > which is what you said you were dealing with, you cannot use this charset at
> > all. As such, any handling of it is possible, up to and including rejection of
> > the message as invalid.
> >
> > For material that isn't labelled with a top level content type of text I don't
> > think the situation is clear, but the intent has always been to allow additions
> > to charsets subsequent to registration. So I think BOM should be supported in
> > this context.
> >
> > Ned
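To make the two technical points in this thread concrete, here is a minimal Python sketch. It checks whether a charset writes CR LF as the octets 0x0D 0x0A (the line-break rule RFC 2046 section 4.1.1 places on text/*), shows that base64 returns exactly the octets it was given, and decodes Marcin's sample 0xFF 0xFE 0x66 0x00 0x65 0x00 three ways. The helper name `crlf_is_ascii` is hypothetical, the codec names are Python's own, and treating `utf-16-be` without BOM handling as a stand-in for a strict iso-10646-ucs-2 reader is an assumption, not something the registration states.

```python
import base64

# CR LF as MIME text/* requires it on the wire: the octets 0x0D 0x0A.
CRLF_OCTETS = b"\r\n"


def crlf_is_ascii(charset: str) -> bool:
    """Hypothetical helper: does `charset` encode CR LF as 0x0D 0x0A?

    This checks only the line-break rule of RFC 2046 section 4.1.1; it is
    not a complete test of whether a charset is suitable for text/*.
    """
    return "\r\n".encode(charset) == CRLF_OCTETS


# ASCII-compatible charsets pass; the UTF-16 forms fail, because each CR and
# LF becomes two octets (and plain "utf-16" also prepends a BOM).
for cs in ("ascii", "iso-8859-1", "utf-8", "utf-16", "utf-16-le", "utf-16-be"):
    print(f"{cs:<10} CRLF as 0x0D 0x0A: {crlf_is_ascii(cs)}")

# Marcin's sample: the octets left after undoing the transfer encoding.
data = bytes([0xFF, 0xFE, 0x66, 0x00, 0x65, 0x00])

# Base64 is only a transfer wrapping: the round trip gives back the same
# octets, so it cannot turn UTF-16 line breaks into 0x0D 0x0A.
wrapped = base64.b64encode(data)
assert base64.b64decode(wrapped) == data

# BOM-aware decoding: FF FE signals little-endian and is consumed, giving "fe".
print(repr(data.decode("utf-16")))

# Fixed little-endian decoding keeps the BOM as the character U+FEFF.
print(repr(data.decode("utf-16-le")))

# Fixed big-endian decoding with no BOM handling (assumed stand-in for a
# strict iso-10646-ucs-2 reader) yields U+FFFE, U+6600, U+6500 instead.
print([f"U+{ord(c):04X}" for c in data.decode("utf-16-be")])
```

The last three lines show why the BOM question matters only once the CRLF question is settled: a BOM-aware reader recovers "fe", while a reader that assumes a fixed byte order gets either a stray U+FEFF or entirely different characters. Per Martin's reply, the CRLF check is relevant to email, not to HTTP.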
Received on Wednesday, 4 December 2002 12:11:27 UTC