RE: internationalization/ISO10646 question

> Hi, Martin, Hi, Ned,

> thanks a lot for a response.

> It means then that I can use only UTF-8 for transport of Unicode over
> SMTP...

NO. IT DOESN'T MEAN THAT AT ALL. It means you cannot use a top-level type of
text with something like UTF-16. You are free to use various other top-level
types in SMTP, including but not limited to application, to transfer UTF-16
or whatever.
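A minimal sketch of the distinction, assuming Python's standard email package: the human-readable copy goes in text/plain with charset=utf-8, and the UTF-16 octets travel in a non-text part. The subtype application/octet-stream and the helper name build_message are illustrative choices only; nothing here says which application subtype is appropriate.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(text: str) -> MIMEMultipart:
    """Hypothetical helper: one legal text part plus UTF-16 under application/*."""
    msg = MIMEMultipart("mixed")
    # text/* part: UTF-8 represents a line break as the octets CR LF.
    msg.attach(MIMEText(text, "plain", "utf-8"))
    # UTF-16 octets (with BOM) under a non-text top-level type.
    # "octet-stream" is an illustrative subtype; MIMEApplication base64-encodes by default.
    msg.attach(MIMEApplication(text.encode("utf-16"), "octet-stream"))
    return msg


if __name__ == "__main__":
    print(build_message("Hello").as_string())
```

The UTF-16 part arrives as opaque bytes; it is up to the receiving application, not the mail system, to interpret them.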

You really need to understand this point.

> Actually I am concerned about MMS specification prepared by NOKIA.
> The MMS message is being transported over WAP. WAP is a variant of HTTP.

What is or is not legal in WAP is surely up to the folks who defined WAP.

> The multipart/related message can contain text/plain parts.
> The NOKIA MMS Conformance Document (you can find it on the following page
> http://www.forum.nokia.com/main/1,35452,1_2_7_1,00.html) specifies 3 allowed
> character encodings for text parts (text/plain):
> 1. us-ascii (IANA MIBEnum 3)
> 2. utf-8 (IANA MIBEnum 100)
> 3. utf-16 (IANA MIBEnum 1000) with explicit BOM.

Which, as Martin has already pointed out, is fine in HTTP. The rules for
SMTP and HTTP are different. Your original question was about Outlook
Express, which clearly puts it in the SMTP area.

> The case 3. is definitely wrong since MIBEnum 1000 defines iso-10646-ucs-2
> and not utf-16.
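To make the UCS-2 vs. UTF-16 point concrete: UCS-2 uses exactly two octets per character and covers only the Basic Multilingual Plane, while UTF-16 encodes code points above U+FFFF as surrogate pairs (four octets). The two agree for BMP-only text, which may be why a mislabel like this goes unnoticed. A quick sketch; the helper names are hypothetical, and the MIBenum values are those listed in the IANA character set registry.

```python
# MIBenum values as listed in the IANA "Character Sets" registry.
MIBENUM = {
    "ISO-10646-UCS-2": 1000,
    "UTF-16": 1015,
}


def fits_ucs2(text: str) -> bool:
    """True if every character is a BMP code point outside the surrogate range,
    i.e. representable in UCS-2 (two octets per character, no surrogate pairs)."""
    return all(ord(ch) <= 0xFFFF and not (0xD800 <= ord(ch) <= 0xDFFF) for ch in text)


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units UTF-16 needs; characters above U+FFFF take two."""
    return len(text.encode("utf-16-be")) // 2


if __name__ == "__main__":
    # U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP.
    for sample in ("Nokia 7650", "\U0001D11E"):
        print(repr(sample), fits_ucs2(sample), utf16_code_units(sample))
```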
> This bug is obvious, but actually it has been implemented on NOKIA handset
> (7650).
> In the world of handsets "NOKIA is the law", so the bug will spread
> probably.

It probably will.

> Coming back to the issue.
> And BY THE WAY:
> I understand SMTP..HTTP work in the way as below:
> - HTTP is newer than SMTP.
> - HTTP specifies Content-Length header, the data is handled a bit
> differently.

The situation with HTTP is complex, but now's not the time to get into that.

> - SMTP (actually MIME) specifies the length of line to be maximum of 998
> characters due to existing old SMTP servers. I imagine (and it is somehow
> mentioned in the specs) that the old SMTP servers have a buffer for decoding
> the message line by line. The buffer can be 1000 characters and the server
> has to find CRLF sequence in the buffer. Otherwise it can fail. That is why
> the 998 limit has been imposed.

The 998 character limit applies to SMTP transport only. You can use a
content-transfer-encoding to send arbitrarily long lines through SMTP using
MIME. Section 4.1.1 of RFC 2046 makes it clear this is only a transport issue;
it is not a restriction on MIME text types.

> - SMTP (HTTP etc.) specifies BASE64 encoding for the above case: if there
> are binary data, they should be encoded into 7-bit text with line of max 76
> chars + CRLF = 78 chars.

The encoding mechanisms for SMTP and HTTP are different, which is not
surprising since they are there for very different reasons.

> So if we have Content-Transfer-Encoding: BASE64 in MIME it should be ok.
> SMTP server can handle this!

If by "handle this!" you mean you can send UTF-16 using BASE64 in SMTP, the
answer is still NO. You may be able to get away with it in some cases, but I
can assure you that there are others where you cannot. And again, the standards
are clear: this is illegal. Do it and you get what you deserve: corrupted
mail, bounced mail, lost mail, and so on.
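One concrete way it breaks, sketched in Python under the assumption that the test applied is RFC 2046 section 4.1.1's rule that a line break in text/* is the octet pair CR LF and that CR and LF appear nowhere else. In UTF-16 a line break is 00 0D 00 0A, and ordinary characters can encode to a bare 0D 0A: U+0D0A (MALAYALAM LETTER UU) in UTF-16BE is exactly those two octets, so any gateway or client that converts line endings in a text part will damage it. The function names are hypothetical, and the first check is necessary, not sufficient.

```python
def crlf_is_octet_pair(charset: str) -> bool:
    """Necessary (not sufficient) condition for a MIME text/* charset:
    a line break must encode to exactly the two octets 0x0D 0x0A."""
    return "\r\n".encode(charset) == b"\r\n"


def has_stray_crlf(text: str, charset: str) -> bool:
    """True if the encoded text contains the octets 0x0D 0x0A even though
    the text itself has no line break."""
    return "\r\n" not in text and b"\r\n" in text.encode(charset)


if __name__ == "__main__":
    for cs in ("us-ascii", "utf-8", "utf-16-be"):
        print(cs, crlf_is_octet_pair(cs))
    # U+0D0A MALAYALAM LETTER UU encodes to the octets 0D 0A in UTF-16BE.
    print(has_stray_crlf("\u0d0a", "utf-16-be"), has_stray_crlf("\u0d0a", "utf-8"))
```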

> And the above restriction has nothing to do with mail message transported
> over HTTP.

HTTP doesn't transport mail messages. The semantics are different.

> I understand that the charset parameter of Content-Type: text/... specifies the
> charset for the end-user application, not for the server. Therefore, the
> message can be transported through the SMTP network with no problem (CRLF
> are present in the correct places..). Then it is the mail client's turn. It
> removes the BASE64 encoding and has a text part to display in the given
> charset.
> YOU say that it can only be UTF-8 (or us-ascii, but that is obvious).

Your understanding of the actual nature of the SMTP infrastructure is limited,
hopelessly optimistic, and astonishingly naive. Lots of other things can happen.
And do.

> My initial question was as follows (I am now introducing the restriction
> that I mean transporting text/plain over HTTP, not to violate your answer):
> I receive over HTTP the following header (part of mail message transported
> over HTTP):
>         Content-Type: text/plain; charset="iso-10646-ucs-2" CRLF
>         Content-Transfer-Encoding: BASE64 CRLF
>         CRLF
>         data...

And my response to this issue remains the same.


				Ned

Received on Wednesday, 4 December 2002 14:31:14 UTC