- From: Francois Yergeau <FYergeau@alis.com>
- Date: Thu, 28 Nov 2002 16:39:31 -0500
- To: ietf-charsets@iana.org
This message reports on a test of interoperability of UTF-8 on the Internet. The test was initiated by sending a UTF-8 mail message to the ietf-charsets@iana.org mailing list. The list is archived and the test message can be seen at http://lists.w3.org/Archives/Public/ietf-charsets/2002OctDec/0106.html. This message is a reply to the test message.

Here is an extract of the test message as received from the list:

> UTF-8 interop test
> ===================
> 日本語: 明朝
> Русский: Здравствуйте!
> Ηελλένικα: Γειά σας
> Español: ¡Hola!
> Türkçe: Merhaba
> ===================

Comparing the above as displayed in Outlook 2000 with the JPEG image attached to the test message, one can verify that the message body was transmitted faithfully.

The test message was also sent directly (not through the list) to a POP3 account, from where it was retrieved using two different mail user agents (Netscape Messenger 4.5 and 7). The text displayed in Netscape 7 again matches the original; copy-pasting from Netscape 7 to this message, we get:

------------------>snip<---------------------
UTF-8 interop test
===================
日本語: 明朝
Русский: Здравствуйте!
Ηελλένικα: Γειά σας
Español: ¡Hola!
Türkçe: Merhaba
===================
------------------>snip<---------------------

which matches the original.

The text displayed in Netscape 4.5 was garbled at first. It seems that a bug in the implementation prevents it from picking up the charset identification present in the MIME headers:

------------------>snip<---------------------
------_=_NextPart_000_01C2971F.49C3C0F5
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
------------------>snip<---------------------

After the encoding was manually set to UTF-8 through a menu command, Netscape 4.5 displayed the test message correctly.
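
As an illustration of what a mail user agent has to do with the MIME headers quoted above, here is a minimal Python sketch. It is not taken from the test itself: the sample message is a hypothetical reconstruction that reuses only the two header lines shown above plus two lines of the test text, and the Latin-1 fallback is an assumption chosen purely to show what ignoring the charset parameter can look like, not a statement about what Netscape 4.5 actually did.

```python
from email import message_from_bytes

# Hypothetical sample message: the header values come from the excerpt above,
# the body is two lines of the test text encoded as quoted-printable UTF-8.
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Espa=C3=B1ol: =C2=A1Hola!\r\n"
    b"T=C3=BCrk=C3=A7e: Merhaba\r\n"
)

msg = message_from_bytes(raw)

# Step 1: read the charset parameter from Content-Type (returned lowercased).
charset = msg.get_content_charset()

# Step 2: undo the Content-Transfer-Encoding; this yields raw bytes.
payload = msg.get_payload(decode=True)

# Step 3: interpret those bytes using the declared charset.
text = payload.decode(charset)

# Assumed failure mode for comparison: ignore the declared charset and
# decode the same bytes as Latin-1 instead.
garbled = payload.decode("latin-1")

print(text)
print(garbled)
```

Decoding with the declared charset recovers the original text, while the assumed Latin-1 fallback turns every multi-byte UTF-8 sequence into two unrelated characters. That is the kind of garbling that a manual switch of the encoding to UTF-8 corrects.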

Displaying the attachments (both plain text and HTML) of the original message shows the same text in both MS Outlook 2000 and Netscape 7, with no manual adjustment of encoding needed. Netscape 4.5, however, did not manage to display the attachments correctly, even after an attempt to set the encoding manually.

This part of the test shows that the combination of the SMTP, MIME and POP3 protocols supports UTF-8 correctly, as does the mailing list server, with the caveat that an older implementation (namely Netscape 4.5, dating back to 1998) displayed shortcomings.

Another part of the test consists of examining the test message and its attachments as served on the Web at http://lists.w3.org/Archives/Public/ietf-charsets/2002OctDec/0106.html by the mailing list archive. Browsing to this URL with four different browsers (Netscape 4.5, MSIE 6.0, Netscape 7 and Opera 6.05), one can verify that the text is again reproduced faithfully, without any manual adjustment of encoding. Copy-pasting from a browser window to this message, we get:

------------------>snip<---------------------
UTF-8 interop test
===================
日本語: 明朝
Русский: Здравствуйте!
Ηελλένικα: Γειά σας
Español: ¡Hola!
Türkçe: Merhaba
===================
------------------>snip<---------------------

The plain text and HTML attachments are also displayed correctly by all four browsers.

This part of the test shows that the combination of the SMTP, MIME and HTTP protocols plus the mail archiving software supports UTF-8 correctly. We have seen no shortcomings even with the older Netscape 4.5 browser. The only glitch is that the archiving software truncates the display of some headers (Subject: and From:, as well as the file names displayed for the attachments). This appears to be purely a display problem, since the attachments can be reached.

-- 
François Yergeau

P.S. Sharp eyes may have noticed that the txt attachment did not actually match the screen shot: the first line as well as the two rows of equal signs are missing. This is due purely to human error (I forgot to save! Red all over), not to any software error or foul play.
Received on Thursday, 28 November 2002 16:40:51 UTC