- From: Francois Yergeau <FYergeau@alis.com>
- Date: Thu, 09 Jan 2003 17:12:43 -0500
- To: ietf-charsets@iana.org
This message reports on Take 3 of interoperability testing of UTF-8 on the Internet.

The test was initiated by sending a UTF-8 mail message to the ietf-charsets@iana.org mailing list. The list is archived and the test message can be seen at http://lists.w3.org/Archives/Public/ietf-charsets/2003JanMar/0039.html. This message is a reply to the test message.

Here is an extract of the test message as received from the list:

> UTF-8 interop test
> ===================
> 日本語: 明朝
> Русский: Здравствуйте!
> Ηελλένικα: Γειά σας
> Español: ¡Hola!
> Türkçe: Merhaba
> عربي: السلام عليكم
> 𐌸𐌹𐌶𐌰𐌹: 𐍅𐌿𐌻𐍆𐌹𐌻𐌰
> ===================

Comparing the above as displayed in Outlook 2000 with the JPEG image attached to the test message, one can verify that all but the last line (Gothic script, non-BMP characters) are displayed properly. The incorrect display of the last line, however, is purely a rendering problem in a single-font environment, as can be ascertained by copy-pasting the above into Windows WordPad and setting an appropriate font on the last line: the whole thing is then displayed correctly, showing that the message body was transmitted faithfully.

The test message was also sent directly (not through the list) to a POP3 account, from where it was retrieved using Netscape Messenger 7. The text displayed in Netscape 7 again matches the original, except for the Gothic line, which suffers from a rendering problem; copy-pasting from Netscape 7 to this message, we get:

------------------>snip<---------------------
UTF-8 interop test
===================
日本語: 明朝
Русский: Здравствуйте!
Ηελλένικα: Γειά σας
Español: ¡Hola!
Türkçe: Merhaba
عربي: السلام عليكم
�����: �������
===================
------------------>snip<---------------------

which matches the original.

Displaying the attachments (both plain text and HTML) of the original message shows the same text in both MS Outlook 2000 and Netscape 7, with no manual adjustment of encoding needed.
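As a side note on the Gothic line: the distinction drawn above between a rendering failure and a transport failure can also be checked programmatically rather than by copy-pasting into an editor. The following is only a minimal sketch in Python, not part of the test procedure itself. The sample string and all function names are assumptions made for illustration; the sample is a run of Gothic letters written as escapes, not a byte-for-byte copy of the test message.

```python
# Sample of Gothic letters written as escapes (an assumption for illustration;
# not copied byte-for-byte from the test message). All lie in U+10330..U+1034A,
# i.e. outside the Basic Multilingual Plane (BMP).
GOTHIC_SAMPLE = (
    "\U00010345\U0001033F\U0001033B\U00010346"
    "\U00010339\U0001033B\U00010330"
)


def is_non_bmp(ch):
    """True if the code point lies outside the BMP (above U+FFFF)."""
    return ord(ch) > 0xFFFF


def utf8_byte_lengths(text):
    """Number of UTF-8 bytes used by each code point in text."""
    return [len(ch.encode("utf-8")) for ch in text]


def utf16_code_units(text):
    """Number of 16-bit code units needed to represent text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def survives_utf8_round_trip(text):
    """True if encoding to UTF-8 and decoding again reproduces text exactly."""
    return text.encode("utf-8").decode("utf-8") == text


if __name__ == "__main__":
    print("all non-BMP:      ", all(is_non_bmp(ch) for ch in GOTHIC_SAMPLE))
    print("UTF-8 bytes/char: ", utf8_byte_lengths(GOTHIC_SAMPLE))
    print("UTF-16 code units:", utf16_code_units(GOTHIC_SAMPLE))
    print("round trip intact:", survives_utf8_round_trip(GOTHIC_SAMPLE))
```

Each non-BMP character takes four bytes in UTF-8 and a surrogate pair in UTF-16, which is why such text exercises code paths that BMP-only text does not. A passing round-trip check on the received bytes would indicate that the data is intact even when no glyphs are available to display it.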
This part of the test shows that the combination of the SMTP, MIME and POP3 protocols supports UTF-8 correctly, as does the mailing list server.

Another part of the test consists of examining the test message and its attachments as served on the Web at http://lists.w3.org/Archives/Public/ietf-charsets/2003JanMar/0039.html by the mailing list archive. Browsing to this URL with three different browsers (MSIE 6.0, Netscape 7 and Opera 6.05), one can verify that the text is again reproduced mostly faithfully, without any manual adjustment of encoding, except for the Gothic, which does not render correctly (white rectangles) in MSIE. Copy-pasting from a browser window to this message, we get:

------------------>snip<---------------------
UTF-8 interop test
===================
日本語: 明朝
Русский: Здравствуйте!
Ηελλένικα: Γειά σας
Español: ¡Hola!
Türkçe: Merhaba
عربي: السلام عليكم
𐌸𐌹𐌶𐌰𐌹: 𐍅𐌿𐌻𐍆𐌹𐌻𐌰
===================
------------------>snip<---------------------

Copying this to WordPad shows that the data is intact. The plain text and HTML attachments are also displayed correctly, with the same caveats for Gothic as above, by all three browsers.

This part of the test shows that the combination of the SMTP, MIME and HTTP protocols plus the mail archiving software supports UTF-8 correctly, except in some cases for the rendering of the Gothic script (data integrity is preserved, however).

The only other glitch is that the archiving software truncates the display of some headers (the Subject: and From: headers, as well as the file names displayed for the attachments). This appears to be purely a display problem, since the attachments can be reached.

-- 
François Yergeau
Received on Thursday, 9 January 2003 17:13:59 UTC