RE: UTF-8 interop testing, take 3

This message reports on Take 3 of interoperability testing of UTF-8 on the
Internet.

The test was initiated by sending a UTF-8 mail message to the
ietf-charsets@iana.org mailing list. The list is archived and the test
message can be seen at
http://lists.w3.org/Archives/Public/ietf-charsets/2003JanMar/0039.html.

This message is a reply to the test message.  Here is an extract of the test
message as received from the list:

> UTF-8 interop test
> ===================
> 日本語: 明朝
> Русский: Здравствуйте!
> Ηελλένικα: Γειά σας
> Español: ¡Hola!
> Türkçe: Merhaba
> عربي: السلام عليكم
> 𐌸𐌹𐌶𐌰𐌹: 𐍅𐌿𐌻𐍆𐌹𐌻𐌰
> ===================

Comparing the above as displayed in Outlook 2000 with the JPEG image
attached to the test message, one can verify that all but the last line
(Gothic script, non-BMP characters) are displayed properly. The incorrect
display of the last line, however, is purely a rendering problem in a
single-font environment, as can be ascertained by copy-pasting the above
into Windows Wordpad and setting an appropriate font on the last line: the
whole thing is then displayed correctly, showing that the message body was
transmitted faithfully.
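
To illustrate the distinction between a rendering problem and a data
integrity problem, here is a minimal sketch in Python. It is not part of the
original test, and the function names are invented for the example. It lists
the code points of the Gothic line and checks that they survive a UTF-8
round trip. Characters outside the BMP take four bytes in UTF-8, which is
one reason they exercise software differently from the other lines.

```python
# Gothic line from the test message, as quoted above.
gothic = "𐌸𐌹𐌶𐌰𐌹: 𐍅𐌿𐌻𐍆𐌹𐌻𐌰"


def describe(text):
    """Return (code point, UTF-8 byte length, is_non_bmp) per character."""
    return [
        (f"U+{ord(ch):04X}", len(ch.encode("utf-8")), ord(ch) > 0xFFFF)
        for ch in text
    ]


def survives_utf8_round_trip(text):
    """True if encoding to UTF-8 and decoding again yields the same text."""
    return text.encode("utf-8").decode("utf-8") == text


if __name__ == "__main__":
    for code_point, byte_len, non_bmp in describe(gothic):
        print(code_point, byte_len, "non-BMP" if non_bmp else "BMP")
    print("round trip intact:", survives_utf8_round_trip(gothic))
```

If the code points come out as expected but the display shows boxes or
question marks, the problem is in the font or renderer, not in the data.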

The test message was also sent directly (not through the list) to a POP3
account, from which it was retrieved using Netscape Messenger 7.  The text
displayed in Netscape 7 again matches the original, except for the Gothic
line, which suffers from a rendering problem; copy-pasting from Netscape 7
to this message, we get:

------------------>snip<---------------------
UTF-8 interop test
===================
日本語: 明朝
Русский: Здравствуйте!
Ηελλένικα: Γειά σας
Español: ¡Hola!
Türkçe: Merhaba
عربي: السلام عليكم
𐌸𐌹𐌶𐌰𐌹: 𐍅𐌿𐌻𐍆𐌹𐌻𐌰
===================
------------------>snip<---------------------

which matches the original.

Displaying the attachments (both plain text and HTML) of the original
message shows the same text when using both MS Outlook 2000 and Netscape 7,
with no manual adjustment of encoding needed. 

This part of the test shows that the combination of the SMTP, MIME and POP3
protocols supports UTF-8 correctly, as does the mailing list server.
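
For readers who want to reproduce the MIME side of this check, the following
is a rough sketch using Python's standard email package. It is an
illustration only, not the tooling used for the test: the addresses are
placeholders, and the body is a two-line subset of the test text. It builds
a message labelled charset=utf-8, serializes it to bytes, parses those bytes
back, and compares the text.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Two lines taken from the test message body.
body = "Español: ¡Hola!\nРусский: Здравствуйте!\n"

msg = EmailMessage()
msg["Subject"] = "UTF-8 interop test"
msg["From"] = "sender@example.org"      # placeholder address
msg["To"] = "recipient@example.org"     # placeholder address
msg.set_content(body, charset="utf-8")

# Serialize as bytes; the library picks the content transfer encoding.
wire = msg.as_bytes()

# Parse the bytes back, as a receiving mail client would.
parsed = message_from_bytes(wire, policy=policy.default)
declared = parsed.get_content_charset()
recovered = parsed.get_content()

if __name__ == "__main__":
    print("declared charset:", declared)
    print("text intact:", recovered.splitlines() == body.splitlines())
```

The point of the comparison is the same as in the test above: the charset
label must travel with the message, and the decoded text must equal what was
sent.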

Another part of the test consists of examining the test message and its
attachments as served on the Web at
http://lists.w3.org/Archives/Public/ietf-charsets/2003JanMar/0039.html by
the mailing list archive.  Browsing to this URL with three different
browsers (MSIE 6.0, Netscape 7 and Opera 6.05), one can verify that the text
is again reproduced mostly faithfully, without any manual adjustment of
encoding, except for the Gothic line, which does not render correctly
(white rectangles) in MSIE.  Copy-pasting from a browser window to this
message, we get:

------------------>snip<---------------------
UTF-8 interop test
===================
日本語: 明朝
Русский: Здравствуйте!
Ηελλένικα: Γειά σας
Español: ¡Hola!
Türkçe: Merhaba
عربي: السلام عليكم
𐌸𐌹𐌶𐌰𐌹: 𐍅𐌿𐌻𐍆𐌹𐌻𐌰
===================
------------------>snip<---------------------

Copying this to Wordpad shows that the data is intact.
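
Browsers need no manual adjustment of encoding when the page carries a
charset label they can read, for instance in the HTTP Content-Type header.
As a small sketch of how such a label can be extracted (the header values
below are hypothetical examples, not what the archive server was observed to
send, and the function name is invented), Python's standard library can
parse the parameter:

```python
from email.message import Message


def declared_charset(content_type_header):
    """Return the lowercased charset parameter of a Content-Type value,
    or None if the header declares no charset."""
    holder = Message()
    holder["Content-Type"] = content_type_header
    return holder.get_content_charset()


if __name__ == "__main__":
    # Hypothetical header values, for illustration only.
    print(declared_charset("text/html; charset=UTF-8"))
    print(declared_charset("text/html"))
```

When no charset is declared, a browser has to guess or fall back to a
default, which is when manual adjustment tends to become necessary.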

The plain text and HTML attachments are also displayed correctly, with the
same caveats for Gothic as above, by all three browsers.

This part of the test shows that the combination of the SMTP, MIME and HTTP
protocols plus the mail archiving software supports UTF-8 correctly, except
in some cases for the rendering of the Gothic script (data integrity is
preserved, however).  The only other glitch is that the archiving software
truncates the display of some headers (Subject: and From:, as well as the
file names displayed for the attachments).  This appears to be purely a
display problem, since the attachments can be reached.

-- 
François Yergeau

Received on Thursday, 9 January 2003 17:13:59 UTC