RE: UTF-8 interop testing

This message reports on a test of interoperability of UTF-8 on the Internet.

The test was initiated by sending a UTF-8 mail message to the
ietf-charsets@iana.org mailing list. The list is archived and the test
message can be seen at
http://lists.w3.org/Archives/Public/ietf-charsets/2002OctDec/0106.html.

This message is a reply to the test message.  Here is an extract of the test
message as received from the list:

> UTF-8 interop test
> ===================
> 日本語: 明朝
> Русский: Здравствуйте!
> Ηελλένικα: Γειά σας
> Español: ¡Hola!
> Türkçe: Merhaba 
> ===================

Comparing the above as displayed in Outlook 2000 with the JPEG image
attached to the test message, one can verify that the message body was
transmitted faithfully.

The test message was also sent directly (not through the list) to a POP3
account, from where it was retrieved using two different mail user agents
(Netscape Messenger 4.5 and 7).  The text displayed in Netscape 7 again
matches the original; copy-pasting from Netscape 7 to this message, we get:

------------------>snip<---------------------
UTF-8 interop test
===================
日本語: 明朝
Русский: Здравствуйте!
Ηελλένικα: Γειά σας
Español: ¡Hola!
Türkçe: Merhaba 
===================
------------------>snip<---------------------

which matches the original.  The text displayed in Netscape 4.5 was garbled
at first.  It seems that a bug in the implementation prevents it from
picking up the charset identification present in the MIME headers:

------------------>snip<---------------------
------_=_NextPart_000_01C2971F.49C3C0F5
Content-Type: text/plain;
 charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
------------------>snip<---------------------

After the encoding was set to UTF-8 manually through a menu command,
Netscape 4.5 displayed the test message correctly.
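For reference, a MIME-aware reader has to do roughly the following with the headers quoted above: unfold the continuation line, read the charset parameter, undo the quoted-printable transfer encoding, then decode the bytes as UTF-8. This is a minimal sketch using Python's standard email package. The one-line body is a hypothetical stand-in rather than the actual test message, and nothing here claims to show how Netscape 4.5 parses headers internally.

```python
import email

# Hypothetical single-part message: the headers mirror the excerpt above,
# including the charset parameter folded onto a continuation line.
RAW = (
    "Content-Type: text/plain;\r\n"
    ' charset="UTF-8"\r\n'
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "Espa=C3=B1ol: =C2=A1Hola!\r\n"
)

msg = email.message_from_string(RAW)

# The parser unfolds the header, so the parameter is found even though it
# sits on the second line; the returned name is lower-cased.
charset = msg.get_content_charset()

# decode=True undoes the quoted-printable transfer encoding and returns bytes;
# only then can the UTF-8 bytes be turned into text.
text = msg.get_payload(decode=True).decode(charset)

print(charset)       # utf-8
print(text.strip())  # Español: ¡Hola!
```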

Displaying the attachments (both plain text and HTML) of the original
message gives the same text in both MS Outlook 2000 and Netscape 7, with no
manual adjustment of encoding needed.  Netscape 4.5, however, did not manage
to display the attachments correctly, even after the encoding was set
manually.
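Each part of a multipart message carries its own Content-Type header, so a client has to look up the charset once per part rather than once for the whole message. The sketch below shows that per-part walk with Python's standard email package. The three parts are hypothetical stand-ins for the test message's body and attachments, not copies of them, and the JPEG attachment is left out.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical multipart message loosely resembling the test layout:
# a plain-text body plus plain-text and HTML parts, all declared as UTF-8.
outer = MIMEMultipart()
outer.attach(MIMEText("Türkçe: Merhaba", "plain", "utf-8"))
outer.attach(MIMEText("Русский: Здравствуйте!", "plain", "utf-8"))
outer.attach(MIMEText("<p>日本語: 明朝</p>", "html", "utf-8"))

# Serialize and re-parse, as a receiving mail client would.
parsed = email.message_from_string(outer.as_string())

decoded = []
for part in parsed.walk():
    if part.get_content_maintype() != "text":
        continue  # skip the multipart container itself
    charset = part.get_content_charset()      # this part's own charset
    payload = part.get_payload(decode=True)   # undo the base64 transfer encoding
    decoded.append((part.get_content_type(), payload.decode(charset)))

for content_type, text in decoded:
    print(content_type, text)
```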

This part of the test shows that the combination of the SMTP, MIME and POP3
protocols supports UTF-8 correctly, as does the mailing list server, with the
caveat that an older implementation (namely Netscape 4.5, dating back to
1998) showed shortcomings.
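Part of why this path works is the quoted-printable transfer encoding seen in the MIME headers above: it turns every non-ASCII byte of the UTF-8 text into an "=XX" escape, so the message crosses 7-bit transports as plain ASCII. A minimal round trip with Python's standard quopri module, using lines from the test message; for simplicity it assumes "\n" line endings, whereas real mail uses CRLF.

```python
import quopri

# Lines from the test message body.
TEST_LINES = [
    "日本語: 明朝",
    "Русский: Здравствуйте!",
    "Ηελλένικα: Γειά σας",
    "Español: ¡Hola!",
    "Türkçe: Merhaba",
]
text = "\n".join(TEST_LINES)

# Sender side: UTF-8 bytes, then quoted-printable. Bytes >= 0x80 become
# "=XX" escapes and long lines get "=" soft breaks, so the result is 7-bit.
encoded = quopri.encodestring(text.encode("utf-8"))

# Receiver side: undo the transfer encoding, then decode UTF-8.
restored = quopri.decodestring(encoded).decode("utf-8")

print(max(encoded) < 0x80)  # True: safe for 7-bit transports
print(restored == text)     # True: the round trip is lossless
```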

Another part of the test consists of examining the test message and its
attachments as served on the Web at
http://lists.w3.org/Archives/Public/ietf-charsets/2002OctDec/0106.html by
the mailing list archive.  Browsing to this URL with four different browsers
(Netscape 4.5, MSIE 6.0, Netscape 7 and Opera 6.05), one can verify that the
text is again reproduced faithfully, without any manual adjustment of
encoding.  Copy-pasting from a browser window to this message, we get:

------------------>snip<---------------------
UTF-8 interop test
===================
日本語: 明朝
Русский: Здравствуйте!
Ηελλένικα: Γειά σας
Español: ¡Hola!
Türkçe: Merhaba 
===================
------------------>snip<---------------------

The plain text and HTML attachments are also displayed correctly by all four
browsers.

This part of the test shows that the combination of the SMTP, MIME and HTTP
protocols plus the mail archiving software supports UTF-8 correctly.  We have
seen no shortcomings even with the older Netscape 4.5 browser.  The only
glitch is that the archiving software truncates the display of some fields
(the Subject: and From: headers, as well as the file names shown for the
attachments).  This appears to be purely a display problem, since the
attachments can still be reached.
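On the Web side, a browser decodes the archived page with whatever charset the server declares. A minimal sketch of that step, assuming the declaration arrives as a charset parameter on the HTTP Content-Type header (this report does not say whether the archive uses the header or an HTML meta element); decode_http_body is a hypothetical helper name. The second call shows the kind of garbling a client produces when no charset reaches it and it falls back to a Western default.

```python
from email.message import Message


def decode_http_body(content_type: str, body: bytes) -> str:
    """Decode a text body using the charset parameter of a Content-Type
    header value. Falls back to ISO-8859-1, the default that RFC 2616
    (HTTP/1.1, current in 2002) gave text/* types without a charset."""
    header = Message()
    header["Content-Type"] = content_type
    # get_content_charset() parses the parameter and lower-cases it.
    charset = header.get_content_charset("iso-8859-1")
    return body.decode(charset)


body = "Español: ¡Hola!".encode("utf-8")
print(decode_http_body("text/html; charset=UTF-8", body))  # Español: ¡Hola!
print(decode_http_body("text/html", body))                 # EspaÃ±ol: Â¡Hola!
```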

-- 
François Yergeau

P.S.  Sharp eyes may have noticed that the txt attachment did not actually
match the screen shot: the first line as well as the two rows of equal signs
are missing.  This is due purely to human error (forgot to save! Red all
over)-:, not to any software error or foul play.

Received on Thursday, 28 November 2002 16:40:51 UTC