
UTF-8 interop testing

From: Francois Yergeau <FYergeau@alis.com>
Date: Thu, 28 Nov 2002 15:46:46 -0500
To: ietf-charsets@iana.org
Message-id: <F7D4BDA0E5A1D14B99D32C022AEB73669AA08B@alis-2k.alis.domain>
This message is the initial part of a test of interoperability of UTF-8 on
the Internet.

The test is based on a test file of plain text encoded in UTF-8, containing
text in a few languages and scripts. The test file was composed in Windows
2000 Notepad and is attached to this message as test-utf-8.txt.  It has an
initial BOM (Byte Order Mark).  The content of the test file, copy-pasted
into this message, is:

------------------>snip<---------------------
UTF-8 interop test
===================
日本語: 明朝
Русский: Здравствуйте!
Ηελλένικα: Γειά σας
Español: ¡Hola!
Türkçe: Merhaba 
===================
------------------>snip<---------------------
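
For anyone who wants to check the BOM mentioned above programmatically, here
is a minimal Python sketch. It is only an illustration and not part of the
test itself: the helper names has_utf8_bom and decode_utf8 are made up for
this example, and the sample bytes are built in memory rather than read from
the actual attachment.

```python
import codecs

# One line taken from the test text above.
SAMPLE = "Español: ¡Hola!"


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    # The "utf-8-sig" codec accepts input with or without a BOM.
    return data.decode("utf-8-sig")


# Assumption: simulate a file saved with an initial BOM by prepending
# the BOM to the UTF-8 encoded text.
with_bom = codecs.BOM_UTF8 + SAMPLE.encode("utf-8")
without_bom = SAMPLE.encode("utf-8")
```

Decoding with "utf-8-sig" rather than "utf-8" means the same code handles
both the BOM-prefixed text file and BOM-less input without leaving a stray
U+FEFF at the start of the string.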

Also attached is test-utf-8.jpg, a JPEG screenshot showing the test file in
Notepad superimposed on this message as it is being composed in Outlook
2000.  The image thus shows the test text twice, in two tools that use
different fonts.

Also attached is test-utf-8.html, an HTML version of the same text.  Instead
of a BOM, this version uses a <meta> element to identify the charset as
UTF-8.
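
As a rough illustration of how a receiver might pick up the charset from a
<meta> element, here is a small Python sketch. The exact markup in
test-utf-8.html is not reproduced in this message, so the sample below
assumes the common http-equiv form; the name sniff_meta_charset and the
1024-byte scan limit are likewise assumptions made for this example, and a
real HTML parser does considerably more.

```python
import re

# Hypothetical markup: the actual <meta> element in the attachment
# is not shown in the message, so this form is an assumption.
SAMPLE_HTML = (
    "<html><head>"
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
    "</head><body>Español: ¡Hola!</body></html>"
).encode("utf-8")

# Look for charset=<label> anywhere inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(data: bytes, limit: int = 1024):
    """Return the lowercased charset label from a <meta> tag, or None."""
    # Only the first `limit` bytes are scanned; the tag is ASCII, so it
    # can be found before the rest of the document is decoded.
    match = META_CHARSET.search(data[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


charset = sniff_meta_charset(SAMPLE_HTML)
text = SAMPLE_HTML.decode(charset) if charset else None
```

The scan works on raw bytes because the charset has to be known before the
document can be decoded; this is possible since the <meta> element itself
consists of ASCII characters only.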

An interop report will follow as a reply to this message.

-- 
François Yergeau



(image/jpeg attachment: test-utf-8.jpg)

Received on Thursday, 28 November 2002 15:48:17 GMT
