Character encoding interpretation by a text editor from Desaulniers, Peter on 2003-07-24 (www-international@w3.org from July to September 2003)

From: Desaulniers, Peter <Peter.Desaulniers@pahv.xerox.com>
Date: Thu, 24 Jul 2003 16:38:35 -0400
To: www-international@w3.org
Message-Id: <4.2.0.58.J.20030724163820.04e55698@localhost>

Dear all,

I am just trying to understand the fundamentals of reading characters from
and writing characters to files or other byte streams.

I tried an experiment that I cannot explain.  Please read the following
and see if you can offer an explanation.

Using Microsoft Notepad...

I created a text file containing the character:  é
I saved it as ANSI; the file contains the byte E9 (as viewed in a binary
editor).
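Assuming "ANSI" here means Windows code page 1252 (the usual Notepad default on US and Western European Windows), the byte seen in the binary editor can be reproduced with Python's built-in `cp1252` codec:

```python
# Encode U+00E9 (LATIN SMALL LETTER E WITH ACUTE) using Windows code page 1252,
# which is assumed to be what Notepad calls "ANSI" on this system.
ansi_bytes = "\u00e9".encode("cp1252")
print(ansi_bytes.hex(" ").upper())  # E9
```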

I saved it again as UTF-8; that file contains the bytes C3 A9.
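For comparison, here is a sketch of the UTF-8 encoding of the same character. The second part assumes that Notepad's "UTF-8" option also wrote a byte-order mark (EF BB BF) at the start of the file, as that option has traditionally done; if so, a binary editor would show those three bytes before C3 A9.

```python
# UTF-8 needs two bytes for U+00E9.
utf8_bytes = "\u00e9".encode("utf-8")
print(utf8_bytes.hex(" ").upper())  # C3 A9

# Python's "utf-8-sig" codec prepends the UTF-8 byte-order mark EF BB BF,
# which is assumed (not confirmed in the post) to be what Notepad wrote.
utf8_with_bom = "\u00e9".encode("utf-8-sig")
print(utf8_with_bom.hex(" ").upper())  # EF BB BF C3 A9
```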

Then I opened the ANSI file and saw é (E9 decoded as ANSI).

Then I opened the UTF-8 file and saw é (C3 A9 decoded as UTF-8).  Why don't I
see the ANSI characters Ã© instead?
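The two possible readings of the UTF-8 file's bytes can be compared directly. This sketch again assumes that "ANSI" means code page 1252:

```python
raw = b"\xc3\xa9"

# Read as Windows-1252 ("ANSI"), each byte maps to its own character.
print(raw.decode("cp1252"))  # Ã©

# Read as UTF-8, the two bytes form a single code point, U+00E9.
print(raw.decode("utf-8"))   # é
```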

How can two files containing different bytes, opened in the same
application, be decoded into the same character?
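One way an editor can show the same character for two different byte sequences is to choose a decoding separately for each file before displaying it. The function below is a hypothetical sketch of such a heuristic; the name `guess_decode` and its fallback order are assumptions for illustration, not Notepad's actual logic. It honours a UTF-8 byte-order mark, then tries strict UTF-8, and only then falls back to cp1252.

```python
import codecs


def guess_decode(data: bytes) -> tuple[str, str]:
    """Hypothetical editor heuristic (not Notepad's documented algorithm).

    Returns (encoding label, decoded text).
    """
    # 1. A leading UTF-8 byte-order mark settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (BOM)", data[len(codecs.BOM_UTF8):].decode("utf-8")
    # 2. Otherwise, accept the bytes as UTF-8 only if they are well-formed UTF-8.
    try:
        return "utf-8", data.decode("utf-8")
    except UnicodeDecodeError:
        # 3. A lone E9 starts a three-byte UTF-8 sequence that never completes,
        #    so it is rejected above and decoded as cp1252 ("ANSI") here.
        return "cp1252", data.decode("cp1252")


for name, data in [("ANSI file", b"\xe9"), ("UTF-8 file", b"\xef\xbb\xbf\xc3\xa9")]:
    encoding, text = guess_decode(data)
    print(f"{name}: decoded as {encoding} -> {text}")
```

Under these assumptions both files display as é: the ANSI file falls through to the cp1252 fallback, while the UTF-8 file is recognised by its byte-order mark, so its bytes are never reinterpreted as Ã©.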

If it's appropriate protocol for this forum, please reply directly to my
email address, since I do not receive the www-international@w3.org
mailings.

Thank you,

-- Peter Desaulniers

Received on Thursday, 24 July 2003 16:38:44 UTC