Character encoding interpretation by a text editor

From: Desaulniers, Peter <Peter.Desaulniers@pahv.xerox.com>
Date: Thu, 24 Jul 2003 16:38:35 -0400
Message-Id: <4.2.0.58.J.20030724163820.04e55698@localhost>
To: www-international@w3.org




Dear all,

I am just trying to understand the fundamentals of inputting and outputting
characters to and from files or other byte streams.

I tried an experiment which I cannot explain.  Please read the following
and see if you can offer an explanation.

Using Microsoft Notepad...

I created a text file with the character:  é
I store it as ANSI, the file contains the byte: E9  (as viewed by a binary
editor)

I store it again as UTF8, that file contains the bytes: C3 A9

Then I open the ANSI file and I see é (decodes E9 as ANSI)

Then I open the UTF8 file and I see é (decodes C3 A9 as UTF8).   Why do I
not see the ANSI characters: Ã©?

How can two files that contain different bytes, opened with the same
application, be decoded into the same character?
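
The byte values in the experiment above can be reproduced outside Notepad.
The following is only a minimal sketch in Python, and it assumes that
"ANSI" means Windows code page 1252 (cp1252), which may not be true on
every system.  The variable names are illustrative only.

```python
# Assumption: Notepad's "ANSI" is Windows code page 1252 on this machine.
ch = "\u00e9"  # the character e with acute accent

ansi_bytes = ch.encode("cp1252")  # what the ANSI file would contain
utf8_bytes = ch.encode("utf-8")   # what the UTF-8 file would contain


def as_hex(data: bytes) -> str:
    """Format bytes the way a binary editor would show them."""
    return " ".join(f"{b:02X}" for b in data)


print(as_hex(ansi_bytes))  # E9
print(as_hex(utf8_bytes))  # C3 A9

# Each byte sequence decoded with the encoding it was written in
from_ansi = ansi_bytes.decode("cp1252")
from_utf8 = utf8_bytes.decode("utf-8")
print(from_ansi == from_utf8 == ch)  # True

# The UTF-8 bytes decoded as ANSI instead: two characters, not one
misread = utf8_bytes.decode("cp1252")
print(ascii(misread))  # '\xc3\xa9'
```

This only shows that the two byte sequences give the same character when
each is read with the encoding it was written in, and two different
characters when the UTF-8 bytes are read as ANSI.  It does not show how
Notepad decides which encoding to use when it opens a file.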

If it's the appropriate protocol for this forum, please reply directly to my
email address since I do not receive mail from the www-international@w3.org
mailings.

Thank you,

-- Peter Desaulniers
Received on Thursday, 24 July 2003 16:38:44 GMT
