
Illustrate and explain "character encoding"

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 21 May 2004 05:10:23 +0200
To: www-i18n-comments@w3.org
Message-ID: <40af6d77.20869488@smtp.bjoern.hoehrmann.de>

Hi,

  [1] http://www.w3.org/TR/i18n-html-tech-char/
  [2] http://www.w3.org/International/tutorials/tutorial-char-enc.html

Could either or both please include some basic discussion and illustration
of what a character encoding actually is? This is difficult to teach, as
many people have never dealt with binary data: they use their text editor
for "text" documents, and most of the time it works just fine. That
assumption is what such documents should break at the very beginning:
this is binary data, as in 100101010010101011010101010001...
Perhaps with an image or images; here is a poor example:

  http://lists.w3.org/Archives/Public/www-archive/2004May/att-0050/encoding.png
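To make the same point in code, here is a minimal Python sketch (an
illustration, not taken from either document) of what a short piece of
"text" looks like once it is stored: a sequence of bytes, i.e. bits. It
assumes UTF-8 as the encoding; the helper name `as_bits` is hypothetical.

```python
def as_bits(text: str, encoding: str = "utf-8") -> str:
    """Return the bit pattern stored for `text` in the given encoding
    (hypothetical helper, for illustration only)."""
    data = text.encode(encoding)                      # characters -> bytes
    return " ".join(f"{byte:08b}" for byte in data)   # each byte as 8 bits


if __name__ == "__main__":
    # "Hi" is stored as two bytes, 0x48 ('H') and 0x69 ('i').
    print(as_bits("Hi"))  # 01001000 01101001
```

Nothing in those bits says "H" or "i"; only the agreed encoding does.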

Basically all [2] says about this, relatively late in the document, is:

  ...
  The character encoding reflects the way these abstract characters
  are mapped to bytes for manipulation in a computer.
  ...
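A sketch of what that sentence means in practice, assuming Python 3.8+
(for `bytes.hex` with a separator): one abstract character, U+00E9
LATIN SMALL LETTER E WITH ACUTE, is mapped to different bytes by
different encodings. The helper `byte_view` is hypothetical.

```python
def byte_view(text: str, encoding: str) -> str:
    """Show the bytes `text` is mapped to under `encoding`, as hex
    (hypothetical helper, for illustration only)."""
    return text.encode(encoding).hex(" ")


if __name__ == "__main__":
    CHAR = "\u00e9"  # the abstract character "e with acute"
    for encoding in ("utf-8", "iso-8859-1", "utf-16-be"):
        # Prints c3 a9, then e9, then 00 e9: same character, three mappings.
        print(f"{encoding:>10}: {byte_view(CHAR, encoding)}")
```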

And [1] contains more or less nothing that would help readers understand
what goes on behind the scenes of the software they use every day.
Catch the reader with logic: in my example, if the charset=utf-8
parameter is missing, how is a browser supposed to know how to turn
100101001... into characters? It cannot. That is what readers need to
understand.
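The same argument as a minimal Python sketch, assuming the sender
encoded the text as UTF-8 but the label (e.g. the charset parameter in
"Content-Type: text/html; charset=utf-8") was dropped, so the receiver
has to guess. The helper `decode_as` is hypothetical.

```python
def decode_as(data: bytes, guess: str) -> str:
    """Turn bytes back into characters under a guessed encoding
    (hypothetical helper; undecodable bytes become U+FFFD)."""
    return data.decode(guess, errors="replace")


if __name__ == "__main__":
    # What actually travels over the wire: b'caf\xc3\xa9'
    payload = "caf\u00e9".encode("utf-8")
    for guess in ("utf-8", "iso-8859-1"):
        # The bytes are identical; only the assumed encoding differs.
        print(f"{guess:>10}: {decode_as(payload, guess)}")
```

With the right guess the result is "café"; with ISO-8859-1 it is
"cafÃ©". The bytes alone do not say which guess is correct.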

regards.
Received on Thursday, 20 May 2004 23:10:43 GMT