
Character encodings for beginners

From: Jonathan Pool <pool@utilika.org>
Date: Thu, 6 Dec 2007 11:10:37 -0800 (PST)
Message-ID: <49405.192.168.1.2.1196968237.squirrel@utilika.org>
To: www-international@w3.org

Valuable contribution!

I suggest changing "ie." to the standard "i.e." wherever it appears.

Readers may not know that a byte consists of 8 bits and can therefore
distinguish at most 256 characters, so they may not understand the reasoning
in the second sentence in the panel.
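
For illustration only, the arithmetic behind that limit could be shown with a few lines of Python. This is a sketch of my own, not something taken from the article; ISO-8859-1 ("latin-1") stands in here as one example of a single-byte encoding, and the helper name fits_in_latin1 is hypothetical.

```python
# Illustrative sketch: how many distinct values fit in one 8-bit byte.
BITS_PER_BYTE = 8
distinct_values = 2 ** BITS_PER_BYTE  # 2 to the power 8

# In a single-byte encoding such as ISO-8859-1, each byte value 0..255
# decodes to exactly one character, so the repertoire tops out at 256.
all_bytes = bytes(range(distinct_values))
latin1_chars = all_bytes.decode("latin-1")
num_chars = len(set(latin1_chars))


def fits_in_latin1(ch: str) -> bool:
    """Return True if the character can be stored as one ISO-8859-1 byte."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(distinct_values, num_chars)
print(fits_in_latin1("é"), fits_in_latin1("€"))
```

A character such as the euro sign has a code point above 255, which is the "too large a number" situation the panel describes.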

I suggest changing "too high a number" to "too large a number".

I would change "change the bytes," to "change the bytes;".

That sentence could benefit from some elaboration, since it may be unclear to
the reader what it means to save text "in" an encoding. A reader who sees a
text and saves it often isn't asked "In what encoding do you want to save this
text?" So the reader may not know what this means, either conceptually or
operationally.
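
One way to make this concrete, offered only as a sketch: the same text becomes different bytes depending on the encoding chosen at save time. The Python below assumes the sample word "café" and a hypothetical helper, save_and_read_raw, that writes to a temporary file; neither comes from the article.

```python
import os
import tempfile

text = "café"

# The same four characters become different byte sequences.
utf8_bytes = text.encode("utf-8")      # "é" is stored as two bytes
latin1_bytes = text.encode("latin-1")  # "é" is stored as one byte


def save_and_read_raw(content: str, encoding: str) -> bytes:
    """Save text to a temporary file in the given encoding, then return
    the raw bytes that actually ended up on disk."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(content)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# Reading bytes back with the wrong encoding garbles the text.
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes, latin1_bytes, misread)
```

Saving "in" an encoding is thus the choice made by the encoding argument to open(); an editor that never asks is simply making that choice on the user's behalf.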

The "You need to ..." paragraphs at the end could be discouraging. If there
are common cases in which an authoring or publishing system handles all this,
it may be better to say so first under "How does this affect me?", and then
say that in other cases "You may need to ...".

It might be useful to provide some easy diagnostic tool for a common case,
such as UTF-8. Maybe it could be a passage containing various scripts,
accompanied by a graphic showing its canonical appearance, with instructions
to copy and paste it into an authoring environment and check whether the
client shows all its characters looking like the graphic.
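
The visual check could also have a programmatic counterpart. The following Python is a minimal sketch under my own assumptions: SAMPLE is an arbitrary multi-script passage I made up, and survives_round_trip is a hypothetical helper that tests whether the bytes coming back out of an authoring system still decode, as UTF-8, to the original passage.

```python
# Hypothetical test passage mixing several scripts (not from the article).
SAMPLE = "English, Ελληνικά, Русский, हिन्दी, 日本語"


def survives_round_trip(received: bytes, expected: str = SAMPLE) -> bool:
    """Return True if the received bytes are valid UTF-8 and decode to
    exactly the expected passage."""
    try:
        return received.decode("utf-8") == expected
    except UnicodeDecodeError:
        return False


# Case 1: the system preserved the UTF-8 bytes untouched.
intact = SAMPLE.encode("utf-8")

# Case 2: the system misread the UTF-8 bytes as ISO-8859-1 and re-saved
# them as UTF-8, a common source of garbled text.
garbled = SAMPLE.encode("utf-8").decode("latin-1").encode("utf-8")

# Case 3: the system saved the passage in a different encoding altogether.
other_encoding = SAMPLE.encode("utf-16")

print(survives_round_trip(intact))
print(survives_round_trip(garbled))
print(survives_round_trip(other_encoding))
```

This only tests the stored bytes; it says nothing about whether the client has fonts to display the characters, which is what the graphic comparison would catch.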

The term "script" doesn't appear in the article except in the sense of
"program code", but it is pervasive in the character-encoding literature, so
it might be useful to say what a script is and how it fits into the topic.

A glossary might be useful.
Received on Thursday, 6 December 2007 21:27:55 GMT
