Character set (charset) validation

From: Markus Kramer (kramer@molgen.mpg.de)
Date: Thu, Mar 23 2000


Date: Thu, 23 Mar 2000 08:15:54 -0500 (EST)
Message-ID: <38DA18F6.CBF3013F@molgen.mpg.de>
From: Markus Kramer <kramer@molgen.mpg.de>
To: www-validator@w3.org
Subject: Character set (charset) validation

It all started when I looked at my (validated) HTML page on a
Macintosh...
(I had not declared a character set.)
I assumed ISO Latin-1.
The Macintosh assumed its own character set and displayed a mess.

So I put a META-tag in my document to denote the character set:
<META http-equiv="Content-Type" content="text/html; charset=....">

When I tried "charset=isolatin-1" nothing happened.
When I tried "charset=ISO-8859-1" the Macintosh displayed every
character correctly!
I was happy. 

Now I want to make a suggestion for improving the validator:

The validator did not report the wrong name (isolatin-1) because it will
report *any* string as a 'character set'.
For example
<META http-equiv="Content-Type" content="text/html; charset=mumble">
will result in 

Document Checked
...
     Character encoding: mumble
...

I propose that the validator provide a (link to a) list of common
character sets, which I have looked for but could not find.
The validator could produce a warning if someone (like me) puts in
"mumble" for a charset.
Like:
    Your Character Encoding "mumble" was not found in our <a ...>list of
common character sets</a>.
    Please check your spelling or notify us of a new common character
set.
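
To make the proposed check concrete, here is a minimal sketch in Python.
The names COMMON_CHARSETS and charset_warning are hypothetical, and the
list is only an illustrative, incomplete sample that I picked myself; it
is not the validator's actual data or code.

```python
# Illustrative, incomplete sample of common charset names.
# This list is an assumption for the sketch, not an official registry.
COMMON_CHARSETS = {
    "us-ascii", "iso-8859-1", "iso-8859-2", "iso-8859-15",
    "utf-8", "utf-16", "windows-1252", "shift_jis", "euc-jp", "koi8-r",
}


def charset_warning(name):
    """Return a warning string for an unrecognized charset, else None."""
    # Charset names are compared case-insensitively.
    if name.strip().lower() in COMMON_CHARSETS:
        return None
    return ('Your Character Encoding "%s" was not found in our list of '
            'common character sets. Please check your spelling or notify '
            'us of a new common character set.' % name)


if __name__ == "__main__":
    for candidate in ("ISO-8859-1", "isolatin-1", "mumble"):
        print(candidate, "->", charset_warning(candidate))
```

With this sample list, "ISO-8859-1" passes silently, while "isolatin-1"
and "mumble" both produce the warning.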

In this newsgroup I found an old (1996) discussion about ISO Latin-1
being the default character set of the web (which was considered a bad
thing).
As the Macintosh does not assume ISO Latin-1 as the default, the
validator could issue a warning if no character set is given.
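
The "no character set given" warning could be sketched like this, again
in Python. The pattern and the function names (META_CHARSET,
declared_charset, missing_charset_warning) are hypothetical. The sketch
only looks at the META tag, assumes http-equiv comes before content as
in my example above, and ignores a charset sent in the HTTP
Content-Type header, which a real validator would also have to check.

```python
import re

# Hypothetical pattern: finds charset=... in a META http-equiv
# Content-Type tag. Assumes http-equiv precedes content; a real
# validator would use a proper HTML/SGML parser instead of a regex.
META_CHARSET = re.compile(
    r'<meta[^>]*http-equiv\s*=\s*["\']?content-type["\']?[^>]*'
    r'charset\s*=\s*([^\s"\'>;]+)',
    re.IGNORECASE,
)


def declared_charset(html):
    """Return the charset named in the META tag, or None if absent."""
    match = META_CHARSET.search(html)
    return match.group(1) if match else None


def missing_charset_warning(html):
    """Return a warning if the document declares no charset, else None."""
    if declared_charset(html) is None:
        return ("No character set was declared. Browsers may assume "
                "different defaults and display the page incorrectly.")
    return None
```

For a page containing the META tag with "charset=mumble",
declared_charset returns the string mumble; for a page without such a
tag, missing_charset_warning returns the warning text.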


Markus