Re: Charset policy or?

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Sat, 23 Apr 2005 03:35:13 +0200
To: www-validator@w3.org
Message-ID: <4269A651.1342@xyzzy.claranet.de>

leif halvard silli wrote:
> I came to wonder about the policy behind which encodings
> (charsets) the Validator supports.

Sometimes it doesn't support I18N, and I have no idea why.
Adding a list of "some known 8-bit charsets like windows-1252"
should be trivial.

> for instance the result of a puristic wish to only support
> the IANA registered charsets

If it's not IANA registered it doesn't exist.  The validator
tries to catch invalid character encodings depending on the
document charset, maybe also depending on XML 1.1 vs. 1.0, so
it can't handle unknown encodings.  E.g. how to handle byte
133 depends on several factors.
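
To illustrate the byte 133 point, here is a small Python sketch.
It is not what the validator does internally; it only shows that
the same byte means different things under different charsets.
The codec names are Python's own ("cp1252" is its name for
windows-1252, "mac_roman" for macintosh), and decode_or_none is
a made-up helper for this example.

```python
# Decode the single byte 133 (0x85) under several charsets.
BYTE_133 = bytes([133])


def decode_or_none(data: bytes, charset: str):
    """Return the decoded text, or None if the bytes are invalid there."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


# Same byte, four charsets: an ellipsis, a C1 control character,
# a letter, or simply not decodable.
results = {
    name: decode_or_none(BYTE_133, name)
    for name in ("cp1252", "latin-1", "mac_roman", "utf-8")
}

for name, text in results.items():
    print(name, repr(text))
```

In windows-1252 the byte is a printable ellipsis, in Latin-1 it is
a control character, in Mac Roman it is a letter, and a lone 0x85
is not valid UTF-8 at all.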

OTOH you could always enforce "assume windows-1252" for all
MIME-compatible 8-bit charsets where codepoints 128..159
are valid.  You could enforce Latin-1 where that's not true.
And of course UTF-8 etc. are directly supported.
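
A minimal Python sketch of that fallback idea, under stated
assumptions: decode_with_fallback and its flag are hypothetical
names, the caller somehow knows whether 128..159 are meant to be
valid in the declared charset, and bytes that windows-1252 leaves
undefined are replaced instead of rejected.  A real validator
would presumably report them.

```python
def decode_with_fallback(data: bytes, high_range_is_valid: bool) -> str:
    """Decode bytes of an unrecognised 8-bit charset with a fallback.

    high_range_is_valid: True if codepoints 128..159 are supposed to be
    valid in the declared charset (assumption supplied by the caller).
    """
    # "cp1252" is Python's codec name for windows-1252.
    fallback = "cp1252" if high_range_is_valid else "latin-1"
    # errors="replace" turns undefined bytes into U+FFFD instead of raising.
    return data.decode(fallback, errors="replace")


sample = b"caf\xe9 \x85"
print(repr(decode_with_fallback(sample, True)))
print(repr(decode_with_fallback(sample, False)))
```

Both fallbacks agree on 160..255, so only the 128..159 range
changes between the two results.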

> I was very surprised to find out that x-mac-roman was not
> accepted.

Compare <http://www.iana.org/assignments/character-sets> :
x-mac-roman does not exist.  If you think that this is wrong,
register it (but maybe x-... is reserved for private use).

> the validator advised me to use 'macintosh' as charset name.

Yes, that exists, why not use it?
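
For what it's worth, a name check along these lines could look
like the Python sketch below.  REGISTERED_SUBSET is a hand-copied
handful of registered names, not the full IANA list, and
check_charset_name and SUGGESTIONS are hypothetical; the one
suggestion in it is just the advice quoted above.

```python
# Hand-copied subset of IANA registered charset names (assumption:
# a real check would load the complete registry instead).
REGISTERED_SUBSET = {"utf-8", "iso-8859-1", "windows-1252", "macintosh"}

# Unregistered name -> registered name to suggest, as in this thread.
SUGGESTIONS = {"x-mac-roman": "macintosh"}


def check_charset_name(name: str):
    """Classify a declared charset name; matching is case-insensitive."""
    key = name.strip().lower()
    if key in REGISTERED_SUBSET:
        return ("ok", key)
    if key in SUGGESTIONS:
        return ("use instead", SUGGESTIONS[key])
    return ("unknown", None)


for declared in ("macintosh", "X-Mac-Roman", "x-foo"):
    print(declared, check_charset_name(declared))
```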

> We Mac users live in this very perfect world where all
> encodings are named x-mac-something.

validator.w3.org is for the WWW.  Maybe you could patch the
sources for a parallel universe of Mac users and a similar
validator.mac.org?
                       Bye, Frank
Received on Saturday, 23 April 2005 01:37:09 UTC