Re: [wmvs] do we still need charset.cfg to list the "acceptable" character encodings?

Hi Martin

On Thu, Jul 19, 2007, Martin Duerst wrote:
> The problem with your approach is that the list of things that are
> okay (officially IANA registered AND useful) is much shorter
> than all the aliases and other crap allowed by encode, and those
> aliases may change easily (mostly, the list will grow) when
> upgrading.

Understood, but I want to find a solution that is more useful than
just keeping a list of what's OK and throwing a fatal error for
anything else. Many times, people have complained that the validator
refused charsets used by whole countries, simply because we didn't know
they had been added (both to the IANA list and to our transcoding
libs). I don't want that to happen any more.

> So I think you should seriously reconsider your commit.

I think the commit made today, which will allow the validator to say
"hey, ascii is OK but you should really be using us-ascii", is very
useful. So I don't want to revert that. Let's try and improve on this.

Nor do I want to revert to throwing a fatal error if a charset is
technically functional, but happens to not be in the list.

A better solution may be:
* no fatal error if the charset is supported by encode
* a warning with the suggestion for a better alias if we know one
* a warning that the encoding may be "odd" if it is not in the list but
 encode knows about it
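
To make the three rules above concrete, here is a rough sketch of the
decision logic. It is only an illustration, not the commit itself: it is
written in Python rather than the validator's Perl, codecs.lookup()
stands in for "supported by encode", and the PREFERRED table and the
check_charset() name are made up for the example (the real data would
come from charset.cfg). It also assumes that a charset the transcoding
library does not know at all still produces a fatal error.

```python
import codecs

# Hypothetical stand-in for charset.cfg: preferred name -> known aliases.
PREFERRED = {
    "us-ascii": {"ascii", "ansi_x3.4-1968"},
    "utf-8": {"utf8"},
    "iso-8859-1": {"latin1", "l1"},
}


def check_charset(name):
    """Return (level, message); level is 'ok', 'warning' or 'fatal'."""
    label = name.strip().lower()

    # Assumption: only a charset the transcoding library cannot handle
    # at all is fatal. codecs.lookup() plays the role of encode here.
    try:
        codecs.lookup(label)
    except LookupError:
        return ("fatal", f"{name} is not supported")

    # The charset is a preferred name from the list: nothing to report.
    if label in PREFERRED:
        return ("ok", "")

    # The charset is a known alias: warn and suggest the better name.
    for preferred, aliases in PREFERRED.items():
        if label in aliases:
            return ("warning",
                    f"{name} is OK but you should really be using {preferred}")

    # Supported by the library but absent from the list: warn only.
    return ("warning", f"{name} is supported but is not in the list")


if __name__ == "__main__":
    for charset in ("us-ascii", "ascii", "cp1252", "no-such-charset"):
        print(charset, check_charset(charset))
```

The point of this shape is that the list only ever influences the
wording of a warning; the fatal branch depends solely on what the
transcoding library can actually do.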

This way, the list (which I can rebuild based on 
http://dev.w3.org/cvsweb/~checkout~/validator/htdocs/config/charset.cfg?rev=1.13&content-type=text/plain
and 
http://dev.w3.org/cvsweb/~checkout~/validator/htdocs/config/charset.cfg?rev=1.10.2.2&content-type=text/plain
) can be used to suggest good behavior, but never as a generator of
fatal errors, which are not really helpful, especially when the list is
just outdated...

I'm sending a commit implementing this solution right away.

-- 
olivier

Received on Thursday, 19 July 2007 08:16:40 UTC