Using I18N::Charset in the W3C Markup Validator from Bjoern Hoehrmann on 2005-08-18 (public-qa-dev@w3.org from August 2005)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Thu, 18 Aug 2005 21:25:52 +0200
To: mthurn@cpan.org
Cc: public-qa-dev@w3.org
Message-ID: <1bn9g19bahf0psm7st2gtksofnbhsmdqbo@hive.bjoern.hoehrmann.de>

Hi Martin,

  Over at http://validator.w3.org we are currently switching from
character encoding support based on proprietary code and Text::Iconv
to something based on Encode and friends. We would like the service
to support a broad range of encodings and alias names for them, so we
will probably use your I18N::Charset module at some point.

There seem to be two features missing at the moment though, or at
least it is not clear from the documentation how to achieve them. We
would like the service to warn about charset names that are not
registered in the IANA charset registry and, where possible, to point
out a registered alias to use instead. For CP1252, for example, there
should be a warning that it is not registered and that Windows-1252
should be used instead.

From the documentation it seems all_iana_charset_names() could be
used to determine whether the charset was registered when the module
was released. For encodings that are not registered, the better name
could be determined through iana_charset_name(), probably in
combination with mime_charset_name() to get the preferred name. Do
you agree?
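
To make the intended behaviour concrete, here is a minimal sketch of the
check. It is written in Python and does not call I18N::Charset: the two
small tables are hypothetical stand-ins for what all_iana_charset_names()
and iana_charset_name() would supply, and the function names check_charset
and warning_for are made up for illustration.

```python
# Hypothetical stand-in for the IANA charset registry. A real check would
# take this list from I18N::Charset rather than from a hand-written set.
REGISTERED = {"windows-1252", "iso-8859-1", "utf-8"}

# Hypothetical alias table: unregistered name -> registered name.
# Only the CP1252 case comes from the message itself.
UNREGISTERED_ALIASES = {"cp1252": "windows-1252"}


def check_charset(name):
    """Return (registered, suggestion) for a declared charset name."""
    key = name.strip().lower()  # compare charset names case-insensitively
    if key in REGISTERED:
        return True, None
    # Not registered: offer a registered alias if one is known, else None.
    return False, UNREGISTERED_ALIASES.get(key)


def warning_for(name):
    """Build a validator-style warning, or return None if the name is fine."""
    registered, suggestion = check_charset(name)
    if registered:
        return None
    if suggestion:
        return f"{name} is not registered with IANA; use {suggestion} instead."
    return f"{name} is not registered with IANA."
```

With these tables, warning_for("CP1252") produces a warning that points to
windows-1252, while warning_for("Windows-1252") returns None. Since the
registered set is a snapshot, a stale snapshot would flag newly registered
names as unregistered.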

In this case all we would need is more frequent updates to the module,
so we can minimize the risk of reporting charsets as unregistered when
they actually are registered. I think an update every 2-3 months would
be enough. If that is all we need, would it be possible for you to
update the module more often?

Thanks,
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Thursday, 18 August 2005 19:25:38 UTC