[wmvs] do we still need charset.cfg to list the "acceptable" character encodings?

From: olivier Thereaux <ot@w3.org>
Date: Thu, 24 May 2007 15:13:06 +0900
Message-Id: <F61F9805-F513-4638-A998-41DBF5346EE4@w3.org>
Cc: Martin Duerst <duerst@it.aoyama.ac.jp>, Bjoern Hoehrmann <derhoermi@gmx.net>
To: QA-dev Dev <public-qa-dev@w3.org>

Hello,

If you don't mind going a few years back, I would like to get your  
recollections of (and opinion on) the character encoding list  
accepted by the markup validator.

http://dev.w3.org/cvsweb/validator/htdocs/config/charset.cfg

Technically speaking, we do not really need this list any more. To  
know whether an encoding is technically supported, we have a small  
routine with Encode::decode() that does the job just fine. The Encode  
module seems to support a wide variety of encodings, too, much wider  
than the list we have.

e.g. iso_8859-1:
http://qa-dev.w3.org/wmvs/HEAD/dev/tests/197-iso88591_alias.html
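
For illustration, a minimal sketch of what such a check could look like.
This is not the validator's actual routine: the helper names
supported_encoding() and try_decode() are made up for this example, and
only Encode::find_encoding() and Encode::decode() come from the Encode
module itself.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return Encode's canonical name for a declared
# charset label, or undef when Encode does not recognize the label.
# find_encoding() also resolves the aliases Encode knows about.
sub supported_encoding {
    my ($label) = @_;
    return unless defined $label && length $label;
    my $enc = Encode::find_encoding($label);
    return $enc ? $enc->name : undef;
}

# Hypothetical helper: decode octets using the declared label.
# Returns the decoded string, or undef if the label is unknown or the
# octets are malformed for that encoding (FB_CROAK makes decode() die,
# and eval turns that into undef).
sub try_decode {
    my ($label, $octets) = @_;
    my $text = eval {
        Encode::decode($label, $octets,
            Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $text;
}
```

With helpers like these, a static list is not needed to answer "can we
decode this?": a label is technically usable whenever
supported_encoding() returns a defined value, whether or not it appears
in charset.cfg.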

I haven't yet tested whether Encode supports all IANA-listed
character encodings, but if it does not, then we could always pass the
character encoding declared through something like I18N::Alias, as  
suggested in
http://www.w3.org/Bugs/Public/show_bug.cgi?id=197

Therefore, there is no technical reason why we should enforce the use  
of a small list of accepted charsets.

However, the charset.cfg documents itself with (since revision 1.11  
committed by Bjoern):
[[
The Validator will refuse to decode documents in an encoding
other than those listed here. The list is independent of what
is supported on a specific system but subject to the Validator
policy for acceptable encodings.
]] -- http://dev.w3.org/cvsweb/validator/htdocs/config/charset.cfg.diff?r1=1.10&r2=1.11&f=h

Sounds reasonable, but what's the policy? And where does it come from?
All I can find so far in normative documents systematically points to  
the IANA registry.
http://www.iana.org/assignments/character-sets
And searching the list archives does not give me a clear lead on
whether the validator ever had a policy of favoring some charsets
over others.

Does anyone have any thoughts or recollections on this?

Thanks.
-- 
olivier
Received on Thursday, 24 May 2007 06:13:08 GMT