- From: olivier Thereaux <ot@w3.org>
- Date: Thu, 24 May 2007 15:13:06 +0900
- To: QA-dev Dev <public-qa-dev@w3.org>
- Cc: Martin Duerst <duerst@it.aoyama.ac.jp>, Bjoern Hoehrmann <derhoermi@gmx.net>
Hello,

If you don't mind going a few years back, I would like to get your recollections of (and opinion on) the list of character encodings accepted by the markup validator:
http://dev.w3.org/cvsweb/validator/htdocs/config/charset.cfg

Technically speaking, we do not really need this list any more. To know whether an encoding is technically supported, we have a small routine built on Encode::decode() that does the job just fine. The Encode module also seems to support a much wider variety of encodings than our list does, e.g. the alias iso_8859-1:
http://qa-dev.w3.org/wmvs/HEAD/dev/tests/197-iso88591_alias.html

I haven't yet tested whether Encode supports all IANA-registered character encodings, but if it does not, we could always pass the declared character encoding through something like I18N::Alias, as suggested in http://www.w3.org/Bugs/Public/show_bug.cgi?id=197

Therefore, there is no technical reason to restrict documents to a small list of accepted charsets. However, charset.cfg documents itself as follows (since revision 1.11, committed by Bjoern):

[[
The Validator will refuse to decode documents in an encoding other than those listed here. The list is independent of what is supported on a specific system but subject to the Validator policy for acceptable encodings.
]]
-- http://dev.w3.org/cvsweb/validator/htdocs/config/charset.cfg.diff?r1=1.10&r2=1.11&f=h

Sounds reasonable, but what is that policy, and where does it come from? Everything I have found so far in normative documents systematically points to the IANA registry:
http://www.iana.org/assignments/character-sets

Searching the list archives does not give me a clear lead on whether the validator ever had a policy of favoring one charset over another.

Does anyone have any thoughts or recollections on this?

Thanks.
-- olivier
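For reference, a support check of the kind described above might look roughly like this. It is a minimal sketch assuming only Perl's core Encode module; `canonical_charset` and `decodes_cleanly` are hypothetical names, not the validator's actual routine. `find_encoding()` resolves aliases through Encode::Alias, and `decode()` with `FB_CROAK` dies on malformed input or an unknown encoding name, so a single `eval` covers both failure cases.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode find_encoding FB_CROAK LEAVE_SRC);

# Hypothetical helper (not the validator's code): map a declared charset
# label to Encode's canonical name, or return undef if Encode does not
# recognise the label. Alias resolution is done by Encode::Alias.
sub canonical_charset {
    my ($label) = @_;
    my $enc = find_encoding($label);
    return defined $enc ? $enc->name : undef;
}

# Hypothetical helper: true if $octets decode cleanly as $label.
# FB_CROAK makes decode() die on malformed input, and decode() also dies
# on an unknown encoding name, so the eval catches both cases.
# LEAVE_SRC stops decode() from consuming the source buffer.
sub decodes_cleanly {
    my ($label, $octets) = @_;
    my $ok = eval { decode($label, $octets, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 1 : 0;
}

# Print how a few declared labels are resolved on this system.
for my $label ('iso_8859-1', 'latin1', 'no-such-charset') {
    my $name = canonical_charset($label);
    printf "%-16s => %s\n", $label, defined $name ? $name : '(unsupported)';
}
```

Since the result depends on the Encode version installed, this only answers "is it technically supported here?"; it says nothing about which encodings the validator should accept as a matter of policy.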
Received on Thursday, 24 May 2007 06:13:08 UTC