- From: Sylvain Galineau <sylvaing@microsoft.com>
- Date: Wed, 9 Sep 2009 20:18:33 +0000
- To: Anne van Kesteren <annevk@opera.com>
- CC: "public-html-comments@w3.org" <public-html-comments@w3.org>
- Message-ID: <045A765940533D4CA4933A4A7E32597E021EEE91@TK5EX14MBXC113.redmond.corp.microsoft.>
Apologies for the delayed answer; I hope the following is helpful.

IE trims leading and trailing spaces from the encoding name then does a lowercase match on one of the aliases listed in ie.encodings.txt (attached).

ie.encodings.txt lists all the encodings using the format:

<ui-label>,<alias>,<codepage>,<msdn-cp-identifier>

Where:
<ui-label> is the encoding name as reported in the Page-Encoding menu of IE8 RTM (us-en version).
<alias> is an encoding name that maps to <ui-label>; the mapping is lowercase match of the input after trimming leading and trailing spaces.
<codepage> is the codepage number for this encoding.
<msdn-cp-identifier> is the description of the code page from http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx

In addition, as it might be helpful for future spec work, I've also attached a flat version of the IANA character set assignments at http://www.iana.org/assignments/character-sets

The iana.charsets.map.txt file enumerates the charsets using the format:

<IANA-name>,<alias>

> -----Original Message-----
> From: public-html-comments-request@w3.org [mailto:public-html-comments-request@w3.org] On Behalf Of Anne van Kesteren
> Sent: Monday, August 17, 2009 12:33 PM
> To: Erik van der Poel
> Cc: Ian Hickson; public-html-comments@w3.org
> Subject: Re: charset name matching rules
>
> On Mon, 17 Aug 2009 17:26:52 +0200, Erik van der Poel <erikv@google.com> wrote:
> > I stopped testing when MSIE's tests said "script did not run" so
> > often. We probably need to test it differently, instead of relying on
> > a script.
>
> Or maybe update your IE to a newer version?
>
> http://krijnhoetmer.nl/irc-logs/whatwg/20090817#l-604 has the results for
> IE8. In summary it seems IE8 only does whitespace trimming at start and
> end and has ISO_8859-9 and ISO-8859_9 as alias but not ISO_8859_9. It also
> treats ISO-8859-9 as Windows-1254 which makes sense I suppose and we
> should probably require that.
>
> I think the main issue with following the IE/Gecko algorithm is that
> although it is much stricter it relies on more undefined aliases as well,
> such as ISO-8859_9. So getting documentation from the IE Team and Gecko
> guys on that would be good. (Have not checked whether Gecko actually
> recognizes that alias, fwiw.)
>
> --
> Anne van Kesteren
> http://annevankesteren.nl/
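
For readers who want to experiment with the matching rule described above (trim leading and trailing spaces, lowercase, then exact match against the alias column), here is a minimal Python sketch. It is not IE's implementation. The names `Encoding`, `load_aliases`, `lookup` and `SAMPLE_ROWS` are hypothetical, and the sample rows are invented for illustration based on the IE8 observations in the quoted message; the actual contents of ie.encodings.txt are not reproduced here. The sketch also assumes that only the last field may contain commas, and it trims all whitespace rather than only spaces.

```python
from typing import Dict, NamedTuple, Optional


class Encoding(NamedTuple):
    ui_label: str         # name as shown in the Page-Encoding menu
    codepage: int         # Windows codepage number
    msdn_identifier: str  # code page description from MSDN


# Hypothetical rows in the <ui-label>,<alias>,<codepage>,<msdn-cp-identifier>
# format. These are illustrative only, not copied from the attachment.
SAMPLE_ROWS = """\
Turkish (Windows),windows-1254,1254,windows-1254
Turkish (Windows),iso-8859-9,1254,windows-1254
Turkish (Windows),iso_8859-9,1254,windows-1254
Turkish (Windows),iso-8859_9,1254,windows-1254
"""


def load_aliases(text: str) -> Dict[str, Encoding]:
    """Build an alias -> Encoding table from lines in the four-field format."""
    table: Dict[str, Encoding] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumption: only the last field may itself contain commas.
        ui_label, alias, codepage, identifier = line.split(",", 3)
        table[alias.strip().lower()] = Encoding(
            ui_label.strip(), int(codepage), identifier.strip()
        )
    return table


def lookup(name: str, table: Dict[str, Encoding]) -> Optional[Encoding]:
    """Trim the ends of the input, lowercase it, then do an exact alias match."""
    # Assumption: str.strip() removes all whitespace, not only spaces.
    return table.get(name.strip().lower())


table = load_aliases(SAMPLE_ROWS)
hit = lookup("  ISO-8859-9 ", table)    # matches after trimming and lowercasing
miss = lookup("ISO_8859_9", table)      # not listed as an alias, so no match
```

Because the lookup is an exact match after normalization, every accepted spelling has to appear as its own row in the alias file; nothing is inferred from punctuation variants.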
Attachments
- text/plain attachment: ie.encodings.txt
- text/plain attachment: iana.charsets.map.txt
Received on Wednesday, 9 September 2009 20:19:31 UTC