RE: charset name matching rules

From: Sylvain Galineau <sylvaing@microsoft.com>
Date: Wed, 9 Sep 2009 20:18:33 +0000
To: Anne van Kesteren <annevk@opera.com>
CC: "public-html-comments@w3.org" <public-html-comments@w3.org>
Message-ID: <045A765940533D4CA4933A4A7E32597E021EEE91@TK5EX14MBXC113.redmond.corp.microsoft.com>
Apologies for the delayed answer; I hope the following is helpful.

IE trims leading and trailing spaces from the encoding name, then does a lowercase match against the aliases listed in ie.encodings.txt (attached).

ie.encodings.txt lists all the encodings using the format:

<ui-label>,<alias>,<codepage>,<msdn-cp-identifier>

Where:

<ui-label> is the encoding name as reported in the Page-Encoding menu of IE8 RTM (en-US version).
<alias> is an encoding name that maps to <ui-label>; the mapping is a lowercase match of the input after trimming leading and trailing spaces.
<codepage> is the codepage number for this encoding.
<msdn-cp-identifier> is the description of the code page from http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx
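
As a rough illustration of the format and matching rule described above, the following Python sketch parses lines in the <ui-label>,<alias>,<codepage>,<msdn-cp-identifier> layout and looks a name up by trimming spaces and lowercasing. The function names and the sample rows are hypothetical: the rows are written for this example based on the ISO-8859-9 behaviour discussed in the quoted message, not copied from the attachment. It also assumes that only the last field may contain commas.

```python
# Illustrative rows only; NOT taken from the attached ie.encodings.txt.
SAMPLE_ROWS = """\
Turkish (Windows),iso-8859-9,1254,ANSI Turkish; Turkish (Windows)
Turkish (Windows),iso_8859-9,1254,ANSI Turkish; Turkish (Windows)
Turkish (Windows),iso-8859_9,1254,ANSI Turkish; Turkish (Windows)
"""


def load_ie_encodings(text):
    """Return a dict mapping each lowercase alias to (ui_label, codepage)."""
    aliases = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first three commas only, on the assumption that
        # the trailing description is the only field that may contain commas.
        ui_label, alias, codepage, _description = line.split(",", 3)
        aliases[alias.strip().lower()] = (ui_label.strip(), int(codepage))
    return aliases


def lookup(aliases, name):
    """Trim leading/trailing spaces, lowercase, then do an exact match."""
    # strip(" ") removes U+0020 only; whether other whitespace is also
    # trimmed is not stated in this message, so that is left out here.
    return aliases.get(name.strip(" ").lower())


table = load_ie_encodings(SAMPLE_ROWS)
turkish = lookup(table, "  ISO-8859-9 ")
missing = lookup(table, "ISO_8859_9")
```

With these sample rows, `turkish` holds the label and codepage for the matched alias, while `missing` is `None` because no alias with two underscores is listed.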


In addition, as it might be helpful for future spec work, I've also attached a flat version of the IANA character set assignments at http://www.iana.org/assignments/character-sets


The iana.charsets.map.txt file enumerates the charsets using the format:

<IANA-name>,<alias>
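
A similar sketch for the flat IANA file, assuming one <IANA-name>,<alias> pair per line and exactly one separating comma. The helper names and sample rows are hypothetical and only illustrate the layout; the lookup lowercases names because the IANA registry makes no distinction between upper and lower case.

```python
from collections import defaultdict

# Illustrative rows only; NOT copied from the attached iana.charsets.map.txt.
IANA_SAMPLE = """\
ISO_8859-9:1989,ISO_8859-9
ISO_8859-9:1989,latin5
"""


def load_iana_map(text):
    """Group aliases under their IANA name, preserving file order."""
    by_name = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, alias = line.split(",", 1)
        if alias.strip():  # tolerate a name listed without an alias
            by_name[name.strip()].append(alias.strip())
        else:
            by_name[name.strip()]  # register the name with no aliases
    return dict(by_name)


def build_index(by_name):
    """Map every name and alias (lowercased) back to its IANA name."""
    index = {}
    for name, aliases in by_name.items():
        index[name.lower()] = name
        for alias in aliases:
            index[alias.lower()] = name
    return index


iana = load_iana_map(IANA_SAMPLE)
iana_index = build_index(iana)
```

An index like this could be compared with the IE alias table to list aliases that one source recognizes and the other does not.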



> -----Original Message-----
> From: public-html-comments-request@w3.org [mailto:public-html-comments-request@w3.org] On Behalf Of Anne van Kesteren
> Sent: Monday, August 17, 2009 12:33 PM
> To: Erik van der Poel
> Cc: Ian Hickson; public-html-comments@w3.org
> Subject: Re: charset name matching rules
> 
> On Mon, 17 Aug 2009 17:26:52 +0200, Erik van der Poel <erikv@google.com> wrote:
> > I stopped testing when MSIE's tests said "script did not run" so
> > often. We probably need to test it differently, instead of relying on
> > a script.
> 
> Or maybe update your IE to a newer version?
> 
> http://krijnhoetmer.nl/irc-logs/whatwg/20090817#l-604 has the results for
> IE8. In summary it seems IE8 only does whitespace trimming at start and
> end and has ISO_8859-9 and ISO-8859_9 as alias but not ISO_8859_9. It also
> treats ISO-8859-9 as Windows-1254 which makes sense I suppose and we
> should probably require that.
> 
> I think the main issue with following the IE/Gecko algorithm is that
> although it is much stricter it relies on more undefined aliases as well,
> such as ISO-8859_9. So getting documentation from the IE Team and Gecko
> guys on that would be good. (Have not checked whether Gecko actually
> recognizes that alias, fwiw.)
> 
> 
> --
> Anne van Kesteren
> http://annevankesteren.nl/

Received on Wednesday, 9 September 2009 20:19:31 GMT