[whatwg] A comment to character encoding declaration

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 19 Mar 2008 17:45:54 +0200
Message-ID: <B7AD3B49-0828-4D0B-82EA-1B1D70F2DE53@iki.fi>
On Mar 7, 2008, at 10:12, Jjgod Jiang wrote:
> On Fri, 7 Mar 2008, Alexey Proskuryakov wrote:
>>  In my testing, it appears that IE 7 and Firefox 2 do treat GBK as  
>> an equivalent of GB2312, but this cannot be said about GB18030. In  
>> particular, 0x80 and 0xA2E3 are treated differently.
> Yep, I missed that point in my previous post, my fault. Yes, they
> should be treated differently. So I guess my request should be changed
> to only treat GB2312 as GBK.

According to its source code[1], WebKit trunk also maps GB_2312-80 to
GBK. Gecko aliases gb_2312-80 to GB2312 (because of FrontPage output,
according to a source comment).
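
To make the quoted GBK vs. GB18030 point concrete, here is a rough
sketch that uses Python's bundled codecs as a stand-in. This is an
assumption for illustration only: Python's decoders are not the ones
IE, Gecko or WebKit use, so the results (especially for 0x80) need not
match any browser. The helper name try_decode is made up.

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Byte sequences to compare: ordinary GB2312 text plus the two
# values mentioned in the quoted message.
samples = {
    "GB2312 text": "中文".encode("gb2312"),
    "0xA2E3": b"\xa2\xe3",
    "0x80": b"\x80",
}

if __name__ == "__main__":
    for label, data in samples.items():
        for enc in ("gb2312", "gbk", "gb18030"):
            print(f"{label:12} {enc:8} {try_decode(data, enc)!r}")
```

Text that is valid GB2312 decodes identically under all three codecs,
which is why labelling GB2312 content as GBK is harmless; the two
special values are where the encodings can disagree.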

Also, WebKit maps KS_C_5601-1987 and EUC-KR to windows-949-2000.
Gecko aliases[2] KS_C_5601-1987 to x-windows-949 (because of FrontPage
output, according to a source comment). However, Gecko doesn't use its
alias mechanism to alias EUC-KR to windows-949. I haven't tested
whether EUC-KR is treated equivalently to windows-949 by other means.

Yet another weird alias tidbit, supported by both the Gecko and WebKit
source as well as by Googling the subject: it looks like x-x-big5
needs to be an alias for Big5 due to FrontPage.
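
For reference, the alias behaviour described in this message can be
written down as a small lookup sketch. The table contents are
transcribed only from the statements above; the names WEBKIT_ALIASES,
GECKO_ALIASES and resolve are hypothetical and do not correspond to
either engine's actual code, which lives at [1] and [2].

```python
# Hypothetical alias tables; keys are lower-cased charset labels.
WEBKIT_ALIASES = {
    "gb_2312-80": "GBK",
    "ks_c_5601-1987": "windows-949-2000",
    "euc-kr": "windows-949-2000",
    "x-x-big5": "Big5",
}

GECKO_ALIASES = {
    "gb_2312-80": "GB2312",
    "ks_c_5601-1987": "x-windows-949",
    "x-x-big5": "Big5",
    # No EUC-KR entry: Gecko's alias mechanism does not map it to windows-949.
}


def resolve(label: str, aliases: dict) -> str:
    """Map a declared charset label to the encoding actually used.

    Matching is case-insensitive and ignores surrounding whitespace;
    labels without an alias entry are returned unchanged (trimmed).
    """
    trimmed = label.strip()
    return aliases.get(trimmed.lower(), trimmed)


if __name__ == "__main__":
    for label in ("GB_2312-80", "KS_C_5601-1987", "EUC-KR", "x-x-big5"):
        print(label, "->", resolve(label, WEBKIT_ALIASES), "/",
              resolve(label, GECKO_ALIASES))
```

A flat dictionary is enough here because each label maps to exactly
one target; the per-engine tables make the EUC-KR difference visible.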

[1] http://trac.webkit.org/projects/webkit/browser/trunk/WebCore/platform/text/TextCodecICU.cpp#L90
[2] http://mxr.mozilla.org/seamonkey/source/intl/uconv/src/charsetalias.properties#335
Henri Sivonen
hsivonen at iki.fi
Received on Wednesday, 19 March 2008 08:45:54 UTC
