- From: Anne van Kesteren <annevk@annevk.nl>
- Date: Mon, 16 Dec 2013 15:48:34 +0000
- To: www-archive <www-archive@w3.org>
- Message-ID: <CADnb78gvSwqAdAAO6vU0=ntYDTo7HBzcRKaC2nTq0D_CuCNSOQ@mail.gmail.com>
A year and a half ago I compiled
http://lists.w3.org/Archives/Public/www-archive/2012Apr/0030.html using
http://dump.testsuite.org/encoding/gbk/ and some basic Python scripts to
analyze the output. (Internet Explorer is missing because it lacks support
for XMLHttpRequest's overrideMimeType.)
However, gb18030 is supposed to be a UTF and in Rebel Opera it is not. Back
then I did not take this as a hard requirement, but it leads to problems
such as https://www.w3.org/Bugs/Public/show_bug.cgi?id=21145 and might in
fact violate some Chinese government regulations depending on who you ask.
gb18030 data is the same between Gecko and Chrome. Where gbk differs from
gb18030 in Gecko, the byte sequence is mapped to U+FFFD. In Chrome, a PUA
mapping is used instead, as illustrated below:
Index Chrome
      gb18030 gbk
6432 20AC E76C
7536 01F9 E7C8
7672 303E E7E7
7673 2FF0 E7E8
7674 2FF1 E7E9
7675 2FF2 E7EA
7676 2FF3 E7EB
7677 2FF4 E7EC
7678 2FF5 E7ED
7679 2FF6 E7EE
7680 2FF7 E7EF
7681 2FF8 E7F0
7682 2FF9 E7F1
7683 2FFA E7F2
7684 2FFB E7F3
23766 2E81 E815
23770 2E84 E819
23771 3473 E81A
23772 3447 E81B
23773 2E88 E81C
23774 2E8B E81D
23776 359E E81F
23777 361A E820
23778 360E E821
23779 2E8C E822
23780 2E97 E823
23781 396E E824
23782 3918 E825
23784 39CF E827
23785 39DF E828
23786 3A73 E829
23787 39D0 E82A
23790 3B4E E82D
23791 3C6E E82E
23792 3CE0 E82F
23793 2EA7 E830
23796 2EAA E833
23797 4056 E834
23798 415F E835
23799 2EAE E836
23800 4337 E837
23801 2EB3 E838
23802 2EB6 E839
23803 2EB7 E83A
23805 43B1 E83C
23806 43AC E83D
23807 2EBB E83E
23808 43DD E83F
23809 44D6 E840
23810 4661 E841
23811 464C E842
23813 4723 E844
23814 4729 E845
23815 477C E846
23816 478D E847
23817 2ECA E848
23818 4947 E849
23819 497A E84A
23820 497D E84B
23821 4982 E84C
23822 4983 E84D
23823 4985 E84E
23824 4986 E84F
23825 499F E850
23826 499B E851
23827 49B7 E852
23828 49B6 E853
23831 4CA3 E856
23832 4C9F E857
23833 4CA0 E858
23834 4CA1 E859
23835 4C77 E85A
23836 4CA2 E85B
23837 4D13 E85C
23838 4D14 E85D
23839 4D15 E85E
23840 4D16 E85F
23841 4D17 E860
23842 4D18 E861
23843 4D19 E862
23844 4DAE E863
Given the differences among browsers for these 81 mappings, it seems safe to
use the gb18030 mapping universally and even turn gbk into a label for
gb18030.
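The three behaviors discussed here can be sketched in a few lines of Python.
This is only an illustration: the three rows are copied from the table above,
while the function name and the policy names ("gecko", "chrome", "proposed")
are invented for this sketch and do not correspond to any browser API.

```python
# Three sample rows copied from the table: index -> (gb18030, Chrome gbk PUA).
SAMPLE_ROWS = {
    6432: (0x20AC, 0xE76C),
    7536: (0x01F9, 0xE7C8),
    23766: (0x2E81, 0xE815),
}

REPLACEMENT_CHARACTER = 0xFFFD


def decode_index(index, label, policy):
    """Return the code point for a table index under a hypothetical policy.

    label  -- "gbk" or "gb18030", the encoding label the page declared
    policy -- "gecko", "chrome", or "proposed" (names made up for this sketch)
    """
    gb18030_cp, chrome_gbk_cp = SAMPLE_ROWS[index]
    # gb18030 data is the same everywhere; the proposal also treats gbk
    # as just another label for gb18030.
    if label == "gb18030" or policy == "proposed":
        return gb18030_cp
    if policy == "gecko":
        return REPLACEMENT_CHARACTER  # Gecko's gbk yields U+FFFD here
    if policy == "chrome":
        return chrome_gbk_cp          # Chrome's gbk yields a PUA code point
    raise ValueError("unknown policy: " + policy)


for policy in ("gecko", "chrome", "proposed"):
    print(policy, "U+%04X" % decode_index(6432, "gbk", policy))
```

Under the "proposed" policy the label no longer matters, which is the point of
turning gbk into a label for gb18030.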
Note that the indexes are in line with what
http://encoding.spec.whatwg.org/ is using.
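To relate an index back to bytes, here is a minimal Python sketch of the
two-byte pointer arithmetic as I understand it from the Encoding Standard
(190 trail bytes per lead byte, lead bytes starting at 0x81, trail byte 0x7F
skipped). The function names are made up for this example.

```python
def index_to_bytes(index):
    """Convert a two-byte index (pointer) into a (lead, trail) byte pair."""
    lead = index // 190 + 0x81
    trail = index % 190
    # Trail bytes run 0x40..0x7E and 0x80..0xFE, so 0x7F is skipped.
    offset = 0x40 if trail < 0x3F else 0x41
    return lead, trail + offset


def bytes_to_index(lead, trail):
    """Inverse of index_to_bytes for a valid two-byte sequence."""
    offset = 0x40 if trail < 0x7F else 0x41
    return (lead - 0x81) * 190 + (trail - offset)


for index in (6432, 7536, 23766):
    lead, trail = index_to_bytes(index)
    print(index, "%02X %02X" % (lead, trail))
# prints:
# 6432 A2 E3
# 7536 A8 BF
# 23766 FE 50
```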
--
http://annevankesteren.nl/
Received on Monday, 16 December 2013 15:49:05 UTC