- From: Anne van Kesteren <annevk@annevk.nl>
- Date: Mon, 16 Dec 2013 15:48:34 +0000
- To: www-archive <www-archive@w3.org>
- Message-ID: <CADnb78gvSwqAdAAO6vU0=ntYDTo7HBzcRKaC2nTq0D_CuCNSOQ@mail.gmail.com>
A year and a half ago I compiled http://lists.w3.org/Archives/Public/www-archive/2012Apr/0030.html using http://dump.testsuite.org/encoding/gbk/ and some basic Python scripts to analyze the output. (Internet Explorer is missing because it lacks support for XMLHttpRequest's overrideMimeType.)

However, gb18030 is supposed to be a UTF and in Rebel Opera it is not. Back then I did not take this as a hard requirement, but it leads to problems such as https://www.w3.org/Bugs/Public/show_bug.cgi?id=21145 and might in fact violate some Chinese government regulations, depending on who you ask.

gb18030 data is the same between Gecko and Chrome. Where gbk differs from gb18030, Gecko maps the byte sequence to U+FFFD. Chrome uses a PUA mapping instead, as illustrated below:

        Chrome
Index   gb18030  gbk
 6432   20AC     E76C
 7536   01F9     E7C8
 7672   303E     E7E7
 7673   2FF0     E7E8
 7674   2FF1     E7E9
 7675   2FF2     E7EA
 7676   2FF3     E7EB
 7677   2FF4     E7EC
 7678   2FF5     E7ED
 7679   2FF6     E7EE
 7680   2FF7     E7EF
 7681   2FF8     E7F0
 7682   2FF9     E7F1
 7683   2FFA     E7F2
 7684   2FFB     E7F3
23766   2E81     E815
23770   2E84     E819
23771   3473     E81A
23772   3447     E81B
23773   2E88     E81C
23774   2E8B     E81D
23776   359E     E81F
23777   361A     E820
23778   360E     E821
23779   2E8C     E822
23780   2E97     E823
23781   396E     E824
23782   3918     E825
23784   39CF     E827
23785   39DF     E828
23786   3A73     E829
23787   39D0     E82A
23790   3B4E     E82D
23791   3C6E     E82E
23792   3CE0     E82F
23793   2EA7     E830
23796   2EAA     E833
23797   4056     E834
23798   415F     E835
23799   2EAE     E836
23800   4337     E837
23801   2EB3     E838
23802   2EB6     E839
23803   2EB7     E83A
23805   43B1     E83C
23806   43AC     E83D
23807   2EBB     E83E
23808   43DD     E83F
23809   44D6     E840
23810   4661     E841
23811   464C     E842
23813   4723     E844
23814   4729     E845
23815   477C     E846
23816   478D     E847
23817   2ECA     E848
23818   4947     E849
23819   497A     E84A
23820   497D     E84B
23821   4982     E84C
23822   4983     E84D
23823   4985     E84E
23824   4986     E84F
23825   499F     E850
23826   499B     E851
23827   49B7     E852
23828   49B6     E853
23831   4CA3     E856
23832   4C9F     E857
23833   4CA0     E858
23834   4CA1     E859
23835   4C77     E85A
23836   4CA2     E85B
23837   4D13     E85C
23838   4D14     E85D
23839   4D15     E85E
23840   4D16     E85F
23841   4D17     E860
23842   4D18     E861
23843   4D19     E862
23844   4DAE     E863

Given the differences among browsers for these 81 mappings, it seems safe to use the gb18030 mapping universally and even turn gbk into a label for gb18030.

Note that the indexes are in line with what http://encoding.spec.whatwg.org/ is using.

-- 
http://annevankesteren.nl/
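For readers who want to relate the indexes in this message to actual byte sequences, here is a minimal sketch. It assumes the two-byte pointer arithmetic used by the Encoding Standard's gbk/gb18030 decoder (190 trail bytes per lead byte, with 0x7F skipped). The function names index_to_bytes and bytes_to_index are hypothetical helpers and are not taken from the original analysis scripts.

```python
ROW_SIZE = 190          # trail bytes per lead byte (0x40-0x7E and 0x80-0xFE)
LEAD_COUNT = 126        # lead bytes 0x81-0xFE


def index_to_bytes(pointer: int) -> bytes:
    """Map a two-byte index pointer to its lead/trail byte pair."""
    if not 0 <= pointer < LEAD_COUNT * ROW_SIZE:
        raise ValueError("pointer outside the two-byte range")
    lead = pointer // ROW_SIZE + 0x81
    trail = pointer % ROW_SIZE
    # Trail bytes skip 0x7F, so the offset grows by one past that gap.
    offset = 0x40 if trail < 0x3F else 0x41
    return bytes([lead, trail + offset])


def bytes_to_index(lead: int, trail: int) -> int:
    """Inverse of index_to_bytes: lead/trail byte pair to pointer."""
    if not 0x81 <= lead <= 0xFE:
        raise ValueError("lead byte out of range")
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFE):
        raise ValueError("trail byte out of range")
    offset = 0x40 if trail < 0x7F else 0x41
    return (lead - 0x81) * ROW_SIZE + (trail - offset)


if __name__ == "__main__":
    # A few indexes taken from the table in this message.
    for index in (6432, 7536, 23766):
        print(index, index_to_bytes(index).hex().upper())
```

Under these assumptions, index 6432 corresponds to the byte pair A2 E3, index 7536 to A8 BF, and index 23766 to FE 50.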
Received on Monday, 16 December 2013 15:49:05 UTC