- From: Yonggang Luo <notifications@github.com>
- Date: Sun, 17 Jan 2016 08:13:48 -0800
- To: whatwg/encoding <encoding@noreply.github.com>
- Message-ID: <whatwg/encoding/issues/27@github.com>
GB18030-2005 is already a one-to-one mapping between Unicode and GB18030, except for the 14 characters that were still mapped to the Unicode PUA as of 2005. All 14 of those characters now have corresponding non-PUA Unicode code points, so I suggest that the Encoding Standard map them to normal Unicode characters instead of PUA characters.

The following 80 characters are the GBK characters that were at one time mapped to the Unicode PUA, together with the corresponding non-PUA Unicode characters:

```
GBK   Unicode PUA  Unicode non-PUA
FE50  E815         2E81
FE51  E816         20087
FE52  E817         20089
FE53  E818         200CC
FE54  E819         2E84
FE55  E81A         3473
FE56  E81B         3447
FE57  E81C         2E88
FE58  E81D         2E8B
FE59  E81E         9FB4
FE5A  E81F         359E
FE5B  E820         361A
FE5C  E821         360E
FE5D  E822         2E8C
FE5E  E823         2E97
FE5F  E824         396E
FE60  E825         3918
FE61  E826         9FB5
FE62  E827         39CF
FE63  E828         39DF
FE64  E829         3A73
FE65  E82A         39D0
FE66  E82B         9FB6
FE67  E82C         9FB7
FE68  E82D         3B4E
FE69  E82E         3C6E
FE6A  E82F         3CE0
FE6B  E830         2EA7
FE6C  E831         215D7
FE6D  E832         9FB8
FE6E  E833         2EAA
FE6F  E834         4056
FE70  E835         415F
FE71  E836         2EAE
FE72  E837         4337
FE73  E838         2EB3
FE74  E839         2EB6
FE75  E83A         2EB7
FE76  E83B         2298F
FE77  E83C         43B1
FE78  E83D         43AC
FE79  E83E         2EBB
FE7A  E83F         43DD
FE7B  E840         44D6
FE7C  E841         4661
FE7D  E842         464C
FE7E  E843         9FB9
FE80  E844         4723
FE81  E845         4729
FE82  E846         477C
FE83  E847         478D
FE84  E848         2ECA
FE85  E849         4947
FE86  E84A         497A
FE87  E84B         497D
FE88  E84C         4982
FE89  E84D         4983
FE8A  E84E         4985
FE8B  E84F         4986
FE8C  E850         499F
FE8D  E851         499B
FE8E  E852         49B7
FE8F  E853         49B6
FE90  E854         9FBA
FE91  E855         241FE
FE92  E856         4CA3
FE93  E857         4C9F
FE94  E858         4CA0
FE95  E859         4CA1
FE96  E85A         4C77
FE97  E85B         4CA2
FE98  E85C         4D13
FE99  E85D         4D14
FE9A  E85E         4D15
FE9B  E85F         4D16
FE9C  E860         4D17
FE9D  E861         4D18
FE9E  E862         4D19
FE9F  E863         4DAE
FEA0  E864         9FBB
```

The following 14 characters are the GB18030-2005 characters that are still mapped to the Unicode PUA. I suggest that the Encoding Standard map them to non-PUA Unicode as well: there is no need to wait for GB18030 to update its spec just for these 14 characters, and we can be sure that their corresponding non-PUA Unicode characters have already been decided.

```
GBK   Unicode PUA  Unicode non-PUA
FE51  E816         20087
FE52  E817         20089
FE53  E818         200CC
FE59  E81E         9FB4
FE61  E826         9FB5
FE66  E82B         9FB6
FE67  E82C         9FB7
FE6C  E831         215D7
FE6D  E832         9FB8
FE76  E83B         2298F
FE7E  E843         9FB9
FE90  E854         9FBA
FE91  E855         241FE
FEA0  E864         9FBB
```

With these mappings, we can decode all strings in the GBK encoding family to non-PUA Unicode. Beyond that, we still need to convert all the historical Unicode PUA characters to the proper GBK (GB18030) characters.

---
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/27
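For illustration only, here is a minimal Python sketch of the remapping proposed above for the 14 remaining characters. It post-processes text that has already been decoded, rather than changing any decoder's index. The names `PUA_TO_STANDARD`, `STANDARD_TO_PUA`, `remap_pua`, and `to_legacy_pua` are hypothetical and not part of any standard or library.

```python
# Hypothetical sketch: remap the 14 GB18030-2005 PUA code points listed in
# the message to their non-PUA Unicode equivalents (values copied from the
# second table; nothing here comes from the Encoding Standard itself).
PUA_TO_STANDARD = {
    0xE816: 0x20087,
    0xE817: 0x20089,
    0xE818: 0x200CC,
    0xE81E: 0x9FB4,
    0xE826: 0x9FB5,
    0xE82B: 0x9FB6,
    0xE82C: 0x9FB7,
    0xE831: 0x215D7,
    0xE832: 0x9FB8,
    0xE83B: 0x2298F,
    0xE843: 0x9FB9,
    0xE854: 0x9FBA,
    0xE855: 0x241FE,
    0xE864: 0x9FBB,
}

# Reverse table, for handing text to a legacy encoder that only knows the
# PUA code points (an assumption about how such an encoder would be used).
STANDARD_TO_PUA = {std: pua for pua, std in PUA_TO_STANDARD.items()}


def remap_pua(text: str) -> str:
    """Replace the 14 legacy PUA code points with non-PUA characters."""
    return text.translate(PUA_TO_STANDARD)


def to_legacy_pua(text: str) -> str:
    """Inverse of remap_pua: restore the PUA code points."""
    return text.translate(STANDARD_TO_PUA)
```

Characters outside the table pass through unchanged, so `remap_pua` can be applied to a whole decoded string without first checking whether it contains any of the 14 code points.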
Received on Sunday, 17 January 2016 16:14:17 UTC