Re: [whatwg/encoding] Support GB18030-2022 (PR #335)

@StoneChi8 commented on this pull request.



> +  <p>If <a for="gb18030 encoder">is GBK</a> is false and there is a row in the table below whose
+  first column is <var>code point</var>, then return the two bytes on the same row listed in the
+  second column:

GBK-1995 has never been an official standard, although it assigned some of the GB+FExx characters to U+E8xx. GB18030-2000 then replaced 52 of those Chinese characters with their Unicode Extension A code points while leaving the GB+FExx codes unchanged, e.g. GB+FE9F【䶮】 is mapped to U+4DAE instead of U+E863. For information interchange, we should use the official GB18030-2022 mapping table even in the GBK case, in order to drop these duplicate Unicode code points for the same two-byte GB characters.
The Windows CP 936 approach is the wrong way to reach the GB18030 standard: it keeps the U+E8xx characters in the TTF font files (Source Han Sans and iOS never did this), and its conversion programs use these PUA characters and assign 0x3F to four-byte GB18030 characters.
On the other hand, the full BMP PUA range is U+E000~U+F8FF, and the supplementary planes (U+10000~U+10FFFF) map to GB18030 purely by offset, starting at GB+90308130, so no mapping table is needed for them, either in programs or for future GB18030 amendments.
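
To illustrate why the supplementary range needs no table, here is a minimal Python sketch. It assumes the linear four-byte layout in which U+10000 corresponds to GB+90308130, with the second and fourth bytes running over 0x30~0x39 and the third byte over 0x81~0xFE. The function name `supplementary_to_gb18030` is hypothetical and is not taken from the PR or the standard.

```python
def supplementary_to_gb18030(code_point: int) -> bytes:
    """Map a code point in U+10000..U+10FFFF to four GB18030 bytes by offset only."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")

    # Linear index counted from U+10000 (assumed to sit at GB+90308130).
    pointer = code_point - 0x10000

    # Mixed-radix split: byte 2 and byte 4 have 10 values each,
    # byte 3 has 126 values (0x81..0xFE).
    byte1, rest = divmod(pointer, 10 * 126 * 10)
    byte2, rest = divmod(rest, 126 * 10)
    byte3, byte4 = divmod(rest, 10)

    return bytes([0x90 + byte1, 0x30 + byte2, 0x81 + byte3, 0x30 + byte4])


if __name__ == "__main__":
    for cp in (0x10000, 0x1F600, 0x10FFFF):
        print(f"U+{cp:04X} -> GB+{supplementary_to_gb18030(cp).hex().upper()}")
```

The same arithmetic run in reverse recovers the code point from the four bytes, so a converter would only need tables for the two-byte area and the BMP four-byte ranges.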
See https://zhuanlan.zhihu.com/p/661610604 for details on the WHATWG GB18030 conversion program (in Chinese).

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/pull/335#discussion_r1765954742
You are receiving this because you are subscribed to this thread.

Message ID: <whatwg/encoding/pull/335/review/2314125318@github.com>

Received on Thursday, 19 September 2024 00:59:52 UTC