- From: StoneChi8 <notifications@github.com>
- Date: Wed, 18 Sep 2024 17:59:48 -0700
- To: whatwg/encoding <encoding@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/encoding/pull/335/review/2314125318@github.com>
@StoneChi8 commented on this pull request.

> + <p>If <a for="gb18030 encoder">is GBK</a> is false and there is a row in the table below whose
> + first column is <var>code point</var>, then return the two bytes on the same row listed in the
> + second column:

GBK-1995 has never been an official standard, although it assigns some characters in U+E8xx to GB+FExx. GB18030-2000 replaced 52 of these Chinese characters with Unicode Extension A code points while leaving the GB codes GB+FExx unchanged; for example, GB+FE9F【䶮】 is mapped to U+4DAE instead of U+E863.

For information interchange, we should use the official GB18030-2022 mapping table even where GBK is named, so that no duplicate Unicode code points map to the same two-byte GB character.

The Windows CP 936 approach is the wrong way to reach the GB18030 standard: it keeps the U+E8xx characters in TTF font files (Source Han Sans and iOS never did this), and its conversion programs use these PUA characters and assign 0x3F to four-byte GB18030 characters.

On the other hand, the full BMP PUA range is U+E000 to U+F8FF, and the supplementary planes U+10000 to U+10FFFF map linearly to the four-byte GB18030 codes starting at GB+90308130, so no mapping table is needed there, either for programming or for future GB18030 amendments.

See https://zhuanlan.zhihu.com/p/661610604 for details on the WHATWG GB18030 conversion program (in Chinese).

--
Reply to this email directly or view it on GitHub: https://github.com/whatwg/encoding/pull/335#discussion_r1765954742
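
To illustrate the point about U+10000 to U+10FFFF needing no mapping table, here is a minimal Python sketch of the linear four-byte calculation. The function name `supplementary_to_gb18030` is hypothetical, and the base pointer 189000 is assumed from the gb18030 ranges index in the WHATWG Encoding Standard, where it corresponds to U+10000. It is a sketch of the arithmetic only, not the specification's encoder.

```python
def supplementary_to_gb18030(code_point: int) -> bytes:
    """Encode a supplementary-plane code point as four GB18030 bytes.

    Hypothetical helper: covers only U+10000..U+10FFFF, where the
    mapping is purely arithmetic and needs no lookup table.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("expected a code point in U+10000..U+10FFFF")

    # Assumed base: pointer 189000 corresponds to U+10000.
    pointer = 189000 + (code_point - 0x10000)

    # Four-byte layout: byte 1 and byte 3 span 0x81..0xFE (126 values),
    # byte 2 and byte 4 span 0x30..0x39 (10 values).
    byte1, rest = divmod(pointer, 10 * 126 * 10)
    byte2, rest = divmod(rest, 10 * 126)
    byte3, byte4 = divmod(rest, 10)

    return bytes([byte1 + 0x81, byte2 + 0x30, byte3 + 0x81, byte4 + 0x30])


if __name__ == "__main__":
    for cp in (0x10000, 0x1F600, 0x10FFFF):
        print(f"U+{cp:04X} -> {supplementary_to_gb18030(cp).hex(' ').upper()}")
```

The results can be cross-checked against Python's built-in `gb18030` codec, for example `chr(0x10000).encode("gb18030")`.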
Received on Thursday, 19 September 2024 00:59:52 UTC