Re: [whatwg/encoding] Support GB18030-2022 (PR #335) from StoneChi8 on 2024-09-19 (public-webapps-github@w3.org from September 2024)

From: StoneChi8 <notifications@github.com>
Date: Wed, 18 Sep 2024 17:59:48 -0700
To: whatwg/encoding <encoding@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/encoding/pull/335/review/2314125318@github.com>

@StoneChi8 commented on this pull request.

> + <p>If <a for="gb18030 encoder">is GBK</a> is false and there is a row in the table below whose
+ first column is <var>code point</var>, then return the two bytes on the same row listed in the
+ second column:

GBK-1995 has never been an official standard, although it assigns some GB+FExx characters to code points in U+E8xx. GB18030-2000 moved 52 of those Chinese characters to Unicode Extension A while leaving their GB+FExx codes unchanged; for example, GB+FE9F【䶮】 maps to U+4DAE instead of U+E863. For information interchange, we should use the official GB18030-2022 mapping table even in the GBK case, so that these two-byte GB characters no longer have duplicate Unicode code points.
The Windows CP 936 approach is the wrong way to reach the GB18030 standard: it keeps the U+E8xx characters in TTF font files (Source Han Sans and iOS never did this), and its conversion program uses these PUA characters and outputs 0x3F for four-byte GB18030 characters.
On the other hand, the full BMP PUA range is U+E000 to U+F8FF, and the supplementary planes (U+10000 to U+10FFFF) map to GB18030 as a single linear four-byte range starting at GB+90308130, so no mapping table is needed for programming or for future GB18030 amendments.
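
As a rough illustration of why the supplementary planes need no table, here is a minimal Python sketch of the linear four-byte mapping. The function name `supplementary_to_gb18030` is hypothetical, and the constant 189000 is the four-byte pointer that corresponds to GB+90308130; this is a sketch under those assumptions, not the PR's actual encoder.

```python
def supplementary_to_gb18030(code_point: int) -> bytes:
    """Map a supplementary-plane code point to its four GB18030 bytes."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside U+10000..U+10FFFF")

    # Linear offset: U+10000 sits at four-byte pointer 189000 (GB+90308130).
    pointer = 189000 + (code_point - 0x10000)

    # Four-byte GB18030 digits: byte 1 and 3 have 126 values (0x81..0xFE),
    # byte 2 and 4 have 10 values (0x30..0x39).
    byte1, pointer = divmod(pointer, 10 * 126 * 10)
    byte2, pointer = divmod(pointer, 10 * 126)
    byte3, byte4 = divmod(pointer, 10)

    return bytes([byte1 + 0x81, byte2 + 0x30, byte3 + 0x81, byte4 + 0x30])


if __name__ == "__main__":
    print(supplementary_to_gb18030(0x10000).hex())
```

Because the mapping is pure arithmetic, code points added to the supplementary planes by later amendments would be covered without touching the code.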
See https://zhuanlan.zhihu.com/p/661610604 for details on the WHATWG GB18030 conversion program (in Chinese).

--
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/pull/335#discussion_r1765954742
You are receiving this because you are subscribed to this thread.


Received on Thursday, 19 September 2024 00:59:52 UTC