Re: [whatwg/encoding] If gb18030 is revised, consider aligning the Encoding Standard (#27)

GB18030-2022 will take effect on 1 Aug 2023. At a minimum, the compliance criteria require that input methods not generate PUA characters for the 24 affected characters, and that fonts not use the 24 PUA codepoints for them.

However, most existing products sold on the Chinese market fail these tests, and those old versions are expected to remain in use even though they will no longer be allowed to be sold. There is also existing UTF-8 content that uses those PUA codepoints.

To be backwards compatible, both the PUA and the non-PUA codepoints should map to the correct GB18030-2022 2-byte sequences.
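
A rough sketch of what that double mapping could look like in an encoder, in Python. The table holds a single illustrative pair, not the full list of affected characters: the assumption here is that byte sequence 0xA6 0xD9 corresponds to the non-PUA U+FE10 under GB18030-2022 and to the PUA U+E78D under the older mapping. These values should be checked against the standard before being relied on, and the function and table names are hypothetical.

```python
from typing import Optional

# Assumed example pair only; verify against GB18030-2022 before use.
NON_PUA_TO_BYTES = {
    0xFE10: b"\xA6\xD9",  # non-PUA code point -> 2-byte sequence
}

# Legacy PUA code points folded onto their non-PUA equivalents (assumed pair).
LEGACY_PUA_TO_NON_PUA = {
    0xE78D: 0xFE10,
}


def encode_two_byte(code_point: int) -> Optional[bytes]:
    """Encode either the PUA or the non-PUA code point to the same bytes."""
    canonical = LEGACY_PUA_TO_NON_PUA.get(code_point, code_point)
    return NON_PUA_TO_BYTES.get(canonical)


def decode_two_byte(data: bytes) -> Optional[int]:
    """Decode the 2-byte sequence to the non-PUA code point only."""
    for code_point, sequence in NON_PUA_TO_BYTES.items():
        if sequence == data:
            return code_point
    return None
```

With this shape, old content that contains the PUA codepoint and new content that contains the non-PUA codepoint both encode to the same bytes, while decoding always yields the non-PUA codepoint.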

Whether the 4-byte sequences should map to the non-PUA codepoints is less of an issue -- data in GB18030 is not expected to be stored in the 4-byte form. However, if keeping the double mapping to U+3000 is deemed web compatible, then keeping the 4-byte sequences mapped to the non-PUA codepoints should be web compatible in the same manner.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/27#issuecomment-1294981537

Message ID: <whatwg/encoding/issues/27/1294981537@github.com>

Received on Friday, 28 October 2022 13:10:12 UTC