[whatwg/encoding] Adopt GB18030-2022 (PR #336)

This implements the Unicode Technical Committee recommendation around GB18030-2022 in a manner suitable for this standard, taking into account existing practice and the closeness between GBK and gb18030.

In particular, using the text file attached to https://www.unicode.org/L2/L2023/23003r-gb18030-recommendations.pdf this does the following:

1. Merges the first set of 18 mappings, which are bidirectional, directly into index gb18030, replacing existing PUA entries. This affects both GBK and gb18030.
2. Encodes the second set of 18 mappings (from PUA to bytes) as an encoder-only table, for both GBK and gb18030.
3. Ignores the third set of 18 mappings (from bytes to code points), as they are already covered by index gb18030 ranges. (Presumably they are included because the recommendation covers the transition from "Previous Mappings" to "Current Mappings" to "Recommended Mappings", whereas we are going directly from "Previous Mappings" to "Recommended Mappings".)
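
As a rough illustration of how a bidirectional index and an encoder-only table can coexist, here is a minimal Python sketch. It is not the specification text: `TOY_INDEX`, `TOY_ENCODER_ONLY`, and the function names are hypothetical, and the table contents are toy values rather than entries from index-gb18030.txt or the recommendation's text file. Only the two-byte pointer arithmetic follows the existing gb18030 encoder and decoder.

```python
# Toy stand-in for index gb18030: pointer -> code point (bidirectional).
# These entries are illustrative only, not copied from index-gb18030.txt.
TOY_INDEX = {0: 0x4E02, 1: 0x4E04, 2: 0x4E05}

# Toy stand-in for the encoder-only table: PUA code point -> bytes.
# The code point and bytes here are hypothetical, chosen for the demo.
TOY_ENCODER_ONLY = {0xE000: bytes([0x81, 0x42])}

# Reverse lookup used by the encoder for the bidirectional index.
CODE_POINT_TO_POINTER = {cp: ptr for ptr, cp in TOY_INDEX.items()}


def pointer_to_bytes(pointer: int) -> bytes:
    """Two-byte form of an index pointer (gb18030 encoder arithmetic)."""
    lead = pointer // 190 + 0x81
    trail = pointer % 190
    offset = 0x40 if trail < 0x3F else 0x41
    return bytes([lead, trail + offset])


def bytes_to_pointer(lead: int, trail: int) -> int:
    """Inverse of pointer_to_bytes (gb18030 decoder arithmetic)."""
    offset = 0x40 if trail < 0x7F else 0x41
    return (lead - 0x81) * 190 + (trail - offset)


def encode_two_byte(code_point: int):
    """Consult the encoder-only table first, then the bidirectional index."""
    if code_point in TOY_ENCODER_ONLY:
        return TOY_ENCODER_ONLY[code_point]
    pointer = CODE_POINT_TO_POINTER.get(code_point)
    return None if pointer is None else pointer_to_bytes(pointer)


def decode_two_byte(data: bytes):
    """The decoder only knows the index, never the encoder-only table."""
    return TOY_INDEX.get(bytes_to_pointer(data[0], data[1]))
```

In this sketch the hypothetical PUA code point 0xE000 encodes to the same bytes as the index entry at pointer 2, but decoding those bytes yields the index code point rather than the PUA one. That one-way behavior is what "encoder only" is meant to convey here.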

GBK is changed as well because Chromium and WebKit already have code in the wild that affects GBK to some degree, and no fallout has been recorded. (That code currently excludes the encoder-only table for GBK only, but including it there makes the most sense compatibility-wise.) Additionally, GBK is already positioned as a rough subset of gb18030 in this standard, with the decoder shared completely.

Tests: encoding/legacy-mb-schinese has some GB18030-2022 coverage already. The aim is to complete that with https://github.com/web-platform-tests/wpt/pull/48239 and https://github.com/web-platform-tests/wpt/pull/48240.

This supersedes #335. This fixes #27 and fixes #312.


- [x] At least two implementers are interested (and none opposed):
   * WebKit
   * Gecko (per chat with Henri)
- [x] [Tests](https://github.com/web-platform-tests/wpt) are written and can be reviewed and commented upon at:
   * See above.
- [ ] [Implementation bugs](https://github.com/whatwg/meta/blob/main/MAINTAINERS.md#handling-pull-requests) are filed:
   * Chromium: …
   * Gecko: …
   * WebKit: …
   * Deno: …
   * Node.js: …
- [x] [MDN issue](https://github.com/whatwg/meta/blob/main/MAINTAINERS.md#handling-pull-requests) is filed: N/A, too niche.
- [x] The top of this comment includes a [clear commit message](https://github.com/whatwg/meta/blob/main/COMMITTING.md) to use. <!-- If you created this PR from a single commit, Github copied its message. Otherwise, you need to add a commit message yourself. -->

(See [WHATWG Working Mode: Changes](https://whatwg.org/working-mode#changes) for more details.)


***
<a href="https://whatpr.org/encoding/336.html" title="Last updated on Sep 18, 2024, 1:22 PM UTC (1d519bf)">Preview</a> | <a href="https://whatpr.org/encoding/336/e20f586...1d519bf.html" title="Last updated on Sep 18, 2024, 1:22 PM UTC (1d519bf)">Diff</a>
You can view, comment on, or merge this pull request online at:

  https://github.com/whatwg/encoding/pull/336

-- Commit Summary --

  * Adopt GB18030-2022

-- File Changes --

    M encoding.bs (71)
    M index-big5.txt (2)
    M index-euc-kr.txt (2)
    M index-gb18030-ranges.txt (2)
    M index-gb18030.txt (40)
    M index-ibm866.txt (2)
    M index-iso-2022-jp-katakana.txt (2)
    M index-iso-8859-10.txt (2)
    M index-iso-8859-13.txt (2)
    M index-iso-8859-14.txt (2)
    M index-iso-8859-15.txt (2)
    M index-iso-8859-16.txt (2)
    M index-iso-8859-2.txt (2)
    M index-iso-8859-3.txt (2)
    M index-iso-8859-4.txt (2)
    M index-iso-8859-5.txt (2)
    M index-iso-8859-6.txt (2)
    M index-iso-8859-7.txt (2)
    M index-iso-8859-8.txt (2)
    M index-jis0208.txt (2)
    M index-jis0212.txt (2)
    M index-koi8-r.txt (2)
    M index-koi8-u.txt (2)
    M index-macintosh.txt (2)
    M index-windows-1250.txt (2)
    M index-windows-1251.txt (2)
    M index-windows-1252.txt (2)
    M index-windows-1253.txt (2)
    M index-windows-1254.txt (2)
    M index-windows-1255.txt (2)
    M index-windows-1256.txt (2)
    M index-windows-1257.txt (2)
    M index-windows-1258.txt (2)
    M index-windows-874.txt (2)
    M index-x-mac-cyrillic.txt (2)
    M indexes.json (2)

-- Patch Links --

https://github.com/whatwg/encoding/pull/336.patch
https://github.com/whatwg/encoding/pull/336.diff


Message ID: <whatwg/encoding/pull/336@github.com>

Received on Wednesday, 18 September 2024 13:22:33 UTC