Re: [whatwg/encoding] If gb18030 is revised, consider aligning the Encoding Standard (#27)

Right. I was merely showing one possible way in which China may change GB 18030 to remove the PUA requirement, by applying the pattern used in the 2005 update. The single mapping change in the 2005 update may have been one-off-ish enough that China figured it would be harmless, but 24 mapping changes may be a bit much to swallow at once.

The history of GB 18030 goes back to GBK, which included significantly more PUA mappings, a little over 100. The ones that could be changed to non-PUA mappings were changed, and only 25 remained in the "required" portion of GB 18030-2000.

The other way to remove the PUA requirement, while keeping the mapping stable, is to first remove the requirement to support the following 24 characters:

0xA6D9 -> U+E78D
0xA6DA -> U+E78E
0xA6DB -> U+E78F
0xA6DC -> U+E790
0xA6DD -> U+E791
0xA6DE -> U+E792
0xA6DF -> U+E793
0xA6EC -> U+E794
0xA6ED -> U+E795
0xA6F3 -> U+E796
0xFE51 -> U+E816
0xFE52 -> U+E817
0xFE53 -> U+E818
0xFE59 -> U+E81E
0xFE61 -> U+E826
0xFE66 -> U+E82B
0xFE67 -> U+E82C
0xFE6C -> U+E831
0xFE6D -> U+E832
0xFE76 -> U+E83B
0xFE7E -> U+E843
0xFE90 -> U+E854
0xFE91 -> U+E855
0xFEA0 -> U+E864
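
To make the scope of this first step concrete, here is a minimal Python sketch that holds the 24 two-byte mappings above in a table, confirms that every target lies in the BMP Private Use Area (U+E000..U+F8FF), and finds the characters in a decoded string that depend on one of these mappings. The names `PUA_REQUIRED`, `is_bmp_pua`, and `pua_dependent_chars` are hypothetical, chosen only for illustration.

```python
# The 24 two-byte GB 18030 codes whose required mappings point into the
# BMP Private Use Area (copied from the list above).
PUA_REQUIRED = {
    0xA6D9: 0xE78D, 0xA6DA: 0xE78E, 0xA6DB: 0xE78F, 0xA6DC: 0xE790,
    0xA6DD: 0xE791, 0xA6DE: 0xE792, 0xA6DF: 0xE793, 0xA6EC: 0xE794,
    0xA6ED: 0xE795, 0xA6F3: 0xE796, 0xFE51: 0xE816, 0xFE52: 0xE817,
    0xFE53: 0xE818, 0xFE59: 0xE81E, 0xFE61: 0xE826, 0xFE66: 0xE82B,
    0xFE67: 0xE82C, 0xFE6C: 0xE831, 0xFE6D: 0xE832, 0xFE76: 0xE83B,
    0xFE7E: 0xE843, 0xFE90: 0xE854, 0xFE91: 0xE855, 0xFEA0: 0xE864,
}

# Reverse lookup: PUA code point -> two-byte GB 18030 code.
PUA_TO_GB = {cp: gb for gb, cp in PUA_REQUIRED.items()}


def is_bmp_pua(cp: int) -> bool:
    """True if cp is in the BMP Private Use Area, U+E000..U+F8FF."""
    return 0xE000 <= cp <= 0xF8FF


def pua_dependent_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, description) for each character in text that is
    the target of one of the 24 PUA mappings listed above."""
    hits = []
    for i, ch in enumerate(text):
        gb = PUA_TO_GB.get(ord(ch))
        if gb is not None:
            hits.append((i, f"U+{ord(ch):04X} -> 0x{gb:04X}"))
    return hits


assert len(PUA_REQUIRED) == 24
assert all(is_bmp_pua(cp) for cp in PUA_REQUIRED.values())
print(pua_dependent_chars("a\ue78db\ue864"))
# [(1, 'U+E78D -> 0xA6D9'), (3, 'U+E864 -> 0xFEA0')]
```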

And second, to require the following 24 characters:

0x82359037 -> U+9FB4
0x82359038 -> U+9FB5
0x82359039 -> U+9FB6
0x82359130 -> U+9FB7
0x82359131 -> U+9FB8
0x82359132 -> U+9FB9
0x82359133 -> U+9FBA
0x82359134 -> U+9FBB
0x84318236 -> U+FE10
0x84318237 -> U+FE11
0x84318238 -> U+FE12
0x84318239 -> U+FE13
0x84318330 -> U+FE14
0x84318331 -> U+FE15
0x84318332 -> U+FE16
0x84318333 -> U+FE17
0x84318334 -> U+FE18
0x84318335 -> U+FE19
0x95329031 -> U+20087
0x95329033 -> U+20089
0x95329730 -> U+200CC
0x9536B937 -> U+215D7
0x9630BA35 -> U+2298F
0x9635B630 -> U+241FE
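
The last six entries fall in GB 18030's four-byte range for supplementary-plane code points, which maps linearly: the four bytes act as digits of a mixed-radix number added to U+10000. (The eighteen BMP entries are not linear; the Encoding Standard resolves those through its "index gb18030 ranges" table, which this sketch does not cover.) A minimal Python sketch of that arithmetic, with a hypothetical function name, reproduces the six mappings above.

```python
def gb18030_four_byte_to_supplementary(seq: bytes) -> int:
    """Decode a GB 18030 four-byte sequence in 0x90308130..0xE3329A35
    to a supplementary-plane code point using the linear mapping."""
    if len(seq) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = seq
    if not (0x90 <= b1 <= 0xE3 and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError(f"not a supplementary four-byte sequence: {seq.hex()}")
    # Digits have radixes 10 (b2), 126 (b3), and 10 (b4), so one step of
    # b1 covers 10 * 126 * 10 = 12600 code points.
    pointer = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    cp = 0x10000 + pointer
    if cp > 0x10FFFF:
        raise ValueError(f"sequence beyond U+10FFFF: {seq.hex()}")
    return cp


# The six supplementary-plane mappings from the list above.
EXAMPLES = {
    "95329031": 0x20087,
    "95329033": 0x20089,
    "95329730": 0x200CC,
    "9536B937": 0x215D7,
    "9630BA35": 0x2298F,
    "9635B630": 0x241FE,
}

for hex_seq, expected in EXAMPLES.items():
    cp = gb18030_four_byte_to_supplementary(bytes.fromhex(hex_seq))
    assert cp == expected
    print(f"0x{hex_seq} -> U+{cp:X}")
```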

My guess is that support for the mappings of the original 24 characters will be changed from "required" to "optional," and that the mappings of the additional 24 characters will be changed from "optional" to "required" for implementations that do not support the original 24. Or something to that effect.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/27#issuecomment-288064485

Received on Tuesday, 21 March 2017 12:33:17 UTC