Re: [whatwg/encoding] Inform readers about the structure of the sparsity of index-euc-kr (#78) from Henri Sivonen on 2017-01-12 (public-webapps-github@w3.org from January 2017)

From: Henri Sivonen <notifications@github.com>
Date: Thu, 12 Jan 2017 04:02:14 -0800
To: whatwg/encoding <encoding@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/encoding/issues/78/272146498@github.com>

> My hypothesis is that the extended areas are even sorted in Unicode order, but I haven't verified that, yet.

They are, both in the additional Hangul in EUC-KR and in the additional Unified Ideographs in gb18030.

Additionally, the original KS X 1001 Hangul block is also in Unicode order.
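
For anyone who wants to re-check the ordering claim, here is a minimal Python sketch. It assumes the index text uses the layout of the spec's index txt files (comment lines starting with `#`, then whitespace-separated pointer and `0x`-prefixed code point). The names `parse_index`, `is_unicode_ordered`, and `SAMPLE` are made up for illustration, and the sample rows are synthetic rather than copied from the real index. The `in_block` predicate that selects which pointers belong to a block is left to the caller, since the block boundaries are not spelled out here.

```python
def parse_index(text):
    """Parse index text into a list of (pointer, code_point) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        pairs.append((int(fields[0]), int(fields[1], 16)))
    return pairs


def is_unicode_ordered(pairs, in_block):
    """Return True if, walking the selected pointers in increasing order,
    the mapped code points are strictly increasing."""
    selected = [cp for ptr, cp in sorted(pairs) if in_block(ptr)]
    return all(a < b for a, b in zip(selected, selected[1:]))


# Synthetic sample (hypothetical rows, not taken from the real index file).
SAMPLE = """\
# comment line
0\t0xAC02\tx
1\t0xAC03\tx
2\t0xAC05\tx
3\t0xAC01\tx
"""

pairs = parse_index(SAMPLE)
print(is_unicode_ordered(pairs, lambda ptr: ptr < 3))  # first three pointers only
print(is_unicode_ordered(pairs, lambda ptr: True))     # all four pointers
```

With the real index file, the text would be read from a local copy and the predicate would be written to match one block at a time.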

As for resolution of this issue, I suggest the following:

* Including the visualizations I created (or something similar) as informative material next to the spec ("next to" in the sense that the txt files are next to the spec but not pasted into the spec itself). (If this is editorially OK as an idea, I can ask Gerv about CC0ing the script I wrote. The fonts I used are all under OFL 1.1.)

* Adding informative notes containing the observations above about the Unicode ordering of the three Hangul blocks and the top and left Unified Ideograph blocks in gb18030.

* Noting that the three Hangul blocks taken together cover all of Hangul in Unicode.

* Noting that the Unified Ideograph blocks in gb18030 taken together cover all of the CJK Unified Ideographs block in Unicode.

* Adding notes about the relationship between Unicode order and the order of Level 2 Hanzi in gb18030 (the block whose top left corner is on row 0x2F column 0x60 in the visualization) and Level 2 Kanji in jis0208 (pointers from 4418 to 7801, inclusive). I haven't really figured out what exactly to say here. Level 2 Hanzi seems to be mostly, but not quite, Unicode-ordered. I haven't had time to examine whether Level 2 Kanji is Unicode-ordered, but according to Lunde it is ordered first by radical and then by stroke count, like the Unified Ideographs in Unicode.
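
The coverage notes suggested above could be checked with something like the following Python sketch. It assumes "all of Hangul" means the precomposed Hangul syllables U+AC00 through U+D7A3 (11172 code points). The helper name `missing_from_range` is hypothetical, and the caller is expected to pass in the code points collected from the relevant index blocks.

```python
HANGUL_FIRST = 0xAC00  # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable


def missing_from_range(code_points, first, last):
    """Return the code points in [first, last] that do not occur in code_points."""
    present = set(code_points)
    return [cp for cp in range(first, last + 1) if cp not in present]


# Toy input: a set with one gap, to show what a failed coverage check reports.
gaps = missing_from_range([0xAC00, 0xAC02], 0xAC00, 0xAC02)
print([hex(cp) for cp in gaps])
```

An empty result for the union of the three Hangul blocks would back the coverage note. The same helper could be pointed at the ideograph blocks in gb18030 with whatever range the note ends up naming.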

--
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/78#issuecomment-272146498

Received on Thursday, 12 January 2017 12:02:46 UTC