Re: [whatwg/encoding] Inform readers about the structure of the sparsity of index-euc-kr (#78) from Henri Sivonen on 2017-01-02 (public-webapps-github@w3.org from January 2017)

From: Henri Sivonen <notifications@github.com>
Date: Mon, 02 Jan 2017 00:46:36 -0800
To: whatwg/encoding <encoding@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/encoding/issues/78/269945615@github.com>

I made some [visualizations of the indices](https://hsivonen.fi/encoding-visualization/).

The [visualization of the EUC-KR index](https://hsivonen.fi/encoding-visualization/euc-kr.html) makes the structure very clear: the old index starts at U+3000 (row 32 decimal, column 96 decimal), and the parts above and to the left of it are later extensions that contain only Hangul not included in the original index. Additionally, the [BMP coverage of EUC-KR](https://hsivonen.fi/encoding-visualization/euc-kr-bmp.html) shows that the old and new parts taken together cover a contiguous range of Hangul in the BMP. (My hypothesis is that the extended areas are even sorted in Unicode order, but I haven't verified that yet.)
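
The geometry above, and the two hypotheses, can be expressed as a minimal Python sketch. It assumes the visualization lays the index out in rows of 190 pointers (which matches the row 32, column 96 coordinates of U+3000) and uses the Encoding Standard's EUC-KR pointer formula, pointer = (lead - 0x81) * 190 + (trail - 0x41). All helper names are hypothetical, the plain-text format accepted by `parse_index` (a pointer, whitespace, a `0x`-prefixed code point, `#` comment lines) is an assumption, and "Hangul" is assumed here to mean the Hangul Syllables block, U+AC00 to U+D7A3.

```python
# Sketch (hypothetical helpers): locate index-euc-kr pointers on a grid that is
# 190 pointers wide and test the two hypotheses from the comment above.

TRAIL_COUNT = 190        # EUC-KR trail bytes 0x41..0xFE
OLD_FIRST_ROW = 32       # lead byte 0xA1 - 0x81
OLD_FIRST_COLUMN = 96    # trail byte 0xA1 - 0x41
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3  # assumed range: Hangul Syllables


def euc_kr_pointer(lead: int, trail: int) -> int:
    """Pointer of a two-byte EUC-KR sequence (Encoding Standard formula)."""
    return (lead - 0x81) * TRAIL_COUNT + (trail - 0x41)


def grid_position(pointer: int) -> tuple[int, int]:
    """(row, column) of a pointer when the index is drawn 190 columns wide."""
    return divmod(pointer, TRAIL_COUNT)


def is_old_index(pointer: int) -> bool:
    """True in the original block (lead and trail both >= 0xA1), else extension."""
    row, column = grid_position(pointer)
    return row >= OLD_FIRST_ROW and column >= OLD_FIRST_COLUMN


def parse_index(text: str) -> dict[int, int]:
    """Parse 'pointer 0xCODEPOINT ...' lines; blank and '#' lines are skipped."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        mapping[int(fields[0])] = int(fields[1], 16)  # base 16 accepts "0x"
    return mapping


def check_hypotheses(index: dict[int, int]) -> tuple[bool, bool]:
    """Return (extension Hangul in Unicode order, all Hangul contiguous)."""
    def is_hangul(cp: int) -> bool:
        return HANGUL_FIRST <= cp <= HANGUL_LAST

    # Extension Hangul code points, listed in pointer order.
    extension = [cp for p, cp in sorted(index.items())
                 if not is_old_index(p) and is_hangul(cp)]
    ordered = all(a < b for a, b in zip(extension, extension[1:]))
    # Union of old and extension Hangul, checked for gaps.
    hangul = sorted({cp for cp in index.values() if is_hangul(cp)})
    contiguous = bool(hangul) and hangul == list(range(hangul[0], hangul[-1] + 1))
    return ordered, contiguous


if __name__ == "__main__":
    # Bytes 0xA1 0xA1 (U+3000) land on row 32, column 96.
    print(grid_position(euc_kr_pointer(0xA1, 0xA1)))  # (32, 96)
```

Feeding the text of the Encoding Standard's index-euc-kr.txt to `parse_index` and the result to `check_hypotheses` would test the still-unverified ordering hypothesis, assuming the file's lines follow the format described above; no result is claimed here.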

FWIW, [gb18030](https://hsivonen.fi/encoding-visualization/gb18030.html) seems to exhibit the same extension pattern, but for Hanzi: the old index starts at U+3000, and the additional (Unicode-ordered?) Hanzi above and to the left of it, taken together with the old ones, form [contiguous coverage of Unicode](https://hsivonen.fi/encoding-visualization/gb18030-bmp.html).
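
The same corner can be located for gb18030 with a short sketch. The pointer formula follows the Encoding Standard's gb18030 decoder for two-byte sequences: trail bytes 0x40..0x7E and 0x80..0xFE are valid and 0x7F is skipped, so each row again holds 190 pointers. The function names are hypothetical, and drawing the index 190 columns wide is assumed to match the visualization.

```python
# Sketch (hypothetical helpers): the same grid geometry for two-byte gb18030.

ROW_WIDTH = 190  # trail bytes 0x40..0x7E and 0x80..0xFE (0x7F is skipped)


def gb18030_pointer(lead: int, trail: int) -> int:
    """Two-byte gb18030 pointer, following the Encoding Standard's decoder."""
    offset = 0x40 if trail < 0x7F else 0x41  # skip over trail byte 0x7F
    return (lead - 0x81) * ROW_WIDTH + (trail - offset)


def grid_position(pointer: int) -> tuple[int, int]:
    """(row, column) of a pointer when the index is drawn 190 columns wide."""
    return divmod(pointer, ROW_WIDTH)


if __name__ == "__main__":
    # GB2312's first cell, 0xA1 0xA1 (U+3000), sits at row 32, column 96,
    # the same corner as in EUC-KR.
    print(grid_position(gb18030_pointer(0xA1, 0xA1)))  # (32, 96)
    # Trail bytes 0x7E and 0x80 map to adjacent columns.
    print(gb18030_pointer(0x81, 0x7E), gb18030_pointer(0x81, 0x80))  # 62 63
```

Because the 0x7F gap is absorbed by the offset, the old GB2312 block starts at exactly the same pointer (6176) as KS X 1001 does in index-euc-kr, which is why both visualizations share the row 32, column 96 corner.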

https://github.com/whatwg/encoding/issues/78#issuecomment-269945615

Received on Monday, 2 January 2017 08:47:11 UTC