[whatwg/encoding] Inform readers about the structure of the sparsity of index-euc-kr (#78)

index-euc-kr is rather sparse. That is, it has a lot of pointer space without corresponding characters.

To avoid making implementors ship bloated software or work out the pattern of the index holes on their own, it would be courteous to add an informative note describing that pattern.

These holes are easier to see in [bestfit949.txt](http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit949.txt).

From visual inspection (I haven't verified this programmatically): when the lead byte is 0xC7 or greater, only trail bytes 0xA1...0xFE (inclusive) map to characters, so only 94 of the 190 slots in each stride are used. When the lead byte is less than 0xC7, trail bytes 0x5B...0x60 (inclusive) and 0x7B...0x80 (inclusive) are not used, so at most 178 of the 190 slots in each stride are used.
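To make the described pattern concrete, here is a minimal Python sketch. It assumes the pointer formula from the EUC-KR decoder in the Encoding Standard, `(lead - 0x81) * 190 + (trail - 0x41)`. The function names are hypothetical, and the hole ranges encode only the visual observation above, not a verified property of the index.

```python
STRIDE = 190  # trail bytes 0x41..0xFE inclusive


def euc_kr_pointer(lead, trail):
    """Return the index pointer for a lead/trail pair, or None if out of range."""
    if not (0x81 <= lead <= 0xFE and 0x41 <= trail <= 0xFE):
        return None
    return (lead - 0x81) * STRIDE + (trail - 0x41)


def in_described_hole(lead, trail):
    """True if the pair falls in a hole per the (unverified) visual pattern."""
    if lead >= 0xC7:
        # Only trail bytes 0xA1..0xFE appear to be used.
        return not (0xA1 <= trail <= 0xFE)
    # Below 0xC7, two six-byte gaps appear to be unused.
    return 0x5B <= trail <= 0x60 or 0x7B <= trail <= 0x80


def max_used_slots(lead):
    """Upper bound on used slots in one stride under the described pattern."""
    return sum(
        1 for trail in range(0x41, 0xFF) if not in_described_hole(lead, trail)
    )
```

Under these assumptions, `max_used_slots(0xC7)` gives 94 and `max_used_slots(0xB0)` gives 178, matching the counts stated above.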

Some strides (leads in the range 0xA2...0xAF) have even fewer than 178 slots used, but I haven't yet worked out if there's a pattern to those.
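One way to check this programmatically is to count the used slots per lead byte. This is a sketch only: it assumes the caller has already extracted the set of populated pointers from the index (for example from index-euc-kr.txt), and the function names are hypothetical.

```python
from collections import Counter

STRIDE = 190  # slots per lead byte


def used_slots_per_lead(pointers):
    """Count populated pointers per lead byte."""
    counts = Counter()
    for pointer in pointers:
        lead = pointer // STRIDE + 0x81
        counts[lead] += 1
    return counts


def leads_below(counts, threshold=178, leads=range(0x81, 0xC7)):
    """Return {lead: count} for leads with fewer used slots than threshold."""
    return {
        lead: counts.get(lead, 0)
        for lead in leads
        if counts.get(lead, 0) < threshold
    }
```

Running `leads_below(used_slots_per_lead(pointers))` on the real index would list the strides with fewer than 178 used slots, which could then be inspected for a pattern.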


Received on Sunday, 30 October 2016 13:17:03 UTC