[whatwg/encoding] index-jis0208.txt should be JIS X 0208 and add another index file (#47)

The CCS (coded character set) of index-jis0208.txt is that of "CP932" (also known as "Windows-31J"), not JIS X 0208.
There are two differences:

* CP932 has more characters than JIS X 0208.
* Every character in JIS X 0208 has a name, which determines its mapping to UCS.  Some of those mappings differ from the ones in index-jis0208.txt.

An important role of JIS X 0208 is to restrict the character set.  For example, some fonts in Japan contain only the characters in JIS X 0208, not all of CP932.  So we need a strict character set to implement a converter such as a Shift_JIS encoder, which means we need another index that differs from index-jis0208.txt.

Another use case: sometimes I want to convert Shift_JIS text into an EPUB file, which is a typical use of a Shift_JIS decoder.  When I do, I want EM DASH in Shift_JIS (0x815C, 1-1-29) to become EM DASH in Unicode (U+2014).  That is what JIS X 0208 (and JIS X 0213) specify, but not what CP932 does: the table in index-jis0208.txt maps EM DASH to HORIZONTAL BAR (U+2015).  That is not what I want.

So my suggestions are:
* index-jis0208.txt should be renamed to something like index-cp932 or index-windows31j, and
* another index that matches JIS X 0208 should be added.

I have a machine-readable mapping table, "JIS X 0213:2004 8-bit code vs Unicode mapping table":
http://x0213.org/codetable/jisx0213-2004-8bit-std.txt
But it covers JIS X 0213, not JIS X 0208.
I found issue #31, and I also have no real use case for SHIFT_JISX0213 or Shift_JIS-2004 (when I want to use characters in JIS X 0213, I always use UTF-8).  I need only JIS X 0208, which is a subset of jisx0213-2004-8bit-std.txt.
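As a rough sketch of how that subset might be extracted in Python, under assumptions about the file format that I have not checked against the actual file: each entry line is assumed to look like `0x815C<TAB>U+2014<TAB># EM DASH`, and characters added by JIS X 0213 are assumed to carry a `[2000]` or `[2004]` marker in the trailing comment. The function name `jisx0208_subset` is made up for illustration.

```python
import re

# Assumed entry shape: four-hex-digit Shift_JIS code, then a single U+XXXX.
# Single-byte entries and multi-code-point sequences (U+XXXX+YYYY) do not match.
_ENTRY = re.compile(
    r"^0x([0-9A-Fa-f]{4})\s+U\+([0-9A-Fa-f]{4,6})(?![0-9A-Fa-f+])(.*)$"
)
# Assumed marker for characters that JIS X 0213 added on top of JIS X 0208.
_ADDED = re.compile(r"\[(?:2000|2004)\]")


def jisx0208_subset(lines):
    """Return {shift_jis_code: code_point} for entries without an addition marker."""
    table = {}
    for line in lines:
        match = _ENTRY.match(line.strip())
        if match is None:
            continue  # comment, blank, single-byte, or unmapped line
        code, ucs, rest = match.groups()
        if _ADDED.search(rest):
            continue  # assumed to be a JIS X 0213 addition
        table[int(code, 16)] = int(ucs, 16)
    return table


# Possible usage, assuming the table has been downloaded locally:
# with open("jisx0213-2004-8bit-std.txt", encoding="utf-8") as f:
#     table = jisx0208_subset(f)
```

If the marker assumption turns out to be wrong, the filter would have to be replaced by an explicit list of JIS X 0208 rows and cells; the parsing part would stay the same.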


---
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/47

Received on Monday, 2 May 2016 16:15:53 UTC