- From: Masayoshi Takahashi <notifications@github.com>
- Date: Mon, 02 May 2016 09:15:25 -0700
- To: whatwg/encoding <encoding@noreply.github.com>
- Cc:
- Message-ID: <whatwg/encoding/issues/47@github.com>
The CCS (coded character set) of index-jis0208.txt is the CCS of "CP932" (also called "Windows-31J"). It is not JIS X 0208. There are two differences:

* CP932 has more characters than JIS X 0208.
* Every character in JIS X 0208 has a name, which determines its mapping to UCS. Some of those mappings differ from index-jis0208.txt.

An important role of JIS X 0208 is to restrict the character set. For example, some fonts in Japan contain only the characters in JIS X 0208, not the additional ones in CP932. So we need a strict character set to implement converters such as a Shift_JIS encoder, and that requires another index that differs from index-jis0208.txt.

Another use case: sometimes I want to convert Shift_JIS text into an EPUB file, which is a typical use of a Shift_JIS decoder. When I do, I want to convert EM DASH in Shift_JIS (0x815C, 1-1-29) to EM DASH in Unicode (U+2014). That works with JIS X 0208 (and JIS X 0213) but not with CP932: index-jis0208.txt maps EM DASH to HORIZONTAL BAR (U+2015), which is not what I want.

So my suggestions are:

* index-jis0208.txt should be renamed, for example to index-cp932 or index-windows31j, and
* another index that matches JIS X 0208 should be added.

I have a machine-readable mapping table, "JIS X 0213:2004 8-bit code vs Unicode mapping table" (http://x0213.org/codetable/jisx0213-2004-8bit-std.txt), but it covers JIS X 0213, not JIS X 0208. I found issue #31, and I also have no real use case for SHIFT_JISX0213 or Shift_JIS-2004 (when I want to use characters in JIS X 0213, I always use UTF-8). I need only JIS X 0208, which is a subset of jisx0213-2004-8bit-std.txt.
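
To make the EM DASH case concrete, here is a minimal Python sketch. The pointer arithmetic follows the Shift_JIS decoder in the Encoding Standard. Everything else is an assumption for illustration only: the function names are hypothetical, and the two one-entry tables are excerpts holding just the single mapping discussed above, not real index files.

```python
def sjis_pointer(lead, trail):
    """Compute the index pointer for a two-byte Shift_JIS sequence."""
    if not (0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xFC):
        raise ValueError("not a Shift_JIS lead byte")
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC):
        raise ValueError("not a Shift_JIS trail byte")
    lead_offset = 0x81 if lead < 0xA0 else 0xC1
    trail_offset = 0x40 if trail < 0x7F else 0x41
    return (lead - lead_offset) * 188 + trail - trail_offset


def pointer_to_kuten(pointer):
    """Convert a pointer inside the 94x94 grid to (row, cell), both 1-based."""
    if not 0 <= pointer < 94 * 94:
        raise ValueError("pointer is outside the 94x94 JIS X 0208 grid")
    row, cell = divmod(pointer, 94)
    return row + 1, cell + 1


# Hypothetical one-entry excerpts: only the mapping discussed in this issue.
INDEX_JIS0208_EXCERPT = {28: 0x2015}  # current index: HORIZONTAL BAR
JIS_X_0208_EXCERPT = {28: 0x2014}     # proposed strict index: EM DASH


def decode_pair(lead, trail, index):
    """Decode one two-byte sequence with the given pointer-to-code-point table."""
    return chr(index[sjis_pointer(lead, trail)])


if __name__ == "__main__":
    pointer = sjis_pointer(0x81, 0x5C)
    print(pointer, pointer_to_kuten(pointer))
    print(hex(ord(decode_pair(0x81, 0x5C, INDEX_JIS0208_EXCERPT))))
    print(hex(ord(decode_pair(0x81, 0x5C, JIS_X_0208_EXCERPT))))
```

The same bytes give the same pointer in both cases. Only the table behind the pointer changes the result, which is why a second index would be enough and the decoder algorithm itself would not need to change.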
Received on Monday, 2 May 2016 16:15:53 UTC