Re: [whatwg/encoding] index-jis0208.txt should be JIS X 0208 and add another index file (#47)

I vote for splitting the table index-jis0208.txt into two parts, one for the indices < 8836 (the actual JIS X 0208 matrix) and one for the indices >= 8836 (the CP932 additions by Microsoft). Reasons:
- JIS X 0208 is a coded character set (CCS) based on rows and columns, with 94 rows and 94 columns, i.e. 94 × 94 = 8836 cells (pointers 0 to 8835).
- The description in section 6 "This is the JIS X 0208 standard including formerly proprietary extensions from IBM and NEC." is inaccurate.
- The part with indices >= 8836 is only meant to be used in the Shift_JIS and (possibly) ISO-2022-JP conversions, not in the EUC-JP conversion.
- In fact, it causes a bug in the EUC-JP encoder: when the input code point is, e.g., U+2170, the EUC-JP encoder sets lead = 275 and trail = 161 (i.e. pointer 10716), and thus attempts to return a byte with a value > 255! Some implementations simply reduce modulo 256 and return the byte sequence 0x13 0xA1 (275 mod 256 = 19 = 0x13).
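A minimal Python sketch of the points above, assuming the tab-separated "pointer, code point, comment" line layout of the WHATWG index files. The names `split_index`, `euc_jp_bytes` and `euc_jp_bytes_checked` are hypothetical, not part of the spec or any library. `euc_jp_bytes` follows the EUC-JP encoder's lead/trail arithmetic as I read it (lead = floor(pointer / 94) + 0xA1, trail = pointer % 94 + 0xA1), which only yields bytes in 0xA1..0xFE for pointers below 8836.

```python
# Size of the JIS X 0208 row/column matrix: 94 rows x 94 columns.
JIS0208_CELLS = 94 * 94  # 8836


def split_index(lines):
    """Split index lines into (pointers < 8836, pointers >= 8836).

    Assumes WHATWG index layout: "pointer<TAB>0xCODEPOINT<TAB>comment",
    with "#" lines as comments (hypothetical helper).
    """
    matrix, extensions = {}, {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pointer_str, code_point_str = line.split("\t")[:2]
        pointer = int(pointer_str)
        code_point = int(code_point_str, 16)  # accepts the "0x" prefix
        target = matrix if pointer < JIS0208_CELLS else extensions
        target[pointer] = code_point
    return matrix, extensions


def euc_jp_bytes(pointer):
    """Unchecked lead/trail computation; lead exceeds 0xFE for pointer >= 8836."""
    lead = pointer // 94 + 0xA1
    trail = pointer % 94 + 0xA1
    return lead, trail


def euc_jp_bytes_checked(pointer):
    """Same computation, but reject pointers outside the 94x94 matrix."""
    if not 0 <= pointer < JIS0208_CELLS:
        raise ValueError(f"pointer {pointer} is outside JIS X 0208")
    return bytes(euc_jp_bytes(pointer))


if __name__ == "__main__":
    lead, trail = euc_jp_bytes(10716)
    print(lead, trail)                  # 275 161
    print(bytes([lead % 256, trail]))   # b'\x13\xa1' (the mod-256 behaviour)
    print(euc_jp_bytes_checked(0))      # b'\xa1\xa1'
```

With the split, an EUC-JP encoder would only ever look up pointers in the first table, so the out-of-range case cannot arise; the explicit range check is a belt-and-braces alternative if the combined table is kept.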

-- 
View it on GitHub:
https://github.com/whatwg/encoding/issues/47#issuecomment-250996367

Received on Sunday, 2 October 2016 21:12:14 UTC