- From: Bruno Haible <notifications@github.com>
- Date: Sun, 02 Oct 2016 14:11:46 -0700
- To: whatwg/encoding <encoding@noreply.github.com>
Received on Sunday, 2 October 2016 21:12:14 UTC
I vote for splitting the table index-jis0208.txt into two parts: one for the indices < 8836 (the actual JIS X 0208 matrix, 94 × 94 = 8836 cells) and one for the indices >= 8836 (the CP932 additions by Microsoft). Reasons:

- JIS X 0208 is a coded character set (CCS) based on rows and columns, with 94 rows and 94 columns.
- The description in section 6, "This is the JIS X 0208 standard including formerly proprietary extensions from IBM and NEC.", is inaccurate.
- The part with indices >= 8836 is only meant to be used in the Shift_JIS and (possibly) ISO-2022-JP conversions, not in the EUC-JP conversion.
- In fact, it causes a bug in the EUC-JP encoder: when the input code point is e.g. 0x2170, the EUC-JP encoder will set 'lead = 275' and 'trail = 161', and thus attempt to return a byte with a value > 255! Some implementations will just operate mod 256 and return the byte sequence 0x13 0xA1.

-- 
View it on GitHub: https://github.com/whatwg/encoding/issues/47#issuecomment-250996367
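To make the overflow concrete, here is a minimal sketch of the EUC-JP lead/trail arithmetic. It assumes the encoder computes lead = floor(pointer / 94) + 0xA1 and trail = pointer mod 94 + 0xA1, which reproduces the 275/161 figures above for pointer 10716 (= 114 × 94). The function names and the range-checked variant are hypothetical illustrations of the proposed split, not the spec's text.

```python
from typing import Optional, Tuple

# 94 rows x 94 columns: pointers below this lie inside the JIS X 0208 matrix.
JIS0208_SIZE = 94 * 94  # 8836


def euc_jp_bytes_unguarded(pointer: int) -> Tuple[int, int]:
    """Assumed EUC-JP encoder arithmetic with no range check on the pointer."""
    lead = pointer // 94 + 0xA1
    trail = pointer % 94 + 0xA1
    return lead, trail


def euc_jp_bytes_guarded(pointer: int) -> Optional[bytes]:
    """Hypothetical fix: only pointers inside the 94x94 matrix are encodable.

    Returns None (an encoder error) for pointers >= 8836, i.e. the CP932
    additions that belong only to the Shift_JIS side of the table.
    """
    if not 0 <= pointer < JIS0208_SIZE:
        return None
    lead, trail = euc_jp_bytes_unguarded(pointer)
    return bytes([lead, trail])
```

For pointer 10716 the unguarded version yields (275, 161); passing that to `bytes()` raises `ValueError`, while reducing mod 256 gives 0x13 0xA1 as described above. The largest in-matrix pointer, 8835, still encodes to 0xFE 0xFE, so the guard loses nothing that EUC-JP can represent.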