- From: <bugzilla@jessica.w3.org>
- Date: Wed, 18 Mar 2015 21:13:31 +0000
- To: www-international@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=28141

--- Comment #4 from Jungshik Shin <jshin@chromium.org> ---
ICU treats an 'illegal' byte sequence differently from a byte sequence that is 'unassigned' (well-formed, but not mapped to any Unicode character).

For instance, in EUC-KR (windows-949), <FE A1> is a valid byte sequence but is not assigned any character, so the sequence as a whole is turned into a single U+FFFD. Without tightening the valid trail byte range for EUC-KR [1], <FE 41> is also a valid byte sequence and is converted to U+FFFD (exactly the same treatment as <FE A1>). On the other hand, <FE 22> has an illegal trail byte (0x22 is outside the trail byte range for EUC-KR/windows-949), so it is turned into <U+FFFD, U+0022>: only the lead byte is replaced, and 0x22 is decoded on its own.

The same is true of Shift_JIS. Because 0x9F lies within the valid trail byte range [80-FC], <EB 9F> is turned into a single U+FFFD (there is no character mapped at this position) instead of U+FFFD being emitted for 0xEB and 0x9F being pushed back onto the stream.

[1] Blink is just tightening up the valid trail byte range so that 0x41 will no longer be a valid trail byte if the lead byte is 0xC8 or higher.
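A minimal Python sketch of the distinction described in the comment, assuming a simplified double-byte decoder: a legal lead/trail pair that is missing from the mapping table becomes a single U+FFFD, while an illegal trail byte yields U+FFFD for the lead byte alone and the trail byte is reprocessed. The function decode_dbcs, the byte-range predicates, and the one-entry mapping tables are hypothetical illustrations, not ICU's API or its conversion data. The "tightened" EUC-KR predicate is one assumed reading of footnote [1] (treating every ASCII-range trail byte as invalid for lead bytes 0xC8 and above).

```python
REPLACEMENT = "\ufffd"


def decode_dbcs(data: bytes, is_lead, is_trail, table) -> str:
    """Toy double-byte decoder distinguishing 'illegal' from 'unassigned'.

    is_lead(b) and is_trail(lead, t) are hypothetical range predicates;
    table maps (lead, trail) pairs to characters. A legal pair that is
    absent from the table counts as unassigned.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))  # ASCII passes through unchanged
            i += 1
        elif is_lead(b) and i + 1 < len(data) and is_trail(b, data[i + 1]):
            # Legal two-byte sequence: consume both bytes and emit either the
            # mapped character or ONE U+FFFD for the unassigned pair.
            out.append(table.get((b, data[i + 1]), REPLACEMENT))
            i += 2
        else:
            # Illegal or missing trail byte (or a stray non-lead byte):
            # replace just this byte and reprocess the next one on its own.
            out.append(REPLACEMENT)
            i += 1
    return "".join(out)


# Simplified windows-949 ranges (assumption): lead 81-FE,
# trail 41-5A, 61-7A, 81-FE.
def euc_kr_lead(b):
    return 0x81 <= b <= 0xFE


def euc_kr_trail(lead, t):
    return 0x41 <= t <= 0x5A or 0x61 <= t <= 0x7A or 0x81 <= t <= 0xFE


def euc_kr_trail_tightened(lead, t):
    # Hypothetical tightening: no ASCII-range trail bytes for lead >= 0xC8.
    if lead >= 0xC8 and t < 0x81:
        return False
    return euc_kr_trail(lead, t)


EUC_KR_TABLE = {(0xB0, 0xA1): "\uac00"}  # one sample entry: U+AC00


# Simplified Shift_JIS ranges (assumption): lead 81-9F, E0-FC;
# trail 40-7E, 80-FC.
def sjis_lead(b):
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def sjis_trail(lead, t):
    return 0x40 <= t <= 0x7E or 0x80 <= t <= 0xFC


SJIS_TABLE = {(0x82, 0xA0): "\u3042"}  # one sample entry: U+3042


if __name__ == "__main__":
    # Unassigned pairs collapse into a single U+FFFD.
    assert decode_dbcs(b"\xfe\xa1", euc_kr_lead, euc_kr_trail, EUC_KR_TABLE) == "\ufffd"
    assert decode_dbcs(b"\xfe\x41", euc_kr_lead, euc_kr_trail, EUC_KR_TABLE) == "\ufffd"
    # An illegal trail byte is reprocessed: <U+FFFD, U+0022>.
    assert decode_dbcs(b"\xfe\x22", euc_kr_lead, euc_kr_trail, EUC_KR_TABLE) == "\ufffd\""
    # With the tightened range, 0x41 after 0xFE is illegal, so 'A' survives.
    assert decode_dbcs(b"\xfe\x41", euc_kr_lead, euc_kr_trail_tightened, EUC_KR_TABLE) == "\ufffdA"
    # Shift_JIS: 0x9F is a legal trail byte, so <EB 9F> is one U+FFFD.
    assert decode_dbcs(b"\xeb\x9f", sjis_lead, sjis_trail, SJIS_TABLE) == "\ufffd"
```

The key design choice is where the cursor advances: an unassigned pair advances by two bytes, while an illegal pair advances by one, which is what lets an ASCII trail byte such as 0x22 (or, after tightening, 0x41) come through as its own character.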
Received on Wednesday, 18 March 2015 21:13:32 UTC