- From: <bugzilla@jessica.w3.org>
- Date: Tue, 20 Jan 2015 18:54:26 +0000
- To: www-international@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=27868

Bug ID: 27868
Summary: EUC-KR and decoding-only mapping
Product: WHATWG
Version: unspecified
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P2
Component: Encoding
Assignee: annevk@annevk.nl
Reporter: jshin@chromium.org
QA Contact: sideshowbarker+encodingspec@gmail.com
CC: mike@w3.org, www-international@w3.org

When I compared the mapping of EUC-KR in the encoding spec with ICU's Windows-949 [1] (which was obtained by scraping *one of Windows' converters*), I found the following differences:

1. ICU's Windows-949 mapping has 395 'decoding only' (from Unicode to windows-949) entries for characters such as the cent and pound currency signs (U+00A2, U+00A3), regular Latin/Greek/Cyrillic letters, Hangul conjoining jamo (U+11xx), half-width Hangul jamo (U+FFxx), enclosed CJK characters (e.g. U+32xx), etc.

2. ICU's Windows-949 has 190 additional round-trip mapping entries. Most of them (188) are for the two user-defined blocks in KS X 1001 (in EUC-KR, "C9 [A1-FE]" and "FE [A1-FE]"), which are mapped to PUA code points (U+E000 - U+E0BB). The remaining two are U+0080 and U+F8F7, mapped to 0x80 and 0xFF.

I don't think that we want to support the two user-defined blocks in KS X 1001. I'm not sure about U+0080 and U+F8F7. However, I believe that quite a few (NOT all) of the 'decoding only' entries should be supported.

[1] https://code.google.com/p/chromium/codesearch#chromium/src/third_party/icu/source/data/mappings/windows-949-2000.ucm&q=windows-949-2000.ucm&sq=package:chromium&type=cs
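
For anyone who wants to reproduce the counts above, here is a rough Python sketch of one way to tally entries in an ICU .ucm file by precision flag. It is not the method used for the report. It assumes the usual .ucm line shape `<UXXXX> \xHH... |N`, where |0 marks a round-trip entry and |1 a fallback used only when converting from Unicode to bytes (this seems to be what the report calls 'decoding only', but that reading is an assumption). The names parse_ucm, count_by_flag, user_defined_rows and SAMPLE are hypothetical, and the SAMPLE lines are illustrative rather than copied from windows-949-2000.ucm. The last helper only checks the arithmetic: two rows of 94 trail bytes give 188 entries, the same size as U+E000 - U+E0BB.

```python
import re
from collections import Counter

# Assumed shape of an ICU .ucm mapping line: <UXXXX> \xHH[\xHH...] |flag
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)"
)

def parse_ucm(lines):
    """Yield (code_point, byte_sequence, flag) for each mapping line."""
    for line in lines:
        m = UCM_LINE.match(line.strip())
        if m is None:
            continue  # skip comments, headers and blank lines
        code_point = int(m.group(1), 16)
        hex_pairs = re.findall(r"\\x([0-9A-Fa-f]{2})", m.group(2))
        raw = bytes(int(h, 16) for h in hex_pairs)
        yield code_point, raw, int(m.group(3))

def count_by_flag(lines):
    """Count mapping entries per precision flag (0 = round-trip, 1 = fallback)."""
    return Counter(flag for _, _, flag in parse_ucm(lines))

def user_defined_rows():
    """Byte pairs in the two KS X 1001 user-defined rows, C9 and FE."""
    return [bytes([lead, trail])
            for lead in (0xC9, 0xFE)
            for trail in range(0xA1, 0xFF)]  # trail bytes A1..FE inclusive

# Illustrative lines only (assumption), not copied from the real file.
SAMPLE = [
    "# comment lines are ignored",
    "<U0080> \\x80 |0",
    "<UF8F7> \\xFF |0",
    "<UE000> \\xC9\\xA1 |0",
    "<U00A2> \\xA1\\xCB |1",
]

if __name__ == "__main__":
    for flag, n in sorted(count_by_flag(SAMPLE).items()):
        print(f"flag |{flag}: {n} entries")
    print("user-defined byte pairs:", len(user_defined_rows()))
    print("PUA range size:", 0xE0BB - 0xE000 + 1)
```

To run it against the real file, pass an open file object for windows-949-2000.ucm to count_by_flag in place of SAMPLE. Comparing the result with the encoding spec's index would need a second parser, which is left out here.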
Received on Tuesday, 20 January 2015 18:54:29 UTC