[Bug 25396] New: Incorrect mapping in index18030.txt

From: <bugzilla@jessica.w3.org>
Date: Sun, 20 Apr 2014 04:56:43 +0000
To: www-international@w3.org
Message-ID: <bug-25396-4285@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25396

            Bug ID: 25396
           Summary: Incorrect mapping in index18030.txt
           Product: WHATWG
           Version: unspecified
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Encoding
          Assignee: annevk@annevk.nl
          Reporter: ashtuchkin@gmail.com
        QA Contact: sideshowbarker+encodingspec@gmail.com
                CC: mike@w3.org, www-international@w3.org

Input sequence A3 A0 in GB18030 is decoded as U+E5E5 by iconv and ICU. For example:

> printf "\xA3\xA0" | iconv -f gb18030 -t utf-16le | hexdump
0000000 e5 e5

ICU table:
http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/gb-18030-2000.xml

Using the algorithm given in http://encoding.spec.whatwg.org/#gb18030-encoder, 
A3 A0 results in pointer 6555, which is mapped to U+3000 IDEOGRAPHIC SPACE in
index18030.txt.
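
The pointer arithmetic above can be checked with a short sketch. It assumes
the two-byte pointer formula from the Encoding Standard's gb18030 handling,
(lead - 0x81) * 190 + (byte - offset), where offset is 0x40 for trail bytes
below 0x7F and 0x41 otherwise. The function name is hypothetical and the
sketch does not consult index18030.txt itself; it only reproduces the pointer.

```python
def gb18030_two_byte_pointer(lead: int, trail: int) -> int:
    """Compute the index pointer for a two-byte GB18030 sequence.

    Assumes the formula from the Encoding Standard:
    pointer = (lead - 0x81) * 190 + (trail - offset).
    """
    if not 0x81 <= lead <= 0xFE:
        raise ValueError(f"lead byte out of range: {lead:#04x}")
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFE):
        raise ValueError(f"trail byte out of range: {trail:#04x}")
    # 0x7F is skipped in the trail range, so bytes above it shift by one.
    offset = 0x40 if trail < 0x7F else 0x41
    return (lead - 0x81) * 190 + (trail - offset)


if __name__ == "__main__":
    # The sequence from this report: A3 A0.
    print(gb18030_two_byte_pointer(0xA3, 0xA0))  # prints 6555
```

For A3 A0 this gives 34 * 190 + 95 = 6555, matching the pointer quoted above.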

I believe this mapping is incorrect and should be replaced with U+E5E5.

Received on Sunday, 20 April 2014 04:56:44 UTC