I18N-ISSUE-449 (BUG27878): handling of U+5341(and potentially other dupe points) is incompatible with Firefox, Chrome and IE 11 [encoding]

From: Internationalization Working Group Issue Tracker <sysbot+tracker@w3.org>
Date: Mon, 30 Mar 2015 14:24:56 +0000
To: public-i18n-core@w3.org
Message-Id: <E1Ycaca-00094y-Uz@maia.w3.org>

http://www.w3.org/International/track/issues/449

Raised by: Richard Ishida
On product: encoding

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27878

This issue tracks the bug listed above and was created as part of the WG CR process.

---

Reporter: jshin@chromium.org

Spun off from bug 16389 

The duplicate entries in index-*.txt are listed at
http://lists.w3.org/Archives/Public/www-archive/2012Apr/0062.html


https://encoding.spec.whatwg.org/#index-pointer has the following:



The index pointer for code point in index is the first pointer corresponding to
code point in index, or null if code point is not in index.
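
As an illustration of the definition quoted above, here is a minimal Python sketch. It assumes the index is held as a list of (pointer, code point) pairs in ascending pointer order; the function name index_pointer and the two-entry sample (the U+5341 entries from index-big5.txt cited in this report) are illustrative and not part of the spec.

```python
def index_pointer(code_point, index):
    """Return the first pointer that maps to code_point, or None if absent.

    `index` is assumed to be a list of (pointer, code_point) pairs sorted
    by pointer, so the first match is the lowest pointer.
    """
    for pointer, cp in index:
        if cp == code_point:
            return pointer
    return None


# Two-entry excerpt of index-big5.txt: both pointers map to U+5341.
sample_index = [(5287, 0x5341), (5512, 0x5341)]

print(index_pointer(0x5341, sample_index))  # 5287 (first match wins)
print(index_pointer(0x4E00, sample_index))  # None (not in this excerpt)
```

With a first-match rule, the second entry (5512) can never be selected when encoding; it is reachable only when decoding.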

And, the big5 encoder has the following steps:

3. Let pointer be the index pointer for code point in index big5.

4. If pointer is null, return error with code point.

....




Using the first pointer for round-tripping, and treating any later pointers as
decode-only (toUnicode), seems to lead to at least one discrepancy with
Firefox 35, Chrome and IE 11 in Big5.

index-big5.txt has two entries for U+5341 as shown below: 

  5287   0x5341  十 (<CJK Ideograph>)
  5512   0x5341  十 (<CJK Ideograph>)

5287 corresponds to {0xA2 0xCC} and 5512 is {0xA4 0x51}. 
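
The two byte pairs can be checked with the pointer arithmetic used by the big5 encoder in the Encoding Standard: the lead byte is pointer / 157 + 0x81, and the trail byte is pointer % 157 plus an offset of 0x40 when that remainder is below 0x3F, otherwise 0x62. The following is a minimal sketch of just that arithmetic, not a full encoder; the helper name big5_pointer_to_bytes is hypothetical.

```python
def big5_pointer_to_bytes(pointer):
    """Convert a Big5 index pointer to its two-byte sequence."""
    lead = pointer // 157 + 0x81             # lead byte
    trail = pointer % 157                    # position within the lead row
    offset = 0x40 if trail < 0x3F else 0x62  # skips the 0x7F..0xA0 gap
    return bytes([lead, trail + offset])


for pointer in (5287, 5512):
    encoded = big5_pointer_to_bytes(pointer)
    print(pointer, " ".join(f"0x{b:02X}" for b in encoded))
# 5287 0xA2 0xCC
# 5512 0xA4 0x51
```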

All three browsers above encode U+5341 as {0xA4 0x51} in Big5, not as
{0xA2 0xCC}, which is what the first pointer (5287) yields.
Received on Monday, 30 March 2015 14:25:02 UTC
