[whatwg] Encoding: big5 and big5-hkscs

On Fri, 06 Apr 2012 15:42:26 +0200, Philip Jägenstedt <philipj at opera.com>  
wrote:

> These are the ranges that need more investigation.

Sorry for the monologue, but investigate I did. These are the interesting  
ones:

C6CF =>
opera-hk: U+FFFD
firefox: U+5EF4
chrome: U+F6DF
firefox-hk: U+5EF4
opera: U+2F35
chrome-hk: U+FFFD
internetexplorer: U+F6DF

C6D3 =>
opera-hk: U+FFFD
firefox: U+65E0
chrome: U+F6E3
firefox-hk: U+65E0
opera: U+2F46
chrome-hk: U+FFFD
internetexplorer: U+F6E3

C6D5 =>
opera-hk: U+FFFD
firefox: U+7676
chrome: U+F6E5
firefox-hk: U+7676
opera: U+2F68
chrome-hk: U+FFFD
internetexplorer: U+F6E5

C6D7 =>
opera-hk: U+FFFD
firefox: U+96B6
chrome: U+F6E7
firefox-hk: U+96B6
opera: U+2FAA
chrome-hk: U+FFFD
internetexplorer: U+F6E7

C6DE =>
opera-hk: U+FFFD
firefox: U+3003
chrome: U+F6EE
firefox-hk: U+3003
opera: U+3003
chrome-hk: U+FFFD
internetexplorer: U+F6EE

C6DF =>
opera-hk: U+FFFD
firefox: U+4EDD
chrome: U+F6EF
firefox-hk: U+4EDD
opera: U+4EDD
chrome-hk: U+FFFD
internetexplorer: U+F6EF

In the first 4, Opera uses a compatibility code point (a Kangxi radical,  
U+2Fxx) instead of the canonical one. The final 2 are PUA (Chrome, IE) vs.  
a proper code point (Firefox, Opera); at least they render the same on my  
computer. In all 6 cases, firefox and firefox-hk are correct.
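
For anyone who wants to check the compatibility relationship themselves, here is a rough Python sketch. It only uses the standard unicodedata module; the helper names fold_compat and is_pua are mine, made up for illustration. It assumes that NFKC folding is a fair stand-in for "compatibility vs canonical" here.

```python
import unicodedata

# Code points copied from the listing above.
OPERA = [0x2F35, 0x2F46, 0x2F68, 0x2FAA]      # Opera, C6CF/C6D3/C6D5/C6D7
FIREFOX = [0x5EF4, 0x65E0, 0x7676, 0x96B6]    # Firefox, same byte pairs
CHROME_IE = [0xF6DF, 0xF6E3, 0xF6E5, 0xF6E7, 0xF6EE, 0xF6EF]


def fold_compat(cp):
    """Return the code point that NFKC normalization folds cp to.

    Hypothetical helper; assumes the result is a single code point,
    which holds for the Kangxi radicals used here.
    """
    return ord(unicodedata.normalize("NFKC", chr(cp)))


def is_pua(cp):
    """True if cp is in a Private Use Area (general category Co)."""
    return unicodedata.category(chr(cp)) == "Co"


if __name__ == "__main__":
    for o, f in zip(OPERA, FIREFOX):
        print("U+%04X folds to U+%04X, firefox has U+%04X"
              % (o, fold_compat(o), f))
    print("all Chrome/IE values PUA:", all(is_pua(cp) for cp in CHROME_IE))
```

If the folding behaves as I expect, each of Opera's four values lands on the value Firefox uses, and all six Chrome/IE values are reported as private use.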

I manually added the above 6 mappings and the 4 multi-code point mappings  
from HKSCS-2008 to <https://gitorious.org/whatwg/big5>.

There are 29 mappings to U+003F (?) in IE that no other browser has. The  
remaining mappings are to PUA or U+FFFD in all browsers, which appears to  
simply be an artifact of the way the mapping is done internally. Mapping  
these to U+FFFD unless anyone finds pages using these byte sequences seems  
the only sane option.

So, <http://people.opera.com/philipj/2012/04/06/big5-foolip.txt> is the  
mapping I suggest, with 18594 defined mappings and 1188 U+FFFD.
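
As a sanity check on those two numbers, and to illustrate the "U+FFFD unless mapped" rule, here is a small Python sketch. The byte ranges are an assumption on my part (lead 0x81-0xFE, trail 0x40-0x7E and 0xA1-0xFE, the usual Big5 layout), not something stated in this message. SAMPLE holds only the six mappings listed above, not the full table, and the names total_slots and decode_pair are made up for illustration.

```python
# Assumed Big5 byte ranges (not taken from the mapping file itself).
LEADS = range(0x81, 0xFF)                          # 0x81..0xFE
TRAILS = [*range(0x40, 0x7F), *range(0xA1, 0xFF)]  # 0x40..0x7E, 0xA1..0xFE

# The six manually added mappings from this message (Firefox's values).
SAMPLE = {
    0xC6CF: 0x5EF4,
    0xC6D3: 0x65E0,
    0xC6D5: 0x7676,
    0xC6D7: 0x96B6,
    0xC6DE: 0x3003,
    0xC6DF: 0x4EDD,
}


def total_slots():
    """Number of lead/trail byte pairs under the assumed ranges."""
    return len(LEADS) * len(TRAILS)


def decode_pair(code, table=SAMPLE):
    """Look up a two-byte code; anything unmapped becomes U+FFFD."""
    return chr(table.get(code, 0xFFFD))


if __name__ == "__main__":
    # 18594 defined + 1188 U+FFFD should cover every slot.
    print(total_slots(), 18594 + 1188)
    print("U+%04X" % ord(decode_pair(0xC6CF)))
```

Under those assumed ranges the slot count is 126 * 157 = 19782, which equals 18594 + 1188, so the suggested table would account for every possible byte pair.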

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Friday, 6 April 2012 14:03:22 UTC