[encoding] GB 18030 2000 vs 2005 (#22) from jungshik on 2015-12-10 (public-webapps-github@w3.org from December 2015)

From: jungshik <notifications@github.com>
Date: Thu, 10 Dec 2015 13:33:28 -0800
To: whatwg/encoding <encoding@noreply.github.com>
Message-ID: <whatwg/encoding/issues/22@github.com>

This is the continuation of https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c11

I forgot to reply to @annevk's question there:

```
Jungshik, do you mean you want to make the swap mentioned at the end of comment 5?

> GB 18030   -2005  -2000
> 0xA8BC     U+1E3F U+E7C7
> 0x8135F437 U+E7C7 U+1E3F
```

My answer would be yes. Chrome, Safari and Opera do that. Firefox and IE do not. 

My goal is to minimize the number of PUA code points after decoding, partly because there will be NO font support for those PUA code points on platforms like Android and iOS (and even on Windows 10 when additional fonts are installed for legacy compatibility; that is, old fonts like SimSun support them, but newer fonts like Microsoft YaHei do not).

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c1 lists them, and I had thought that a bunch of PUA code point mappings were dropped in GB 18030:2005 in favor of regular Unicode code points.

According to Masatoshi Kimura, only U+1E3F for 0xA8BC moved out of the PUA in GB 18030:2005, which is a big disappointment. (I wish GB 18030 had taken a step similar to the one HKSCS took when it comes to PUA.)

Anyway, at least one code point (0xA8BC <=> U+1E3F) should be mapped to a regular Unicode code point per GB 18030:2005 instead of GB 18030:2000.
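
To make the proposed swap concrete, here is a minimal Python sketch. It is not taken from any browser or from the Encoding Standard's index; the table holds only the two entries quoted above, and the names `GB18030_2000`, `to_2005`, and `is_pua` are made up for illustration.

```python
# Two-entry excerpt of a GB 18030-2000 style decode table (bytes -> character).
# A real index has many thousands of entries; only the pair discussed in this
# issue is included here.
GB18030_2000 = {
    bytes.fromhex("A8BC"): "\uE7C7",      # two-byte sequence -> PUA in -2000
    bytes.fromhex("8135F437"): "\u1E3F",  # four-byte sequence -> U+1E3F in -2000
}


def to_2005(table_2000):
    """Return a copy of the table with the 0xA8BC / 0x8135F437 swap applied."""
    two_byte = bytes.fromhex("A8BC")
    four_byte = bytes.fromhex("8135F437")
    table = dict(table_2000)
    table[two_byte] = table_2000[four_byte]
    table[four_byte] = table_2000[two_byte]
    return table


def is_pua(ch):
    """True if ch lies in the BMP Private Use Area (U+E000..U+F8FF)."""
    return 0xE000 <= ord(ch) <= 0xF8FF


GB18030_2005 = to_2005(GB18030_2000)
```

With the swap applied, the common two-byte sequence 0xA8BC decodes to U+1E3F, a regular code point that ordinary fonts can cover, and the PUA code point U+E7C7 is reachable only through the four-byte sequence 0x8135F437.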



---
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/22

Received on Thursday, 10 December 2015 21:34:01 UTC