Re: [whatwg/encoding] Remove the last 14 characters PUA of GB18030-2005 (#27) from aphillips on 2016-09-06 (public-webapps-github@w3.org from September 2016)

From: aphillips <notifications@github.com>
Date: Tue, 06 Sep 2016 10:11:05 -0700
To: whatwg/encoding <encoding@noreply.github.com>
Message-ID: <whatwg/encoding/issues/27/245020644@github.com>

The GB18030 mapping is naturally fungible with respect to PUA characters, since Unicode continues to encode Chinese characters. I think this should be recognized by Encoding.

I agree that we should not remove the Unicode PUA -> GB18030 mapping (for compatibility). But the problem here is round-tripping real Unicode code points through GB18030.

If I take U+20087, convert it to GB18030, and then later reserialize the GB data as UTF-8, I will get back U+E816 rather than the original (and correct) code point. That's undesirable and a loss of information. The fact that existing implementations haven't caught up with standardization doesn't mean that we shouldn't make this change.
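The information loss in the round trip above can be illustrated with a toy model. This is a minimal sketch, not the Encoding Standard's gb18030 algorithm or its index: the one-entry table is hypothetical, and the byte pair 0xFE 0x51 is an assumption used only for illustration. Only U+20087 and U+E816 come from the comment. The encoder keeps the PUA -> GB18030 compatibility direction, so two code points share one byte sequence; the decoder then has to pick one of them.

```python
# Toy model of the U+20087 -> GB18030 -> UTF-8 round trip described above.
# NOT the real index-gb18030: the table has a single entry, and the byte pair
# 0xFE 0x51 is an ASSUMPTION made for illustration only.
GB_CODE = b"\xfe\x51"   # assumed GB18030 bytes for the character
PUA = "\ue816"          # PUA code point (from the comment)
REAL = "\U00020087"     # real Unicode code point (from the comment)


def encode_gb(text: str) -> bytes:
    """Hypothetical encoder: both PUA and real code point map to the same bytes."""
    out = bytearray()
    for ch in text:
        if ch in (PUA, REAL):
            out += GB_CODE          # many-to-one: the distinction is lost here
        elif ord(ch) < 0x80:
            out.append(ord(ch))     # ASCII passes through unchanged
        else:
            raise ValueError(f"not in toy table: U+{ord(ch):04X}")
    return bytes(out)


def decode_gb(data: bytes, prefer_real: bool = False) -> str:
    """Hypothetical decoder: prefer_real=False yields the PUA code point,
    prefer_real=True models decoding to the real code point instead."""
    out = []
    i = 0
    while i < len(data):
        if data[i:i + 2] == GB_CODE:
            out.append(REAL if prefer_real else PUA)
            i += 2
        elif data[i] < 0x80:
            out.append(chr(data[i]))
            i += 1
        else:
            raise ValueError(f"not in toy table: byte 0x{data[i]:02X}")
    return "".join(out)


original = "x" + REAL
today = decode_gb(encode_gb(original)).encode("utf-8")
proposed = decode_gb(encode_gb(original), prefer_real=True).encode("utf-8")
print(today.hex(" "))     # 78 ee a0 96      (U+E816: original code point lost)
print(proposed.hex(" "))  # 78 f0 a0 82 87   (U+20087 preserved)
```

In this model the encoder can stay lenient for compatibility; whether the original text survives depends only on which code point the decoder emits.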

@annevk Under what circumstances would we change? One of the problems with establishing a standard is that implementations are trying hard to be compliant with it...

Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/27#issuecomment-245020644

Received on Tuesday, 6 September 2016 17:33:55 UTC