[Bug 28156] Separate GBK and GB18030 even for decoding (toUnicode)

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28156

--- Comment #3 from Jungshik Shin <jshin@chromium.org> ---
(In reply to Anne from comment #1)
> I would have expected that treating them identically for decoding saves you
> a decoding table. Or would you reuse that anyway?

It does not save us anything. Both tables (GBK and GB18030) would have to be
shipped anyway. (Unlike Mozilla, ICU does not have two separate tables for
encoding and decoding.)

Actually, we need additional code in Blink [1] to treat GBK and GB18030
identically for decoding (toUnicode) but distinctly for encoding
(fromUnicode), which we'd like to avoid if possible.


> They're treated identically because gbk is effectively a subset and for the
> other encodings we've found that supersets leak. I think there might be some
> anecdotal evidence here too, but not sure.

As I wrote in the previous comment, I suspect that the extent of the "leak"
(if any) is much smaller between gbk and gb18030 than in other cases.

[1] It might be possible to do this in ICU as well, but I don't want to make a
patch to ICU that would be hard to upstream because I don't have a good
justification for it.


Received on Tuesday, 12 May 2015 18:59:42 UTC