Re: [whatwg/encoding] Big5 encoding mishandles some trailing bytes, with possible XSS (#171)

We have telemetry from Firefox 86 that is best explained by the hypothesis that users in Taiwan and Hong Kong encounter unlabeled Big5 containing byte sequences that the Encoding Standard considers unmapped. (I'm working on getting the numbers that I'm looking at OKed for publication.)

Of the byte pairs in the Big5 range that aren't mapped by the Encoding Standard, only the ones with lead byte 0xA3 are unmapped in Internet Explorer. [The rest map to the Private Use Area.](https://hsivonen.com/test/moz/big5-eudc.htm)

In that demo, the middle column, labeled "Bytes", can be used to verify the decoding in IE. The rightmost column, labeled "PUA NCR", contains a cross-browser numeric character reference for the code point that IE decodes the bytes to. This can be used to probe fonts for glyph assignments in non-IE browsers. (The page is declared as `zh-HK`.)
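
For anyone who wants to reproduce rows like those in the demo, here is a minimal Python sketch. It assumes the pointer arithmetic from the Encoding Standard's Big5 decoder (lead bytes 0x81 to 0xFE, trail bytes 0x40 to 0x7E or 0xA1 to 0xFE). The helper names `big5_pointer`, `is_bmp_pua`, and `ncr` are hypothetical, as is the hexadecimal NCR form. It does not reproduce IE's actual byte-to-PUA assignment, which has to be read off the demo page.

```python
from typing import Optional


def big5_pointer(lead: int, trail: int) -> Optional[int]:
    """Return the Big5 index pointer for a byte pair, or None if the
    pair falls outside the lead/trail ranges of the Big5 decoder."""
    if not 0x81 <= lead <= 0xFE:
        return None
    if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
        return None
    # Trail bytes below 0x7F are offset by 0x40, the rest by 0x62.
    offset = 0x40 if trail < 0x7F else 0x62
    return (lead - 0x81) * 157 + (trail - offset)


def is_bmp_pua(code_point: int) -> bool:
    """True if the code point lies in the BMP Private Use Area."""
    return 0xE000 <= code_point <= 0xF8FF


def ncr(code_point: int) -> str:
    """Format a code point as a hexadecimal numeric character reference."""
    return f"&#x{code_point:X};"
```

A pointer that has no entry in the Encoding Standard's Big5 index is what this thread calls unmapped. For such a pair, `ncr()` applied to the code point observed in IE yields markup that any browser can render for font probing.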

https://github.com/whatwg/encoding/issues/171#issuecomment-808122406

Received on Friday, 26 March 2021 11:02:26 UTC