Re: [whatwg/encoding] Big5 encoding mishandles some trailing bytes, with possible XSS (#171)

> Thanks for that analysis, I guess that is indeed a novel angle that I'm not sure was fully considered.

While this is a novel angle for the Encoding Standard, my understanding is that a similar structural concern kept Shift_JIS [unsupported as a system encoding in Fedora](https://bugzilla.redhat.com/show_bug.cgi?id=136290), so that the transition was direct from EUC-JP to UTF-8. (I don't know how DOS and pre-NT Windows dealt internally with 0x5C in two-byte characters in file paths under the Japanese locale. I also don't know if Fedora previously supported Big5 as a system encoding.)

> In fact, I discovered this exactly from a client side JSON parsing (using this algorithm), with data produced server side (using the Big 5-HKSCS table by default).

Interesting, both because JSON is not allowed to be Big5-encoded (though that is irrelevant as far as providing security even for people who don't conform to specs goes) and because the spec is trying to be able to decode HKSCS.

While a backslash may have different implications than other ASCII-range bytes as the second byte of an unmappable sequence, I'm reluctant to make a special case for 0x5C in the general ASCII unmasking policy with the level of evidence offered so far.

Before debating special-casing 0x5C as the trail byte when an index lookup fails, I'd be interested in learning which Big5-HKSCS generator can produce byte pairs that the index in the Encoding Standard does not have mappings for. We have mappings for the 0x5C trail byte for every lead byte from 0x87 onwards. We have no mappings for _any_ byte pair whose lead is in the range 0x81 to 0x86, inclusive. What software produced the 0x83, 0x5C byte sequence and what Big5 extension does it belong to? (CC @foolip)
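
To make the case under discussion concrete, here is a minimal Python sketch of the pointer computation and the ASCII unmasking step, as I understand the Big5 decoder in the Encoding Standard. The names `big5_pointer`, `decode_pair` and `STUB_INDEX` are hypothetical, and the index is a one-entry stub rather than the real table. The sketch also omits the decoder's special-cased pointers that map to two code points and handles a single byte pair instead of a stream.

```python
REPLACEMENT = "\ufffd"

# Stub standing in for the spec's Big5 index (pointer -> code point).
# Assumption: one illustrative entry only; a real decoder loads the full table.
# Per the message above, nothing is mapped for leads 0x81..0x86,
# which corresponds to pointers 0..941.
STUB_INDEX = {(0xA4 - 0x81) * 157 + (0x40 - 0x40): 0x4E00}


def big5_pointer(lead: int, trail: int):
    """Return the index pointer for a byte pair, or None if either byte is out of range."""
    if not 0x81 <= lead <= 0xFE:
        return None
    if 0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE:
        offset = 0x40 if trail < 0x7F else 0x62
        return (lead - 0x81) * 157 + (trail - offset)
    return None


def decode_pair(lead: int, trail: int, index: dict) -> str:
    """Decode one lead/trail pair; on a failed lookup, unmask an ASCII trail byte."""
    pointer = big5_pointer(lead, trail)
    code_point = index.get(pointer) if pointer is not None else None
    if code_point is not None:
        return chr(code_point)
    if trail < 0x80:
        # ASCII unmasking: the trail byte is pushed back onto the stream and
        # decoded on its own, so it follows the replacement character.
        return REPLACEMENT + chr(trail)
    return REPLACEMENT


print(big5_pointer(0x83, 0x5C))                  # pointer below 942, so unmapped
print(repr(decode_pair(0x83, 0x5C, STUB_INDEX)))  # U+FFFD followed by a literal backslash
```

With the reported 0x83, 0x5C pair, the lookup fails and the 0x5C survives as a standalone backslash after U+FFFD, which is the behavior a special case would change.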

> you have to be sure that it's identical to those defined

That's relatively useless advice especially in the case of Big5, which, as defined in the spec, is a WHATWG synthesis that is unlikely to be an exact match for any legacy implementation.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/171#issuecomment-458491655

Received on Tuesday, 29 January 2019 10:38:26 UTC