Re: [whatwg/encoding] Big5 encoding mishandles some trailing bytes, with possible XSS (#171)

> Before debating special-casing 0x5C as the trail byte when an index lookup fails, I'd be interested in learning what Big5-HKSCS generator can generate byte pairs that the index in the Encoding Standard does not have mappings for. We have mappings for the 0x5C trail byte for every lead byte from 0x87 onwards. We have no mappings for any byte pair, whose lead is in the range 0x81 to 0x86, inclusive. What software produced the 0x83, 0x5C byte sequence and what Big5 extension does it belong to? (CC @foolip)
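For reference, here is a minimal sketch of the pointer arithmetic behind the ranges quoted above, as I understand the Big5 decoder in the Encoding Standard. The function name and the `FIRST_MAPPED_*` constants are mine, not the spec's, and the "first mapped lead is 0x87" cutoff is taken from the quoted comment rather than checked against the index file.

```python
from typing import Optional


def big5_pointer(lead: int, trail: int) -> Optional[int]:
    """Compute the index pointer for a Big5 byte pair.

    Returns None when either byte is outside the ranges the decoder
    accepts, i.e. when no index lookup would be attempted.
    """
    if not 0x81 <= lead <= 0xFE:
        return None
    if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
        return None
    # Trail bytes come in two runs (0x40-0x7E and 0xA1-0xFE), 157 values total.
    offset = 0x40 if trail < 0x7F else 0x62
    return (lead - 0x81) * 157 + (trail - offset)


# Assumption from the quoted comment: leads 0x81..0x86 have no mappings,
# so every pointer below this value misses the index.
FIRST_MAPPED_LEAD = 0x87
FIRST_MAPPED_POINTER = (FIRST_MAPPED_LEAD - 0x81) * 157


def lead_has_no_mappings(lead: int, trail: int) -> bool:
    """True when the pair is well-formed but its lead is below 0x87."""
    pointer = big5_pointer(lead, trail)
    return pointer is not None and pointer < FIRST_MAPPED_POINTER
```

Under these assumptions, the 0x83, 0x5C pair from the report is well-formed but falls below the first mapped pointer, so the lookup fails and the question becomes what happens to the 0x5C byte.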

I'm afraid I don't know much about the software that produced the content that @annevk, I, and others analyzed back in the day. I think the best way to answer questions about what the spec should say now would be to run a new scrape looking for specific byte patterns, such as a lead byte in the range 0x81 to 0x86 followed by 0x5C, perhaps starting from the list of URLs in httparchive.
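As a rough illustration of what such a scrape might check per document, here is a sketch that scans raw bytes for a lead byte in 0x81 to 0x86 followed by 0x5C. The function name is hypothetical, fetching pages and confirming that they are actually labeled Big5 is left out, and the walk is a simplification of the real decoder: it always steps over two bytes after a lead, which should land on the same boundaries because an ASCII trail byte pushed back onto the stream is consumed on its own anyway.

```python
from typing import List


def find_unmapped_backslash_pairs(data: bytes) -> List[int]:
    """Return offsets of pairs whose lead is 0x81..0x86 and whose trail is 0x5C.

    The walk keeps lead/trail alignment so that a 0x83 byte acting as the
    trail of an earlier pair is not mistaken for a lead.
    """
    hits = []
    i = 0
    while i < len(data):
        byte = data[i]
        if 0x81 <= byte <= 0xFE and i + 1 < len(data):
            trail = data[i + 1]
            # Range from the quoted comment: leads with no index mappings.
            if byte <= 0x86 and trail == 0x5C:
                hits.append(i)
            i += 2  # step over the pair, mapped or not
            continue
        i += 1  # ASCII byte, stray 0x80/0xFF, or a lead at end of input
    return hits
```

Running this over a corpus and grouping the hits by lead byte would at least show whether sequences like 0x83, 0x5C occur in the wild often enough to matter.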

> I'd rather not tweak legacy encoding implementations any further in Chromium's copy of ICU unless it's absolutely necessary.

@jungshik, are we now in alignment with the Encoding Standard for Big5? If not, do you know where the differences are?

https://github.com/whatwg/encoding/issues/171#issuecomment-458942797

Received on Wednesday, 30 January 2019 13:27:44 UTC