Re: [encoding] iso-2022-jp encoder XSS risks (#15)

> I find returning an encoder error (for U+000E and U+000F) with 0xFFFD rather than with 0x000E or 0x000F may complicate matters in implementation (this behavior doesn't occur anywhere else in the current Encoding Standard).

Indeed, adding a case where an encoder algorithm returns error with a code point other than the _code point_ it's currently processing is trouble for APIs that just signal an error and leave it to the caller to extract the unmappable character from the input buffer. This results in [sad code in Gecko](https://mxr.mozilla.org/mozilla-central/source/intl/uconv/nsNCRFallbackEncoderWrapper.cpp#86). I think this pattern is an API design mistake in general, and a better API, [like the one I'm proposing for Gecko](https://github.com/hsivonen/encoding-rs/blob/1dac008e4610d1ff4e2e59fe33aa460c59f0f215/src/lib.rs#L1605), returns the unmappable code point to the caller.
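
To illustrate the API shape, here is a minimal sketch in which the error result itself carries the code point to report, so the caller never has to dig it out of the input buffer. The names `EncoderResult` and `encode_ascii` are hypothetical and are not the encoding-rs API; the encoder is a toy ASCII-only stand-in.

```rust
/// Hypothetical result type: the error variant carries the code point
/// that the caller should report (for example as an NCR).
#[derive(Debug, PartialEq)]
enum EncoderResult {
    /// All input was consumed.
    InputEmpty,
    /// Encoding stopped at an unmappable character.
    Unmappable(char),
}

/// Toy ASCII-only encoder (an assumption for illustration only).
/// Returns the result plus the number of input bytes consumed,
/// including the unmappable character if there was one.
fn encode_ascii(src: &str, dst: &mut Vec<u8>) -> (EncoderResult, usize) {
    for (i, c) in src.char_indices() {
        if c.is_ascii() {
            dst.push(c as u8);
        } else {
            // Hand the offending code point straight to the caller.
            return (EncoderResult::Unmappable(c), i + c.len_utf8());
        }
    }
    (EncoderResult::InputEmpty, src.len())
}
```

With this shape, an encoder is free to report a code point that differs from the one it was processing without the caller noticing any special case.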

As I see it, the main problem arises when you want to implement the spec within the context of an existing widely used but not great API design (e.g. my ongoing implementation effort for Validator.nu tries to use the `java.nio` API). In such a case, it's pretty annoying to deal with a special case for an encoding that's so little used on the Web as ISO-2022-JP, but there is a way to deal with it: implement a better private API and move the NCR generation to work with the private API instead of the public one.
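
A rough sketch of that workaround, under stated assumptions: `PrivateResult`, `encode_step`, and `encode_with_ncr` are hypothetical names, and `encode_step` is a toy stand-in that only mimics the U+000E/U+000F special case rather than implementing the real ISO-2022-JP encoder. The point is that the NCR generation sits on top of the private step, which tells it which code point to report.

```rust
/// Hypothetical private result: on error, says which code point to
/// report and how many input bytes were consumed (error included).
#[derive(Debug, PartialEq)]
enum PrivateResult {
    Done,
    Unmappable { report: char, consumed: usize },
}

/// Toy private encoder step (NOT the real ISO-2022-JP algorithm).
/// ASCII passes through, except that U+000E and U+000F are reported
/// as U+FFFD; every other character is reported as itself.
fn encode_step(src: &str, dst: &mut Vec<u8>) -> PrivateResult {
    for (i, c) in src.char_indices() {
        let consumed = i + c.len_utf8();
        match c {
            '\u{000E}' | '\u{000F}' => {
                return PrivateResult::Unmappable { report: '\u{FFFD}', consumed };
            }
            c if c.is_ascii() => dst.push(c as u8),
            c => return PrivateResult::Unmappable { report: c, consumed },
        }
    }
    PrivateResult::Done
}

/// NCR generation built on the private step: it writes whatever code
/// point the step reports and never re-reads the input to find it.
fn encode_with_ncr(src: &str) -> Vec<u8> {
    let mut dst = Vec::new();
    let mut rest = src;
    loop {
        match encode_step(rest, &mut dst) {
            PrivateResult::Done => return dst,
            PrivateResult::Unmappable { report, consumed } => {
                dst.extend_from_slice(format!("&#{};", report as u32).as_bytes());
                rest = &rest[consumed..];
            }
        }
    }
}
```

A public API that only signals "error" could then be a thin wrapper around the private step, while the NCR path keeps using the richer result.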

I still think we should return error with U+FFFD if there is a security reason to do so in the browser context.

---
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/15#issuecomment-175495453

Received on Wednesday, 27 January 2016 09:03:05 UTC