- From: Stephen Checkoway <notifications@github.com>
- Date: Wed, 24 Feb 2021 09:37:03 -0800
- To: whatwg/encoding <encoding@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/encoding/issues/253@github.com>
https://encoding.spec.whatwg.org/commit-snapshots/4d54adce6a871cb03af3a919cbf644a43c22301a/#gb18030-decoder

> If byte is end-of-queue, and gb18030 first, gb18030 second, or gb18030 third is not 0x00, set gb18030 first, gb18030 second, and gb18030 third to 0x00, and return error.

I think this violates the requirements in the [Security Background section](https://encoding.spec.whatwg.org/#security-background):

> Decoders of encodings that use multiple bytes for scalar values now require that in case of an illegal byte combination, a scalar value in the range U+0000 to U+007F, inclusive, cannot be “masked”.

In particular, the input sequence 0x81 0x30 should, by my reading of the sentence quoted above, produce U+FFFD U+0030, but according to the specification only a single U+FFFD is produced.

The Rust crate `encoding_rs` agrees with the specification.

```rust
use encoding_rs::*;

fn main() {
    let (output, replacements) = GB18030.decode_without_bom_handling(&[0x81, 0x30]);
    assert!(replacements);
    // Fails: the actual output is just "\u{FFFD}".
    assert_eq!(output, "\u{FFFD}0");
}
```

The second assertion fails because the output contains just U+FFFD.

Chrome, Firefox, and Safari all agree with `encoding_rs`. E.g., appending the byte sequence 0x81 0x30 to

```html
<!DOCTYPE html>
<html>
<head>
<meta charset=gb18030>
</head>
<body>
<span id=bug>
```

results in the `span` element containing just U+FFFD.

Maybe this masking of an ASCII character at the end of the input is fine, and the security background should instead be updated to note that fact.

A similar issue arises with the byte sequence 0x81 0x30 0x81, which I'd expect to decode to U+FFFD U+0030 U+FFFD but which instead decodes to a single replacement character, U+FFFD.
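To make the end-of-queue behavior concrete, here is a rough model of the decoder's state machine using only the standard library. It is a sketch rather than the specification's algorithm verbatim: `Token` and `decode_sketch` are hypothetical names, the index lookups are replaced by a placeholder `Mapped` token and assumed always to succeed, and each `Error` stands for one U+FFFD in replacement mode.

```rust
use std::collections::VecDeque;

/// Hypothetical output tokens for this sketch (not part of any real API).
#[derive(Debug, PartialEq)]
enum Token {
    Ascii(u8), // an ASCII byte passed through unchanged
    Mapped,    // placeholder for a successful non-ASCII index lookup (assumed to succeed)
    Error,     // would become U+FFFD in replacement mode
}

/// Simplified model of the gb18030 decoder's pending-byte state machine.
fn decode_sketch(input: &[u8]) -> Vec<Token> {
    let mut queue: VecDeque<u8> = input.iter().copied().collect();
    let (mut first, mut second, mut third) = (0u8, 0u8, 0u8);
    let mut out = Vec::new();
    loop {
        let byte = match queue.pop_front() {
            Some(b) => b,
            None => {
                // End-of-queue: all pending bytes collapse into one error,
                // even if one of them (e.g. 0x30) is an ASCII byte.
                if first != 0 || second != 0 || third != 0 {
                    out.push(Token::Error);
                }
                return out;
            }
        };
        if third != 0 {
            if (0x30..=0x39).contains(&byte) {
                out.push(Token::Mapped); // four-byte sequence complete
            } else {
                // Put second, third, byte back (in that order) for re-decoding.
                queue.push_front(byte);
                queue.push_front(third);
                queue.push_front(second);
                out.push(Token::Error);
            }
            first = 0;
            second = 0;
            third = 0;
        } else if second != 0 {
            if (0x81..=0xFE).contains(&byte) {
                third = byte;
            } else {
                // Put second and byte back, so an ASCII second byte is not masked.
                queue.push_front(byte);
                queue.push_front(second);
                first = 0;
                second = 0;
                out.push(Token::Error);
            }
        } else if first != 0 {
            if (0x30..=0x39).contains(&byte) {
                second = byte;
            } else {
                first = 0;
                if (0x40..=0x7E).contains(&byte) || (0x80..=0xFE).contains(&byte) {
                    out.push(Token::Mapped); // two-byte sequence complete
                } else {
                    if byte < 0x80 {
                        queue.push_front(byte); // re-decode the ASCII byte
                    }
                    out.push(Token::Error);
                }
            }
        } else if byte < 0x80 {
            out.push(Token::Ascii(byte));
        } else if byte == 0x80 {
            out.push(Token::Mapped); // single byte 0x80 maps to U+20AC
        } else if byte <= 0xFE {
            first = byte;
        } else {
            out.push(Token::Error); // 0xFF is never valid
        }
    }
}
```

Under these assumptions, `decode_sketch(&[0x81, 0x30])` and `decode_sketch(&[0x81, 0x30, 0x81])` each yield a single `Error`. By contrast, `decode_sketch(&[0x81, 0x30, 0x41])` yields `Error`, `Ascii(0x30)`, `Ascii(0x41)`, because mid-stream the pending 0x30 is put back on the queue and decoded again. The masking only occurs when the input ends while bytes are still pending.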
Received on Wednesday, 24 February 2021 17:37:16 UTC