- From: Stephen Checkoway <notifications@github.com>
- Date: Wed, 24 Feb 2021 09:37:03 -0800
- To: whatwg/encoding <encoding@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/encoding/issues/253@github.com>
https://encoding.spec.whatwg.org/commit-snapshots/4d54adce6a871cb03af3a919cbf644a43c22301a/#gb18030-decoder
> If byte is end-of-queue, and gb18030 first, gb18030 second, or gb18030 third is not 0x00, set gb18030 first, gb18030 second, and gb18030 third to 0x00, and return error.
I think this violates the requirement in the [Security Background section](https://encoding.spec.whatwg.org/#security-background):
> Decoders of encodings that use multiple bytes for scalar values now require that in case of an illegal byte combination, a scalar value in the range U+0000 to U+007F, inclusive, cannot be “masked”.
In particular, by my reading of the sentence quoted above, the input sequence 0x81 0x30 should produce U+FFFD U+0030. According to the decoder algorithm, however, it produces only a single U+FFFD, so the ASCII byte 0x30 is masked.
The Rust crate `encoding_rs` agrees with the decoder algorithm:
```rust
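use encoding_rs::GB18030;

fn main() {
    // Truncated four-byte sequence: lead byte 0x81 followed by ASCII '0' (0x30).
    let (output, replacements) = GB18030.decode_without_bom_handling(&[0x81, 0x30]);
    // The malformed input was replaced with U+FFFD.
    assert!(replacements);
    // Fails: the decoder emits only U+FFFD, so the ASCII '0' is masked.
    assert_eq!(output, "\u{FFFD}0");
}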
```
The second assertion fails because the output contains only a single U+FFFD.
Chrome, Firefox, and Safari all agree with `encoding_rs`. For example, appending the byte sequence 0x81 0x30 to
```html
<!DOCTYPE html>
<html>
<head>
<meta charset=gb18030>
</head>
<body>
<span id=bug>
```
results in the `span` element containing just U+FFFD.
Maybe this masking of an ASCII character at the end of the input is fine, and the Security Background section should instead be updated to note that fact.
A similar issue arises with the byte sequence 0x81 0x30 0x81, which I'd expect to decode to U+FFFD U+0030 U+FFFD but which instead decodes to a single U+FFFD.
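To make the difference concrete, here is a minimal, std-only sketch of the two end-of-input behaviours. It is not the specification's algorithm or the `encoding_rs` implementation. It models only the bytes still pending when the input ends, and the function names `spec_flush` and `unmasked_flush` are hypothetical. It assumes the pending bytes have the shape the decoder allows: a lead byte, optionally followed by an ASCII digit (0x30 to 0x39), optionally followed by another lead byte.

```rust
/// Models the quoted end-of-queue step: if any byte is pending,
/// all pending bytes are discarded and a single error (U+FFFD) is emitted.
fn spec_flush(pending: &[u8]) -> String {
    if pending.is_empty() {
        String::new()
    } else {
        "\u{FFFD}".to_string()
    }
}

/// Hypothetical alternative that does not mask ASCII: emit an error for
/// the lead byte only, then re-examine the remaining pending bytes.
/// An ASCII byte decodes to itself; a trailing lead byte with nothing
/// after it is an incomplete sequence and becomes another U+FFFD.
fn unmasked_flush(pending: &[u8]) -> String {
    let mut out = String::new();
    if let Some((_lead, rest)) = pending.split_first() {
        out.push('\u{FFFD}');
        for &b in rest {
            if b.is_ascii() {
                out.push(b as char);
            } else {
                out.push('\u{FFFD}');
            }
        }
    }
    out
}
```

Under these assumptions, `spec_flush` returns a single U+FFFD for both `[0x81, 0x30]` and `[0x81, 0x30, 0x81]`, while `unmasked_flush` returns U+FFFD U+0030 and U+FFFD U+0030 U+FFFD respectively, which is the output I expected from the Security Background wording.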
https://github.com/whatwg/encoding/issues/253
Received on Wednesday, 24 February 2021 17:37:16 UTC