[whatwg/encoding] gb18030 decoder doesn't clear first, second, third after returning 4-byte encoded code point (#146)

https://encoding.spec.whatwg.org/commit-snapshots/b04091a5f079a7bdcab5aa8c7adead554326a96c/#gb18030-decoder

> If gb18030 third is not 0x00, then:
> 
> If byte is not in the range 0x30 to 0x39, inclusive, then:
> 
> Prepend gb18030 second, gb18030 third, and byte to stream.
> 
> Set gb18030 first, gb18030 second, and gb18030 third to 0x00.
> 
> Return error.
> 
> Let code point be the index gb18030 ranges code point for ((gb18030 first − 0x81) × (10 × 126 × 10)) + ((gb18030 second − 0x30) × (10 × 126)) + ((gb18030 third − 0x81) × 10) + byte − 0x30.
> 
> If code point is null, return error.
> 
> Return a code point whose value is code point.

I'm having trouble understanding how, after the last step above, the decoder will accept the next byte correctly. Because `gb18030 first`, `gb18030 second`, and `gb18030 third` are still not 0x00 after this last step, the decoder seems to enter the wrong branch for subsequent bytes.

For example, decoding the byte sequence `20 81 40 84 31 83 30` (hex) results in ` 丂︔�` (with an error at the end), but the expected result is ` 丂︔`.
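
To make the four-byte part of this example concrete, here is a small Python sketch of the pointer arithmetic from the quoted step, applied to `84 31 83 30`. The function name `four_byte_pointer` is my own, and the lookup in index gb18030 ranges is left out, so this only shows the pointer value, not the resulting code point.

```python
def four_byte_pointer(first, second, third, byte):
    """Pointer formula from the quoted gb18030 decoder step (no index lookup)."""
    return ((first - 0x81) * (10 * 126 * 10)
            + (second - 0x30) * (10 * 126)
            + (third - 0x81) * 10
            + byte - 0x30)


# 84 31 83 30 from the example byte sequence:
# 3 * 12600 + 1 * 1260 + 2 * 10 + 0
pointer = four_byte_pointer(0x84, 0x31, 0x83, 0x30)
print(pointer)  # 39080
```

The pointer is computed purely from the four bytes, so nothing in this step needs the state variables afterwards.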

I think a "set `gb18030 first`, `gb18030 second`, and `gb18030 third` to 0x00" step is missing before returning the error or the code point?
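
To illustrate, here is a rough Python sketch that traces only the decoder's state handling for the example bytes. It is not a real decoder: `trace_gb18030` and the event labels are hypothetical names of mine, the index lookups are skipped (every structurally valid two-byte or four-byte sequence is assumed to map to a code point), and the branches other than the quoted one follow my reading of the rest of the decoder steps. The `reset_after_four_byte` flag switches between the behavior as currently written and the proposed reset.

```python
from collections import deque


def trace_gb18030(data, reset_after_four_byte):
    """Return which decoder branch fires per sequence (index lookups omitted)."""
    stream = deque(data)
    first = second = third = 0x00
    events = []
    while True:
        if not stream:
            # End of stream: leftover state is treated as a truncated sequence.
            if first or second or third:
                events.append("error")
            return events
        byte = stream.popleft()
        if third != 0x00:
            if not 0x30 <= byte <= 0x39:
                # extendleft reverses, so the stream sees: second, third, byte.
                stream.extendleft([byte, third, second])
                first = second = third = 0x00
                events.append("error")
                continue
            # Assumption: the ranges lookup succeeds for this pointer.
            events.append("four-byte")
            if reset_after_four_byte:
                first = second = third = 0x00  # the proposed extra step
            continue
        if second != 0x00:
            if 0x81 <= byte <= 0xFE:
                third = byte
                continue
            stream.extendleft([byte, second])
            first = second = 0x00
            events.append("error")
            continue
        if first != 0x00:
            if 0x30 <= byte <= 0x39:
                second = byte
                continue
            first = 0x00
            if 0x40 <= byte <= 0x7E or 0x80 <= byte <= 0xFE:
                # Assumption: the two-byte index lookup succeeds.
                events.append("two-byte")
            else:
                if byte <= 0x7F:
                    stream.appendleft(byte)
                events.append("error")
            continue
        if byte <= 0x7F:
            events.append("ascii")
        elif byte == 0x80:
            events.append("euro")
        elif byte <= 0xFE:
            first = byte
        else:
            events.append("error")


data = bytes.fromhex("20 81 40 84 31 83 30")
print(trace_gb18030(data, reset_after_four_byte=False))
# ['ascii', 'two-byte', 'four-byte', 'error']
print(trace_gb18030(data, reset_after_four_byte=True))
# ['ascii', 'two-byte', 'four-byte']
```

Without the reset, the three state variables are still set when the stream ends, which is where the trailing error comes from in this trace.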

-- 
View it on GitHub:
https://github.com/whatwg/encoding/issues/146

Received on Thursday, 14 June 2018 04:05:42 UTC