- From: r12a <notifications@github.com>
- Date: Sat, 05 Dec 2015 07:33:29 -0800
- To: whatwg/encoding <encoding@noreply.github.com>
Received on Saturday, 5 December 2015 15:33:56 UTC
I wrote a small application (http://rishida.io/apps/encodings/) to work with Encoding tests, and ran into some trouble with the utf-8 decoder. I tried to closely follow the algorithms in the spec, as a way of testing them, but when it came to:

"6. Increase utf-8 bytes seen by one and set utf-8 code point to utf-8 code point + (byte − 0x80) << (6 × (utf-8 bytes needed − utf-8 bytes seen))."

I ended up with

```
u8cp = u8cp + (byte - 0x80) << (6 * (bytesneeded - bytesseen))
```

which gives a number that is much too high, because the addition is evaluated before the shift. What's needed is

```
u8cp = u8cp + ((byte - 0x80) << (6 * (bytesneeded - bytesseen)))
```

or

```
u8cp += (byte - 0x80) << (6 * (bytesneeded - bytesseen))
```

The spec text would be clearer if a couple of extra brackets were introduced, i.e.:

"set utf-8 code point to utf-8 code point + ((byte − 0x80) << (6 × (utf-8 bytes needed − utf-8 bytes seen)))."

to show that the shift takes place before adding to utf-8 code point.

---
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/19
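A minimal sketch of the precedence problem described in this message, written in Python, where `+` also binds more tightly than `<<`. The function name `decode_continuations` and the sample input (the bytes E2 82 AC for U+20AC) are illustrative choices, not taken from the spec or from the application mentioned above. The starting value assumes the lead-byte step has already set utf-8 code point to `(0xE2 - 0xE0) << 12`, which is how the surrounding spec steps appear to read.

```python
def decode_continuations(code_point, continuation_bytes, parenthesized):
    """Apply step 6 to each continuation byte, with or without extra brackets.

    Hypothetical helper for illustration only; it does no error handling.
    """
    bytes_needed = len(continuation_bytes)
    bytes_seen = 0
    for byte in continuation_bytes:
        bytes_seen += 1
        shift = 6 * (bytes_needed - bytes_seen)
        if parenthesized:
            # Shift first, then add: the intended reading of the spec text.
            code_point = code_point + ((byte - 0x80) << shift)
        else:
            # Literal transcription: parsed as (code_point + (byte - 0x80)) << shift.
            code_point = code_point + (byte - 0x80) << shift
    return code_point


# Assumed result of the lead-byte step for 0xE2 (two continuation bytes follow).
lead = (0xE2 - 0xE0) << (6 * 2)

correct = decode_continuations(lead, [0x82, 0xAC], parenthesized=True)
wrong = decode_continuations(lead, [0x82, 0xAC], parenthesized=False)

print(hex(correct))  # 0x20ac
print(hex(wrong))    # 0x800ac
```

With the extra brackets the result is U+20AC; without them the running value is shifted along with each new byte, which gives 0x800AC.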