[encoding] Unclear text in utf-8 decoder (#19)

I wrote a small application (http://rishida.io/apps/encodings/) to work with Encoding tests, and ran into some trouble with the utf-8 decoder. I tried to closely follow the algorithms in the spec, as a way of testing them, but when it came to:

"6. Increase utf-8 bytes seen by one and set utf-8 code point to utf-8 code point + (byte − 0x80) << (6 × (utf-8 bytes needed − utf-8 bytes seen)). "

I ended up with

```u8cp = u8cp + (byte - 0x80) << (6 * (bytesneeded - bytesseen))```

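which gives a number that is far too high, because + takes precedence over <<, so the whole sum is shifted rather than just the new bits.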

What's needed is

```u8cp = u8cp + ((byte - 0x80) << (6 * (bytesneeded - bytesseen)))```

or 

```u8cp +=  (byte - 0x80) << (6 * (bytesneeded - bytesseen))```

The spec text would be clearer if a couple of extra brackets were introduced, i.e.:

"set utf-8 code point to utf-8 code point + ((byte − 0x80) << (6 × (utf-8 bytes needed − utf-8 bytes seen))). "

to show that the shift takes place before adding to utf-8 code point.
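
For illustration, here is a minimal Python sketch of the difference, using the three-byte sequence E2 82 AC (the euro sign, U+20AC). It is not the application's code: the function names are hypothetical, and the lead-byte handling (code point set to (byte − 0xE0) << 12, with two bytes needed) is an assumption about the earlier steps of the decoder. Python is used only because its + also binds more tightly than <<.

```python
def continuation_step(code_point, byte, bytes_needed, bytes_seen, parenthesized):
    """Apply step 6 once and return the new (code_point, bytes_seen)."""
    bytes_seen += 1
    shift = 6 * (bytes_needed - bytes_seen)
    if parenthesized:
        # Suggested reading: shift the new bits first, then add.
        code_point = code_point + ((byte - 0x80) << shift)
    else:
        # Literal transcription of the spec text: the sum is shifted as a whole.
        code_point = code_point + (byte - 0x80) << shift
    return code_point, bytes_seen


def decode_three_byte(lead, second, third, parenthesized):
    """Decode one three-byte sequence (hypothetical helper, no validation)."""
    # Assumed lead-byte step: keep the low bits of the lead byte and
    # move them above the 12 bits the two continuation bytes will supply.
    code_point = (lead - 0xE0) << 12
    bytes_needed, bytes_seen = 2, 0
    for byte in (second, third):
        code_point, bytes_seen = continuation_step(
            code_point, byte, bytes_needed, bytes_seen, parenthesized
        )
    return code_point


good = decode_three_byte(0xE2, 0x82, 0xAC, parenthesized=True)
bad = decode_three_byte(0xE2, 0x82, 0xAC, parenthesized=False)
print(hex(good))  # 0x20ac
print(hex(bad))   # 0x800ac
```

With the extra brackets the result is U+20AC as expected; without them the first continuation byte already shifts the accumulated value six bits too far.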


https://github.com/whatwg/encoding/issues/19

Received on Saturday, 5 December 2015 15:33:56 UTC