[encoding] Unclear text in utf-8 decoder (#19) from r12a on 2015-12-05 (public-webapps-github@w3.org from December 2015)

From: r12a <notifications@github.com>
Date: Sat, 05 Dec 2015 07:33:29 -0800
To: whatwg/encoding <encoding@noreply.github.com>
Message-ID: <whatwg/encoding/issues/19@github.com>

I wrote a small application (http://rishida.io/apps/encodings/) to work with Encoding tests, and ran into some trouble with the utf-8 decoder. I tried to closely follow the algorithms in the spec, as a way of testing them, but when it came to:

"6. Increase utf-8 bytes seen by one and set utf-8 code point to utf-8 code point + (byte − 0x80) << (6 × (utf-8 bytes needed − utf-8 bytes seen)). "

I ended up with

```
u8cp = u8cp + (byte - 0x80) << (6 * (bytesneeded - bytesseen))
```

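which gives a number that is far too high. In JavaScript, as in C and the other languages that share its operator precedence, `+` binds more tightly than `<<`, so the whole sum is shifted rather than just the continuation byte's bits.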
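To make the size of the error concrete, here is a small sketch that runs step 6 over the continuation bytes of U+20AC EURO SIGN (bytes E2 82 AC), once with the text read literally and once with the shift applied first. It assumes the lead byte has already been handled the way the spec did at the time (its payload shifted left by 6 × utf-8 bytes needed); `decodeEuro` is a hypothetical helper written only for this illustration. The two readings agree for two-byte sequences, because the shift on the final byte is 0, so the problem only shows up with three- and four-byte sequences.

```javascript
// Hypothetical helper: apply step 6 to the continuation bytes of U+20AC (E2 82 AC).
// Assumes the lead byte 0xE2 was already processed as the spec did at the time:
// code point = (0xE2 - 0xE0) << (6 * 2) = 0x2000, bytes needed = 2.
function decodeEuro(buggy) {
  const bytesNeeded = 2;
  let bytesSeen = 0;
  let u8cp = (0xE2 - 0xE0) << (6 * bytesNeeded);
  for (const byte of [0x82, 0xAC]) {
    bytesSeen += 1;
    const shift = 6 * (bytesNeeded - bytesSeen);
    u8cp = buggy
      ? u8cp + (byte - 0x80) << shift     // parses as (u8cp + (byte - 0x80)) << shift
      : u8cp + ((byte - 0x80) << shift);  // shift first, then add
  }
  return u8cp;
}

console.log(decodeEuro(true).toString(16));  // "800ac"
console.log(decodeEuro(false).toString(16)); // "20ac"
```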

What's needed is

```
u8cp = u8cp + ((byte - 0x80) << (6 * (bytesneeded - bytesseen)))
```

or 

```
u8cp += (byte - 0x80) << (6 * (bytesneeded - bytesseen))
```

The spec text would be clearer with a couple of extra brackets, i.e.:

"set utf-8 code point to utf-8 code point + ((byte − 0x80) << (6 × (utf-8 bytes needed − utf-8 bytes seen))). "

to make clear that the shift is applied before the result is added to utf-8 code point.
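With the brackets in place, step 6 fits into the rest of the decoder without surprises. The following is a minimal JavaScript sketch of the whole handler, assuming the lead-byte and boundary steps as the spec described them around this time, and assuming replacement error mode (each error becomes U+FFFD). `utf8Decode` and its variable names are illustrative, not taken from the spec.

```javascript
// Sketch of the utf-8 decoder handler with step 6 bracketed as proposed.
// Assumption: errors are emitted as U+FFFD (replacement mode). Names are illustrative.
function utf8Decode(bytes) {
  const out = [];
  let cp = 0, needed = 0, seen = 0, lower = 0x80, upper = 0xBF;
  let i = 0;
  while (i < bytes.length) {
    const byte = bytes[i++];
    if (needed === 0) {
      if (byte <= 0x7F) { out.push(byte); continue; }
      if (byte >= 0xC2 && byte <= 0xDF) {
        needed = 1; cp = byte - 0xC0;
      } else if (byte >= 0xE0 && byte <= 0xEF) {
        if (byte === 0xE0) lower = 0xA0;   // reject overlong 3-byte forms
        if (byte === 0xED) upper = 0x9F;   // reject surrogates
        needed = 2; cp = byte - 0xE0;
      } else if (byte >= 0xF0 && byte <= 0xF4) {
        if (byte === 0xF0) lower = 0x90;   // reject overlong 4-byte forms
        if (byte === 0xF4) upper = 0x8F;   // reject values above U+10FFFF
        needed = 3; cp = byte - 0xF0;
      } else {
        out.push(0xFFFD);                  // invalid lead byte
        continue;
      }
      // Pre-shift the lead byte's payload into its final position.
      cp = cp << (6 * needed);
      continue;
    }
    if (byte < lower || byte > upper) {
      // Invalid continuation: reset state, emit an error, then reprocess this byte.
      cp = needed = seen = 0; lower = 0x80; upper = 0xBF;
      i--;
      out.push(0xFFFD);
      continue;
    }
    lower = 0x80; upper = 0xBF;
    seen += 1;
    cp = cp + ((byte - 0x80) << (6 * (needed - seen)));  // step 6, bracketed
    if (seen !== needed) continue;
    out.push(cp);
    cp = needed = seen = 0;
  }
  if (needed !== 0) out.push(0xFFFD);      // truncated sequence at end of stream
  return out;
}
```

For example, `String.fromCodePoint(...utf8Decode([0xE2, 0x82, 0xAC]))` yields "€"; the output can also be cross-checked against the platform's `TextDecoder`.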


---
https://github.com/whatwg/encoding/issues/19

Received on Saturday, 5 December 2015 15:33:56 UTC