Re: [whatwg/encoding] EUC-JP encoding is currently ambiguous (#225)

> This is ambiguous because there are several code points where there are several pointers to the same code point, such as 0xFA16 has two. I've observed that Chrome and Firefox always choose the larger of the two.

Can you show clearer steps to reproduce for Firefox choosing the larger index for EUC-JP? The larger index values can't even be encoded in the EUC-JP code space.
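To make the code-space point concrete, here is a minimal Python sketch of the pointer-to-bytes step of the EUC-JP encoder for index jis0208 (lead = pointer / 94 + 0xA1, trail = pointer % 94 + 0xA1). The helper name `euc_jp_bytes_for_pointer` is hypothetical, and the JIS X 0212 and single-byte paths are left out. Only pointers below 94 × 94 = 8836 give bytes within 0xA1..0xFE, so a pointer such as 10934 has no EUC-JP representation.

```python
# Minimal sketch of the EUC-JP encoder's pointer-to-bytes step for index jis0208.
# euc_jp_bytes_for_pointer is a hypothetical helper, not a real library API.

ROW_SIZE = 94          # EUC-JP lead and trail bytes each span 0xA1..0xFE
FIRST_BYTE = 0xA1
MAX_POINTER = ROW_SIZE * ROW_SIZE - 1  # 8835, which encodes as 0xFE 0xFE


def euc_jp_bytes_for_pointer(pointer: int) -> bytes:
    """Return the two EUC-JP bytes for a jis0208 pointer.

    Raises ValueError when the pointer lies outside the 94x94 EUC-JP code space.
    """
    if not 0 <= pointer <= MAX_POINTER:
        raise ValueError(f"pointer {pointer} is outside the EUC-JP code space")
    lead = pointer // ROW_SIZE + FIRST_BYTE
    trail = pointer % ROW_SIZE + FIRST_BYTE
    return bytes([lead, trail])


if __name__ == "__main__":
    for p in (137, 8462, 8835, 10934):
        try:
            print(p, euc_jp_bytes_for_pointer(p).hex(" ").upper())
        except ValueError as err:
            print(p, "error:", err)
```

Pointer 10934 would need a lead byte of 116 + 0xA1 = 0x115, which does not fit in a byte at all.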

If I create an EUC-JP document with a form, enter 猪¬ into a form field and submit the form, I see `%FB%A3%A2%CC` in the query string, as the spec requires. With Shift_JIS, I get `%FB%5E%81%CA`, which is also per spec.
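The Shift_JIS query string can be reproduced with a similar sketch of the Shift_JIS encoder's pointer-to-bytes step. The pointers 10934 and 137 are inferred here by inverting that arithmetic on the reported bytes, not looked up in the index. The helper names are hypothetical, and `percent_encode_all` escapes every byte, which is a simplification (real form serialization leaves some ASCII bytes unescaped; it happens to escape all four bytes here).

```python
# Minimal sketch of the Shift_JIS encoder's pointer-to-bytes step.
# shift_jis_bytes_for_pointer and percent_encode_all are hypothetical helpers.


def shift_jis_bytes_for_pointer(pointer: int) -> bytes:
    """Return the two Shift_JIS bytes for a jis0208 pointer."""
    lead, trail = divmod(pointer, 188)
    lead_offset = 0x81 if lead < 0x1F else 0xC1
    trail_offset = 0x40 if trail < 0x3F else 0x41
    return bytes([lead + lead_offset, trail + trail_offset])


def percent_encode_all(data: bytes) -> str:
    """Escape every byte as %XX (simplified; not the full form serializer)."""
    return "".join(f"%{b:02X}" for b in data)


if __name__ == "__main__":
    # Pointers inferred from the reported bytes: 10934 for the kanji, 137 for ¬.
    encoded = shift_jis_bytes_for_pointer(10934) + shift_jis_bytes_for_pointer(137)
    print(percent_encode_all(encoded))
```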

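(The definition of index Shift_JIS pointer excludes the lower copy of the IBM kanji, i.e. the entries whose pointer is in the range 8272 to 8835 inclusive, from the search. That is, the rule isn't "take the highest pointer" but "skip a certain range", as the behavior of ¬ shows: Shift_JIS still encodes it at pointer 137 (0x81 0xCA) rather than at a higher duplicate.)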
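A minimal sketch of the two lookups, run over a tiny hand-picked excerpt of index jis0208, shows why "highest pointer" is the wrong reading. `JIS0208_EXCERPT` is hypothetical: the 猪 pointers (U+FA16, the compatibility ideograph from the quoted report) are inferred from the byte sequences above, while the two higher ¬ (U+FFE2) pointers are an assumption that should be checked against index-jis0208.txt.

```python
# Minimal sketch contrasting "index pointer" (first pointer for a code point)
# with "index Shift_JIS pointer" (first pointer outside 8272..8835).
# JIS0208_EXCERPT is a hypothetical excerpt, not the real index.

from typing import Dict, Optional

JIS0208_EXCERPT: Dict[int, int] = {
    137: 0xFFE2,    # ¬ in the main JIS X 0208 area
    8462: 0xFA16,   # 猪 compatibility ideograph, lower (NEC-selected IBM) copy
    8644: 0xFFE2,   # ¬, lower copy (assumed)
    10736: 0xFFE2,  # ¬, IBM extension copy (assumed)
    10934: 0xFA16,  # 猪 compatibility ideograph, IBM extension copy
}

EXCLUDED = range(8272, 8836)  # pointers 8272 to 8835 inclusive


def index_pointer(index: Dict[int, int], code_point: int) -> Optional[int]:
    """Return the first (lowest) pointer mapping to code_point, or None."""
    matches = sorted(p for p, cp in index.items() if cp == code_point)
    return matches[0] if matches else None


def index_shift_jis_pointer(index: Dict[int, int], code_point: int) -> Optional[int]:
    """Like index_pointer, but ignoring entries whose pointer is in EXCLUDED."""
    filtered = {p: cp for p, cp in index.items() if p not in EXCLUDED}
    return index_pointer(filtered, code_point)


if __name__ == "__main__":
    for cp in (0xFA16, 0xFFE2):
        print(f"U+{cp:04X}", index_pointer(JIS0208_EXCERPT, cp),
              index_shift_jis_pointer(JIS0208_EXCERPT, cp))
```

Under these assumptions the Shift_JIS lookup lands on the larger pointer for U+FA16 only because the lower copy is excluded; for ¬ it still returns the smallest pointer, 137.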

AFAICT, the definitions of "index pointer" and "index Shift_JIS pointer" are not ambiguous.

The test document I used was `data:text/html;charset=EUC-JP,<form action=https://example.com><input name=v><input type=submit></form>` (It appears that GitHub won't linkify it.)

https://github.com/whatwg/encoding/issues/225#issuecomment-684789720

Received on Tuesday, 1 September 2020 11:40:59 UTC