Re: [whatwg/encoding] Shift_JIS decoder (#270) from Ludovic Delabre on 2021-08-09 (public-webapps-github@w3.org from August 2021)

From: Ludovic Delabre <notifications@github.com>
Date: Mon, 09 Aug 2021 14:41:30 -0700
To: whatwg/encoding <encoding@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/encoding/issues/270/895569736@github.com>

> For consistency with the behavior of Windows code page 932, ASCII stays as ASCII. Windows' bundled fonts have corresponding intentional glyph misassignments in the Japanese fonts. Other fonts may not have these.

I find it strange that the standard differs from the W3C I18n test data (as well as from Firefox's behaviour).
But thanks for the clarification.

> For the Shift_JIS sequence, 0x81 0x7C and the EUC-JP sequence 0xA1 0xDD, the spec rather clearly says U+FF0D. How did you arrive at either U+2211 or U+FF0C?

My mistake for U+FF0C :-/
In https://www.w3.org/International/tests/repo/encoding/legacy-mb-japanese/shift_jis/sjis_chars.html, there are still two entries for the same byte sequence:
<span data-cp="2212" data-bytes="81 7C">－</span>
<span data-cp="FF0D" data-bytes="81 7C">－</span>

Since the character inside the <span> is actually U+FF0D in both cases, I believe the first entry (data-cp="2212") is a faulty test.
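
To make the mismatch concrete, here is a minimal Python sketch of the check. It rests on a few assumptions: the two entries are copied from the test page as quoted above, the span content is written as the escape `\uFF0D` so the code point is unambiguous, and Python's `cp932` codec is used only as a stand-in for the spec's Shift_JIS decoder (it is not the WHATWG implementation). The names `ENTRIES`, `SPAN_RE` and `check` are made up for illustration.

```python
import re

# The two entries quoted above. The span content is written as an escape
# so that its code point (U+FF0D in both cases) is unambiguous.
ENTRIES = [
    '<span data-cp="2212" data-bytes="81 7C">\uFF0D</span>',
    '<span data-cp="FF0D" data-bytes="81 7C">\uFF0D</span>',
]

SPAN_RE = re.compile(
    r'<span data-cp="([0-9A-F]+)" data-bytes="([0-9A-F ]+)">(.)</span>'
)


def check(entry):
    """Return (declared cp, cp of the span content, cp decoded from the bytes)."""
    declared, raw_bytes, char = SPAN_RE.fullmatch(entry).groups()
    # Assumption: cp932 approximates the spec's Shift_JIS decoder for this pair.
    decoded = bytes.fromhex(raw_bytes).decode("cp932")
    return declared, f"{ord(char):04X}", f"{ord(decoded):04X}"


results = [check(entry) for entry in ENTRIES]
for declared, actual, decoded in results:
    status = "ok" if declared == actual == decoded else "MISMATCH"
    print(f"declared U+{declared}, content U+{actual}, decoded U+{decoded}: {status}")
```

Under these assumptions only the `data-cp="2212"` entry is flagged: its declared code point disagrees with both the character it contains and the decoded bytes.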

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/270#issuecomment-895569736

Received on Monday, 9 August 2021 21:41:42 UTC