- From: Ludovic Delabre <notifications@github.com>
- Date: Sat, 07 Aug 2021 13:34:48 -0700
- To: whatwg/encoding <encoding@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
Received on Saturday, 7 August 2021 20:35:00 UTC
Hi,

I need to implement a full-blown HTML5 parsing library in C# (with all its quirks), the first layer being byte-stream decoding. After implementing Shift_JIS as described in https://encoding.spec.whatwg.org/#shift_jis-decoder, I ran a full conformance check against https://www.w3.org/International/tests/repo/encoding/legacy-mb-japanese/shift_jis/sjis_chars.html.

First, byte 0x5C (which is ASCII) must be changed to U+00A5, and likewise 0x7E to U+203E; this seems to be missing from the spec. Both characters are marked as "Modified ASCII character" at https://en.wikipedia.org/wiki/Shift_JIS.

But my main issue is with the byte sequence 0x81 0x7C, which according to https://encoding.spec.whatwg.org/index-jis0208.txt can be decoded as either U+2211 or U+FF0C. Did I misinterpret something?

Thanks for your help,
Ludovic

PS: I noticed the same trouble with EUC-JP for the sequence 0xA1 0xDD, which can decode as either U+2211 or U+FF0C (different sequences but same code points?).

View it on GitHub: https://github.com/whatwg/encoding/issues/270
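
The two byte sequences mentioned above can be traced through the pointer arithmetic of the Encoding Standard's Shift_JIS and EUC-JP decoders. The following is a minimal Python sketch of that arithmetic only, not a full decoder: the function names are hypothetical, the index lookup itself is omitted, and the EUC-JP helper ignores the 0x8E and 0x8F lead bytes. Under those assumptions, both sequences reduce to the same pointer into index-jis0208, which would explain why different bytes end up at the same entry.

```python
from typing import Optional


def shift_jis_pointer(lead: int, byte: int) -> Optional[int]:
    """Pointer into index-jis0208 for a two-byte Shift_JIS sequence, or None."""
    # Only these lead bytes start a two-byte sequence.
    if not (0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xFC):
        return None
    # Trail byte must be in 0x40-0x7E or 0x80-0xFC.
    if not (0x40 <= byte <= 0x7E or 0x80 <= byte <= 0xFC):
        return None
    offset = 0x40 if byte < 0x7F else 0x41
    lead_offset = 0x81 if lead < 0xA0 else 0xC1
    return (lead - lead_offset) * 188 + byte - offset


def euc_jp_pointer(lead: int, byte: int) -> Optional[int]:
    """Pointer into index-jis0208 for a plain two-byte EUC-JP sequence, or None.

    Simplification: the 0x8E (katakana) and 0x8F (JIS X 0212) leads are not handled.
    """
    if not (0xA1 <= lead <= 0xFE and 0xA1 <= byte <= 0xFE):
        return None
    return (lead - 0xA1) * 94 + byte - 0xA1


if __name__ == "__main__":
    # Both calls print the same pointer value.
    print(shift_jis_pointer(0x81, 0x7C))
    print(euc_jp_pointer(0xA1, 0xDD))
```

Looking up that single pointer in index-jis0208.txt (first column, decimal) gives the one code point the decoder returns; a code point listed at more than one pointer matters for encoding, not for decoding.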