- From: vyv03354 <notifications@github.com>
- Date: Thu, 21 Nov 2024 15:30:40 -0800
- To: whatwg/encoding <encoding@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/encoding/issues/338/2492561911@github.com>
Although I don't agree with OP's request, I have to correct a factual error.

> As near as I can tell, the code unit sequence 0xA3 0xA0 is not actually assigned in GB18030.

Yes, the code unit sequence 0xA3 0xA0 is assigned in GB18030.

> A look at CJKV Information Processing

Why not read the spec itself instead of a secondary source?

> I do not have a copy of GB18030

🤔 Here is a quote from the GB18030-2022 spec.

From a spec-compliance perspective, 0xA3 0xA0 must be mapped to U+E5E5, period. We are intentionally violating the spec for web compatibility.

> I do not think that this represents a critical problem, since no data should exist in GB18030 that uses this byte sequence for anything meaningful.

Changing even one character mapping makes GB18030 no longer a UTF, because it breaks round-trip conversion.

> I don't think any graphical character will ever be assigned to this specific sequence, so it probably makes no difference.

Yes, it makes a difference. U+E5E5 renders as a white box (in Chromium) or a hex box (in Firefox), which is a visual glitch if the page author intended an IDEOGRAPHIC SPACE (U+3000). This is the very reason we violated the spec.

--
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/338#issuecomment-2492561911
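The round-trip point can be illustrated with a small sketch. The two tables below are hypothetical two-entry excerpts, not real codec data (actual GB18030 tables are far larger), and the helper names `invert` and `round_trips` are made up for illustration. Once 0xA3 0xA0 decodes to U+3000, two byte sequences decode to the same character, so an encoder can send U+3000 back to only one of them. The sketch assumes the first-listed sequence wins, which, as far as I understand, matches the Encoding Standard encoding U+3000 as 0xA1 0xA1.

```python
"""Sketch: why remapping one GB18030 byte sequence breaks round-trip conversion.

The tables are hypothetical two-entry excerpts, not real codec data.
"""

# Spec-conformant decoding, per the quoted GB18030-2022 text.
SPEC_DECODE = {
    b"\xa1\xa1": "\u3000",  # IDEOGRAPHIC SPACE
    b"\xa3\xa0": "\ue5e5",  # private-use code point
}

# Web-compat variant: 0xA3 0xA0 decodes to U+3000 instead of U+E5E5.
WEB_DECODE = {**SPEC_DECODE, b"\xa3\xa0": "\u3000"}


def invert(table):
    """Build an encoder table (character -> bytes).

    If several byte sequences decode to the same character, the first one
    in the table wins (an assumption of this sketch)."""
    enc = {}
    for seq, ch in table.items():
        enc.setdefault(ch, seq)
    return enc


def round_trips(table):
    """True if every byte sequence decodes and re-encodes to itself."""
    enc = invert(table)
    return all(enc[ch] == seq for seq, ch in table.items())


if __name__ == "__main__":
    print(round_trips(SPEC_DECODE))  # True
    print(round_trips(WEB_DECODE))   # False: 0xA3 0xA0 re-encodes as 0xA1 0xA1
```

With the web-compat table, U+E5E5 is also no longer produced by decoding, so that code point has no byte sequence to round-trip through in this excerpt.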
Received on Thursday, 21 November 2024 23:30:44 UTC