Re: [whatwg/encoding] 0xA3 0xA0 in GB 18030 (Issue #338)

Although I don't agree with OP's request, I have to correct the factual error.
> As near as I can tell, the code unit sequence 0xA3 0xA0 is not actually assigned in GB18030.

Yes, the code unit sequence 0xA3 0xA0 is assigned in GB18030.

> A look at CJKV Information Processing

Why don't you read the spec itself instead of a secondary source?

> I do not have a copy of GB18030

🤔

Here is a quote from the GB18030-2022 spec.
![image](https://github.com/user-attachments/assets/6ebab401-6092-4e54-9784-b3c7b3d5736e)

From the spec compliance perspective, 0xA3 0xA0 must be mapped to U+E5E5, period. We are intentionally violating the spec for web-compat.

> I do not think that this represents a critical problem, since no data should exist in GB18303 that uses this byte sequence for anything meaningful.

Changing even one character mapping means GB18030 is no longer a UTF, because it breaks round-trip conversion.
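
To make the round-trip point concrete, here is a minimal sketch in Python. The two-entry tables are illustrative excerpts written for this comment, not real codec data, and `build_encoder` and `lossy_sequences` are hypothetical helpers. It assumes that 0xA1 0xA1 is the regular encoding of U+3000 and that the web-compat decoder also maps 0xA3 0xA0 to U+3000.

```python
# Illustrative two-entry excerpts only; the real tables are far larger.
SPEC_DECODE = {
    b"\xa1\xa1": "\u3000",  # IDEOGRAPHIC SPACE
    b"\xa3\xa0": "\ue5e5",  # private-use code point required by the spec
}
WEB_DECODE = {
    b"\xa1\xa1": "\u3000",
    b"\xa3\xa0": "\u3000",  # assumed web-compat deviation
}


def build_encoder(decode_table):
    """Invert a decode table, keeping the first byte sequence per code point."""
    encoder = {}
    for raw, char in decode_table.items():
        encoder.setdefault(char, raw)
    return encoder


def lossy_sequences(decode_table):
    """Return byte sequences that do not survive bytes -> text -> bytes."""
    encoder = build_encoder(decode_table)
    return [raw for raw, char in decode_table.items() if encoder[char] != raw]


print(lossy_sequences(SPEC_DECODE))  # []
print(lossy_sequences(WEB_DECODE))   # [b'\xa3\xa0']
```

With the spec mapping every byte sequence comes back unchanged. With the deviation, 0xA3 0xA0 decodes to U+3000 and re-encodes as 0xA1 0xA1, so the original bytes are lost.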

> I don't think any graphical character will ever be assigned to this specific sequence, so it probably makes no difference.

Yes, it makes a difference. U+E5E5 renders as a white box (on Chromium) or a hexbox (on Firefox), which is a visual glitch if the page author intended an IDEOGRAPHIC SPACE (U+3000). This is the very reason we violated the spec.
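
For anyone who wants to check the two code points themselves, here is a small sketch using Python's standard `unicodedata` module. The `describe` helper is a hypothetical name. The last line asks whatever `gb18030` codec your Python ships with to decode the bytes; I make no claim about which of the two mappings it follows, so no output is shown for it.

```python
import unicodedata


def describe(char):
    """Format a code point with its general category and name, if it has one."""
    name = unicodedata.name(char, "<unnamed>")
    return f"U+{ord(char):04X} {unicodedata.category(char)} {name}"


print(describe("\ue5e5"))  # U+E5E5 Co <unnamed>
print(describe("\u3000"))  # U+3000 Zs IDEOGRAPHIC SPACE

# Implementation-dependent: shows which mapping the local codec uses.
decoded = b"\xa3\xa0".decode("gb18030", errors="replace")
print([describe(c) for c in decoded])
```

Category `Co` is private use, so there is no standard glyph to draw, which fits the box rendering described above. Category `Zs` is a space separator and is invisible.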


-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/338#issuecomment-2492561911

Message ID: <whatwg/encoding/issues/338/2492561911@github.com>

Received on Thursday, 21 November 2024 23:30:44 UTC