Re: [whatwg/encoding] Why Big5 index contains unmappable characters? (Issue #293)

The Big5 encoder and decoder are asymmetric (like the EUC-JP encoder and decoder). The visualizations show what can be decoded. The spec excludes part of the decoding space from round-tripping via the encoder so that HTML form submission does not generate extension-range bytes that some server-side recipients may not support.

For EUC-JP, the asymmetry is based on historical experience. For Big5, it is by prudent analogy with the problem initially seen with EUC-JP. Also, the exclusion for Big5 is questionable and, possibly by accident, excludes less than was intended: the encoder only excludes the extension part below the original Big5 range but doesn't exclude the other extension part above the original Big5 range.
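
To make the asymmetry concrete, here is a minimal Python sketch of the pointer arithmetic, assuming the Encoding Standard's rule that the encoder skips index entries whose pointer is less than (0xA1 - 0x81) × 157. The function names are made up for illustration, and the sketch does not consult the actual index, so it only says whether a byte pair falls in the range the encoder could emit, not whether a character is mapped there. Using lead byte 0xFE as an example of the region above the original range is also an assumption on my part.

```python
from typing import Optional

# Assumed encoder cutoff: pointers below this are decodable but never emitted.
ENCODER_MIN_POINTER = (0xA1 - 0x81) * 157  # 5024


def big5_pointer(lead: int, trail: int) -> Optional[int]:
    """Return the index pointer for a two-byte sequence, or None if the
    bytes are outside the ranges the decoder accepts."""
    if not 0x81 <= lead <= 0xFE:
        return None
    if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
        return None
    offset = 0x40 if trail < 0x7F else 0x62
    return (lead - 0x81) * 157 + (trail - offset)


def encoder_may_emit(lead: int, trail: int) -> bool:
    """True if the pointer is at or above the encoder cutoff.
    Does not check whether the index has an entry at that pointer."""
    pointer = big5_pointer(lead, trail)
    return pointer is not None and pointer >= ENCODER_MIN_POINTER


if __name__ == "__main__":
    # Lead byte below 0xA1: decodable range, but excluded from encoding.
    print(encoder_may_emit(0x87, 0x40))
    # Lead byte in the original Big5 range.
    print(encoder_may_emit(0xA4, 0x40))
    # High lead byte: not excluded by the cutoff, which is the gap noted above.
    print(encoder_may_emit(0xFE, 0xFE))
```

The cutoff is a single lower bound on the pointer, so nothing in this check can exclude byte pairs with lead bytes above the original range.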

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/293#issuecomment-1223700658

Message ID: <whatwg/encoding/issues/293/1223700658@github.com>

Received on Tuesday, 23 August 2022 08:00:30 UTC