[whatwg/url] Encourage denoting character-attributable errors by the REPLACEMENT CHARACTER (Issue #819)

### What is the issue with the URL Standard?

The URL Standard gives advice about URL rendering:
https://url.spec.whatwg.org/#ref-for-concept-domain-to-unicode%E2%91%A0

The [host parser](https://url.spec.whatwg.org/#concept-host-parser) section also says: "Alternatively [UTF-8 decode without BOM or fail](https://encoding.spec.whatwg.org/#utf-8-decode-without-bom-or-fail) can be used, coupled with an early return for failure, as [domain to ASCII](https://url.spec.whatwg.org/#concept-domain-to-ascii) fails on U+FFFD (�).", which is the opposite of what I'm asking for here.

UTS 46 says: "Implementations may make further modifications to the resulting Unicode string when showing it to the user. For example, it is recommended that disallowed characters be replaced by a U+FFFD to make them visible to the user."

It would be useful for the URL Standard to highlight this technique in a Note. The Note would encourage implementations to let U+FFFD produced by UTF-8 decode flow through the rest of the processing, and to replace erroneous code points found during UTS 46 processing and [forbidden domain code point](https://url.spec.whatwg.org/#forbidden-domain-code-point) processing with U+FFFD. That way, errors that are attributable to specific code points in the domain are made visible to the user. Since U+FFFD is itself a disallowed character, this technique preserves the overall failure status of the domain.
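
As a rough illustration of the idea, here is a minimal Python sketch. It is not the URL Standard's algorithm: the function names `decode_for_display` and `is_failure` are hypothetical, the forbidden domain code point set is my transcription of the current definition and should be checked against the spec, and UTS 46 processing is left out entirely. It only shows U+FFFD from a lossy UTF-8 decode flowing through, forbidden code points being replaced in place, and the failure status being preserved.

```python
REPLACEMENT = "\uFFFD"

# Assumption: transcribed from the URL Standard's "forbidden domain code point"
# definition (forbidden host code points, C0 controls, U+0025, U+007F).
FORBIDDEN_DOMAIN_CODE_POINTS = (
    {chr(c) for c in range(0x00, 0x20)}   # C0 controls (includes NULL, TAB, LF, CR)
    | set(" #/:<>?@[\\]^|")               # remaining forbidden host code points
    | {"%", "\x7f"}                       # U+0025 (%) and U+007F DELETE
)


def decode_for_display(host_bytes: bytes) -> str:
    """Hypothetical helper: build a string suitable for showing to the user."""
    # Lossy decode: malformed byte sequences become U+FFFD instead of
    # triggering an early return.
    text = host_bytes.decode("utf-8", errors="replace")
    # Mark each forbidden domain code point at the position where it occurred.
    return "".join(
        REPLACEMENT if ch in FORBIDDEN_DOMAIN_CODE_POINTS else ch for ch in text
    )


def is_failure(display: str) -> bool:
    """Hypothetical helper: any U+FFFD means the domain still counts as failed."""
    return REPLACEMENT in display


if __name__ == "__main__":
    for raw in (b"example.com", b"exa\xffmple.com", b"a b.com"):
        shown = decode_for_display(raw)
        print(repr(shown), "failure" if is_failure(shown) else "ok")
```

The rendered string is only for display. Because U+FFFD is disallowed, a domain containing it would still be rejected by domain to ASCII, so marking the error does not turn a failing domain into a valid one.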

Message ID: <whatwg/url/issues/819@github.com>

Received on Friday, 2 February 2024 13:38:31 UTC