- From: Henri Sivonen <notifications@github.com>
- Date: Fri, 02 Feb 2024 05:38:23 -0800
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/issues/819@github.com>
### What is the issue with the URL Standard?

The URL Standard gives advice about URL rendering: https://url.spec.whatwg.org/#ref-for-concept-domain-to-unicode%E2%91%A0

In the https://url.spec.whatwg.org/#concept-host-parser section, it also says: "Alternatively [UTF-8 decode without BOM or fail](https://encoding.spec.whatwg.org/#utf-8-decode-without-bom-or-fail) can be used, coupled with an early return for failure, as [domain to ASCII](https://url.spec.whatwg.org/#concept-domain-to-ascii) fails on U+FFFD (�).", which is the opposite of what I'm asking for here.

UTS 46 says: "Implementations may make further modifications to the resulting Unicode string when showing it to the user. For example, it is recommended that disallowed characters be replaced by a U+FFFD to make them visible to the user."

It would be useful for the URL Standard to highlight this technique and to include a Note that encourages two things: letting U+FFFD from UTF-8 decode flow through the processing, and replacing erroneous code points with U+FFFD during UTS 46 processing and [forbidden domain code point](https://url.spec.whatwg.org/#forbidden-domain-code-point) processing. That way, errors that are attributable to specific things in the domain are visualized to the user. Since U+FFFD is itself a disallowed character, this technique preserves the overall failure status of the domain.

-- 
Reply to this email directly or view it on GitHub: https://github.com/whatwg/url/issues/819
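
A minimal sketch of the technique described above, for illustration only. It assumes a hypothetical helper named `domain_for_display`, covers only lossy UTF-8 decoding and forbidden domain code points, and does not perform UTS 46 processing. The forbidden code point set is transcribed from a reading of the URL Standard and should be checked against the current text.

```python
from typing import Tuple

REPLACEMENT = "\uFFFD"

# Assumption: forbidden host code points, as listed in the URL Standard.
FORBIDDEN_HOST = set("\x00\t\n\r #/:<>?@[\\]^|")
# Assumption: forbidden domain code points add the C0 controls, "%", and DELETE.
FORBIDDEN_DOMAIN = FORBIDDEN_HOST | {chr(c) for c in range(0x20)} | {"%", "\x7f"}


def domain_for_display(raw: bytes) -> Tuple[str, bool]:
    """Hypothetical helper: return (string to show the user, failure flag)."""
    # Decode lossily so malformed byte sequences become U+FFFD
    # instead of triggering an early return.
    text = raw.decode("utf-8", errors="replace")
    # Mark each forbidden domain code point in place with U+FFFD.
    marked = "".join(REPLACEMENT if ch in FORBIDDEN_DOMAIN else ch for ch in text)
    # U+FFFD is itself disallowed, so its presence preserves the failure status.
    return marked, REPLACEMENT in marked
```

For example, `domain_for_display(b"exa mple.com")` marks the space with U+FFFD and reports failure, so the user can see where the problem in the domain is rather than only that the domain was rejected.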
Received on Friday, 2 February 2024 13:38:31 UTC