- From: Anne van Kesteren <notifications@github.com>
- Date: Tue, 10 Jan 2023 04:30:29 -0800
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/issues/603/1377190876@github.com>
I think @macchiati is correct.

1. https://www.unicode.org/reports/tr46/#ToASCII is where we start as that is what the URL Standard invokes.
2. https://www.unicode.org/reports/tr46/#Processing is what gets invoked first.
3. Step 4 there is the interesting one. In our case the input starts with `xn--`.
4. So we enter https://www.rfc-editor.org/rfc/rfc3492.html#section-6.2. Pseudo-code, great.
5. The fifth step there reads as follows:
   > consume all code points before the last delimiter (if there is one) and copy them to output, fail on any non-basic code point
6. Now https://www.rfc-editor.org/rfc/rfc3492.html#section-5 explains what "basic" means here (not the greatest of terms), which suggests that `é` leads to an error here.
7. Now we go back and read https://www.unicode.org/reports/tr46/#ToASCII again and notice:
   > If an error was recorded in steps 1-4, then the operation has failed and a failure value is returned. No DNS lookup should be done.

We should add a WPT for this, but I think this case is adequately covered by the specification and CheckHyphens doesn't impact it one way or another.
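
The Punycode step discussed above can be sketched roughly as follows. This is a minimal illustration, not a full RFC 3492 decoder: it only covers the opening step of the section 6.2 procedure (copying the literal part before the last delimiter) plus a check that the remainder consists of base-36 digits. The names `copy_basic_prefix` and `PunycodeError` and the sample labels are hypothetical and chosen for illustration; the input is assumed to be the part of the label that follows `xn--`.

```python
DELIMITER = "-"


class PunycodeError(ValueError):
    """Hypothetical error type: the input cannot be valid Punycode."""


def is_basic(ch):
    # RFC 3492 section 5: basic code points are the ASCII range 0..7F.
    return ord(ch) < 0x80


def copy_basic_prefix(encoded):
    """Split Punycode input into (literal prefix, remaining digits).

    Mirrors the first decoding step: everything before the last
    delimiter is copied to the output, failing on any non-basic
    code point. The variable-length integers in the remainder are
    NOT decoded here; they are only checked to be base-36 digits.
    """
    last = encoded.rfind(DELIMITER)
    b = last if last > 0 else 0  # nothing to copy if there is no delimiter
    prefix = encoded[:b]
    # The delimiter is consumed only if at least one code point was copied.
    rest = encoded[b + 1:] if b > 0 else encoded

    for ch in prefix:
        if not is_basic(ch):
            raise PunycodeError(f"non-basic code point U+{ord(ch):04X}")
    for ch in rest:
        if not (ch.isascii() and ch.isalnum()):
            raise PunycodeError(f"invalid digit U+{ord(ch):04X}")
    return prefix, rest


if __name__ == "__main__":
    print(copy_basic_prefix("caf-abc"))  # ('caf', 'abc')
    try:
        copy_basic_prefix("caf\u00e9-abc")
    except PunycodeError as exc:
        print("error recorded:", exc)
```

In this sketch a raised `PunycodeError` plays the role of the error that UTS #46 Processing records, which in turn makes ToASCII return failure.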
Received on Tuesday, 10 January 2023 12:30:41 UTC