Re: [whatwg/url] Double-encoded IDNA labels don't roundtrip (#603) from Anne van Kesteren on 2023-01-10 (public-webapps-github@w3.org from January 2023)

From: Anne van Kesteren <notifications@github.com>
Date: Tue, 10 Jan 2023 04:30:29 -0800
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/603/1377190876@github.com>

I think @macchiati is correct.

1. https://www.unicode.org/reports/tr46/#ToASCII is where we start as that is what the URL Standard invokes.
2. https://www.unicode.org/reports/tr46/#Processing is what gets invoked first.
3. Step 4 there is the interesting one. In our case the label starts with `xn--`, so the rest of the label goes through Punycode decoding.
4. So we enter https://www.rfc-editor.org/rfc/rfc3492.html#section-6.2. Pseudo-code, great.
5. The fifth step there reads as follows:
   > consume all code points before the last delimiter (if there is one)
     and copy them to output, fail on any non-basic code point
6. https://www.rfc-editor.org/rfc/rfc3492.html#section-5 explains what "basic" means here (not the greatest of terms): basic code points are the ASCII code points, so `é` leads to an error here.
7. Now we go back and read https://www.unicode.org/reports/tr46/#ToASCII again and notice:
   > If an error was recorded in steps 1-4, then the operation has failed and a failure value is returned. No DNS lookup should be done.
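
To make steps 3 to 7 concrete, here is a rough Python sketch of just the "copy the basic code points" part of the RFC 3492 decoder. The function names and the sample labels are made up for illustration, and it does not implement the rest of Punycode decoding or UTS #46 processing, so a label that passes this check can still fail later.

```python
ACE_PREFIX = "xn--"
DELIMITER = "-"


def copy_basic_prefix(encoded: str) -> str:
    """Copy the code points before the last delimiter, as in RFC 3492 section 6.2.

    `encoded` is the label with the "xn--" prefix already removed.
    Raises ValueError on any non-basic (non-ASCII) code point in that prefix.
    """
    last = encoded.rfind(DELIMITER)
    # No delimiter, or a delimiter at position 0, means nothing is copied.
    prefix = encoded[:last] if last > 0 else ""
    for ch in prefix:
        if ord(ch) >= 0x80:  # "basic" means ASCII, per RFC 3492 section 5
            raise ValueError(f"non-basic code point U+{ord(ch):04X}")
    return prefix


def label_records_error(label: str) -> bool:
    """Hypothetical helper: True if this one decoding step would record an error."""
    if not label.startswith(ACE_PREFIX):
        return False  # not an ACE label, so this step is never reached
    try:
        copy_basic_prefix(label[len(ACE_PREFIX):])
    except ValueError:
        return True
    return False
```

With these definitions, `label_records_error("xn--café-dma")` is `True` because of the `é` before the last delimiter, while `label_records_error("xn--caf-dma")` is `False`. In the real algorithm a recorded error means ToASCII returns failure.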

We should add a WPT for this, but I think this case is adequately covered by the specification, and CheckHyphens doesn't impact it one way or another.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/603#issuecomment-1377190876

Received on Tuesday, 10 January 2023 12:30:41 UTC