- From: Timothy Gu <notifications@github.com>
- Date: Mon, 17 May 2021 12:33:26 -0700
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/issues/438/842578214@github.com>
This issue demonstrates a need for URLs such as `xn--x.com` to be preserved as `xn--x.com`, despite the Punycode decoding error. However, to prevent reparse bugs, we need to treat the Unicode and validly-encoded ASCII versions of an invalid label the same way. In other words:

* Both `xn--a-ecp.ru` and `a⒈.ru` should have the same parsing result (`⒈` is a **disallowed** character according to UTS 46)
* Both `xn--a.xn--nxa` and `xn--a.β` should have the same parsing result
* It's unclear whether `xn--é` and `xn--xn---epa` should be allowed to have different parsing results (#603)

I propose allowing **ASCII** labels with Punycode decoding errors to remain, but still forbidding other types of UTS 46 errors. So we have the following matrix:

Domain | spec | Chrome | Firefox | Safari | proposal
-----|--------|-------------|-----------|----------|------
xn--a-ecp.ru | fail | xn--a-ecp.ru | xn--a-ecp.ru | fail | fail
a⒈.ru | fail | fail | xn--a-q10i.ru | fail | fail
xn--a.xn--nxa | fail | xn--a.xn--nxa | xn--a.xn--nxa | xn--a.xn--nxa | xn--a.xn--nxa
xn--a.β | fail | fail | xn--a.xn--nxa | fail | xn--a.xn--nxa
xn--é | fail | fail | xn--xn---epa | fail | fail
xn--xn---epa | xn--xn---epa | xn--xn---epa | xn--xn---epa | xn--xn---epa | xn--xn---epa

There's already precedent (Safari) for treating Punycode decoding errors differently from other UTS 46 failures, as one can see by comparing `xn--a-ecp.ru` against `xn--a.xn--nxa`. However, this also means we will need a UTS 46 modification to distinguish Punycode decoding errors from other types of errors. One way to get this is to add an _IgnoreInvalidPunycode_ boolean flag to UTS 46 and change Processing's [`xn--` step](https://unicode.org/reports/tr46/#ProcessingStepPunycode) to:

> 1. Attempt to convert the rest of the label to Unicode according to _Punycode_ [RFC3492]. If that conversion fails, record that there was an error **if the label contains non-ASCII characters or if _IgnoreInvalidPunycode_ is false**, and continue with the next label. Otherwise replace the original label in the string by the results of the conversion.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/438#issuecomment-842578214
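
The proposed wording of the `xn--` step can be sketched in Python as below. This is only an illustration under stated assumptions, not spec text or any browser's implementation: the function name `process_xn_label` and its return shape are hypothetical, and Python's built-in `punycode` codec stands in for an RFC 3492 decoder, so it may not agree with browsers on which inputs count as decoding failures.

```python
from typing import Tuple


def process_xn_label(label: str, ignore_invalid_punycode: bool) -> Tuple[str, bool]:
    """Sketch of the proposed `xn--` processing step (hypothetical helper).

    Returns (resulting_label, error_recorded).
    """
    if not label.startswith("xn--"):
        # This step only applies to labels carrying the ACE prefix.
        return label, False
    try:
        # Assumption: Python's punycode codec approximates RFC 3492 decoding.
        # A label with non-ASCII characters fails at .encode("ascii").
        decoded = label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        # Conversion failed: record an error if the label contains
        # non-ASCII characters or if IgnoreInvalidPunycode is false,
        # then leave the label as-is and continue with the next label.
        error = (not label.isascii()) or (not ignore_invalid_punycode)
        return label, error
    # Conversion succeeded: replace the label with the decoded result.
    return decoded, False
```

With the flag set, an ASCII label such as `xn--x` passes through unchanged without an error being recorded, while `xn--é` still records an error because the label itself is non-ASCII. Any further UTS 46 validity checks on successfully decoded labels are outside this sketch.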
Received on Monday, 17 May 2021 19:33:47 UTC