Re: [whatwg/url] can't parse urls starting with xn-- (#438)

This issue demonstrates a need for URLs such as `xn--x.com` to be preserved as `xn--x.com`, despite the Punycode decoding error. However, to prevent reparse bugs, we need to treat the Unicode and validly encoded ASCII versions of an invalid label the same way. In other words:

* Both `xn--a-ecp.ru` and `a⒈.ru` should have the same parsing result (`⒈` is a **disallowed** character according to UTS 46)
* Both `xn--a.xn--nxa` and `xn--a.β` should have the same parsing result
* It's unclear whether `xn--é` and `xn--xn---epa` should be allowed to have different parsing results (#603)

I propose allowing **ASCII** labels with Punycode decoding errors to remain, while still forbidding other types of UTS 46 errors. So we have the following matrix:

Domain | spec | Chrome | Firefox | Safari | proposal
-----|--------|-------------|-----------|----------|------
xn--a-ecp.ru | fail | xn--a-ecp.ru | xn--a-ecp.ru | fail | fail
a⒈.ru | fail | fail | xn--a-q10i.ru | fail | fail
xn--a.xn--nxa | fail | xn--a.xn--nxa | xn--a.xn--nxa | xn--a.xn--nxa | xn--a.xn--nxa
xn--a.β | fail | fail | xn--a.xn--nxa | fail | xn--a.xn--nxa
xn--é | fail | fail | xn--xn---epa | fail | fail
xn--xn---epa | xn--xn---epa | xn--xn---epa | xn--xn---epa | xn--xn---epa | xn--xn---epa
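
The two "same parsing result" requirements above can be checked mechanically against a column of the matrix. The following is only a sketch: the dictionaries are transcribed from the "proposal" and "Chrome" columns (with `None` standing for "fail"), and the helper names `pairs_agree` and `reparse_stable` are made up for illustration.

```python
# Results transcribed from the matrix; None stands for "fail".
PROPOSAL = {
    "xn--a-ecp.ru": None,
    "a⒈.ru": None,
    "xn--a.xn--nxa": "xn--a.xn--nxa",
    "xn--a.β": "xn--a.xn--nxa",
    "xn--é": None,
    "xn--xn---epa": "xn--xn---epa",
}
CHROME = {
    "xn--a-ecp.ru": "xn--a-ecp.ru",
    "a⒈.ru": None,
    "xn--a.xn--nxa": "xn--a.xn--nxa",
    "xn--a.β": None,
    "xn--é": None,
    "xn--xn---epa": "xn--xn---epa",
}

# ASCII / Unicode spellings that should parse the same way.
EQUIVALENT_PAIRS = [
    ("xn--a-ecp.ru", "a⒈.ru"),
    ("xn--a.xn--nxa", "xn--a.β"),
]


def pairs_agree(results, pairs):
    """True if both spellings in every pair have the same parsing result."""
    return all(results[a] == results[b] for a, b in pairs)


def reparse_stable(results):
    """True if every successful output parses to itself when fed back in."""
    return all(out is None or results.get(out) == out for out in results.values())


print(pairs_agree(PROPOSAL, EQUIVALENT_PAIRS))  # True
print(reparse_stable(PROPOSAL))                 # True
print(pairs_agree(CHROME, EQUIVALENT_PAIRS))    # False
```

The Chrome column fails the pair check because the ASCII spellings are kept while the Unicode spellings fail, which is the kind of mismatch the proposal is meant to remove.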

There's already precedent (Safari) for treating Punycode decoding errors differently from other UTS 46 failures, as one can see by comparing `xn--a-ecp.ru` against `xn--a.xn--nxa`. However, this also means we will need a UTS 46 modification to distinguish Punycode decoding errors from other types of errors.

One way to achieve this is to add an _IgnoreInvalidPunycode_ boolean flag to UTS 46 and change the [`xn--` step](https://unicode.org/reports/tr46/#ProcessingStepPunycode) of Processing to:

> 1. Attempt to convert the rest of the label to Unicode according to _Punycode_ [RFC3492]. If that conversion fails, record that there was an error **if the label contains non-ASCII characters or if _IgnoreInvalidPunycode_ is false**, and continue with the next label. Otherwise replace the original label in the string by the results of the conversion.
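
To make the amended step concrete, here is a rough Python sketch of the decision it describes. It is not UTS 46 itself: `process_xn_label`, `builtin_punycode_decode` and `reject_everything` are hypothetical names, and the default decoder is Python's built-in `punycode` codec, whose strictness is not guaranteed to match what browsers or UTS 46 implementations treat as a decoding error. The decoder is therefore passed in as a parameter.

```python
from typing import Callable, Tuple


def builtin_punycode_decode(rest: str) -> str:
    """Decode with Python's built-in "punycode" codec.

    Assumption: this codec may accept or reject different inputs than a
    browser's Punycode decoder does.
    """
    return rest.encode("ascii").decode("punycode")


def reject_everything(rest: str) -> str:
    """Hypothetical stand-in for a strict decoder that rejects its input."""
    raise ValueError("invalid Punycode")


def process_xn_label(
    label: str,
    ignore_invalid_punycode: bool,
    decode: Callable[[str], str] = builtin_punycode_decode,
) -> Tuple[str, bool]:
    """Return (resulting_label, error_recorded) for a label starting with 'xn--'."""
    if not label.lower().startswith("xn--"):
        raise ValueError("only labels starting with 'xn--' reach this step")
    try:
        # Conversion succeeded: replace the label with the decoded form.
        return decode(label[4:]), False
    except ValueError:  # UnicodeError is a subclass of ValueError
        # Conversion failed: record an error if the label contains non-ASCII
        # characters or if the flag is false; otherwise keep the label as-is.
        error = (not label.isascii()) or (not ignore_invalid_punycode)
        return label, error


# ASCII label, decoder rejects it, flag set: label kept, no error.
print(process_xn_label("xn--a", True, reject_everything))   # ('xn--a', False)
# Same label with the flag cleared: error recorded (current behaviour).
print(process_xn_label("xn--a", False, reject_everything))  # ('xn--a', True)
# Non-ASCII label: error recorded regardless of the flag.
print(process_xn_label("xn--é", True))                      # ('xn--é', True)
```

Only the Punycode step is modelled here; other UTS 46 checks, such as disallowed characters in the decoded label, would still apply afterwards.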

Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/438#issuecomment-842578214

Received on Monday, 17 May 2021 19:33:47 UTC