Re: [whatwg/url] can't parse urls starting with xn-- (#438)

I'd like to point out that the current revision of the IDNA RFC [[IDNA2008](https://tools.ietf.org/html/rfc5891)] encourages applications that do DNS lookups to be liberal in what they accept, and in particular to "rely on the assumption that names that are present in the DNS are valid" except for specific cases that are known to cause "serious problems". Note especially the text at the end of section 5.4:

> For all other strings, the lookup application MUST rely on the
   presence or absence of labels in the DNS to determine the validity of
   those labels and the validity of the characters they contain.  If
   they are registered, they are presumed to be valid; if they are not,
   their possible validity is not relevant.

where "all other strings" means "all strings that have passed the sequence of checks for 'serious problems' described in sections 5.3 and 5.4".

Here are some examples of URLs that I have personally observed in the wild (during my research, which involves Web crawling). Their hostnames are formally invalid per some RFC or other, but they do not rise to the level of a 'serious problem', and I think the URL standard should probably accept them, if only for interop's sake:

```
http://r2---sn-gvbxgn-tt1s.googlevideo.com/
http://r9---sn-i3b7sn7d.googlevideo.com/
http://lgbt_grani.livejournal.com/
http://www.mi-ru_mo.bbs.fc2.com/
http://-friction-.tumblr.com/
```
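To make it concrete which rule each of these hostnames trips, here is a rough Python sketch. It is only an approximation of the usual letters-digits-hyphen (LDH) convention plus the hyphen-placement restrictions, not what the URL standard or any resolver actually implements; the names `label_issues` and `host_issues` are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumption: "formally valid" is approximated as ASCII letters, digits, and hyphens only.
LDH = re.compile(r"[a-z0-9-]+")


def label_issues(label):
    """Return the (approximate) formal problems with a single hostname label."""
    issues = []
    lower = label.lower()
    if not LDH.fullmatch(lower):
        issues.append("non-LDH character")
    if lower.startswith("-") or lower.endswith("-"):
        issues.append("leading or trailing hyphen")
    # "--" in the 3rd and 4th positions is reserved for prefixes such as "xn--".
    if lower[2:4] == "--" and not lower.startswith("xn--"):
        issues.append("hyphens in positions 3 and 4")
    return issues


def host_issues(url):
    """Map each problematic label in the URL's hostname to its list of issues."""
    host = urlsplit(url).hostname or ""
    result = {}
    for label in host.split("."):
        issues = label_issues(label)
        if issues:
            result[label] = issues
    return result


if __name__ == "__main__":
    for url in [
        "http://r2---sn-gvbxgn-tt1s.googlevideo.com/",
        "http://r9---sn-i3b7sn7d.googlevideo.com/",
        "http://lgbt_grani.livejournal.com/",
        "http://www.mi-ru_mo.bbs.fc2.com/",
        "http://-friction-.tumblr.com/",
    ]:
        print(url, host_issues(url))
```

Under these assumed rules, the two googlevideo.com names fail only the hyphen-position check, the livejournal.com and fc2.com names fail only because of the underscore, and the tumblr.com name fails only the leading/trailing hyphen check.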


Received on Thursday, 10 October 2019 17:06:21 UTC