Re: [whatwg/url] IDNA: avoid defining valid domain string in terms of the parser (#245)

pspacek left a comment (whatwg/url#245)

Hi,

a DNS guy here. Allow me to describe this from a DNS perspective:

> Maybe I'm wrong but aren't valid domains defined in the RFCs below?
> 
>     * https://www.ietf.org/rfc/rfc1034.txt
> 
>     * https://www.ietf.org/rfc/rfc1123.txt
> 
> 
> The first one saying:
> 
> <domain> ::= <subdomain> | " "

Indeed, that is wrong in a subtle way. This quote comes from section
[3.5. Preferred name syntax](https://datatracker.ietf.org/doc/html/rfc1034#section-3.5) of RFC 1034, with emphasis on **preferred**.

The real limits of the DNS protocol are made clear here:
[11. Name syntax](https://datatracker.ietf.org/doc/html/rfc2181#section-11) in RFC 2181. TL;DR: anything goes, including binary 0 (ASCII `NUL`) and `.`. These weird-but-permissible-in-DNS names are then encoded into ASCII strings like `\000\..example.com.`, where the leftmost label consists of two ASCII characters:
- `NUL`
- `.` - which is a character **inside** the leftmost label, not a label separator
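
To make the escaping concrete, here is a minimal sketch of how such a presentation-format string could be split into raw labels. It assumes the RFC 1035 master-file escapes (`\DDD` for a decimal octet value, `\X` for a literal character); the function names `parse_presentation_name` and `to_wire` are made up for this illustration and do not come from any DNS library.

```python
DIGITS = "0123456789"


def parse_presentation_name(name: str) -> list[bytes]:
    """Split a presentation-format DNS name into raw label octets.

    Assumes RFC 1035 style escapes: a backslash followed by three decimal
    digits is one octet; a backslash followed by any other character is
    that character taken literally (so an escaped dot is label content).
    """
    if name == ".":
        return []  # the root name has no labels
    labels: list[bytes] = []
    current = bytearray()
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "\\":
            nxt = name[i + 1:i + 2]
            if not nxt:
                raise ValueError("dangling backslash at end of name")
            if nxt in DIGITS:
                ddd = name[i + 1:i + 4]
                if len(ddd) != 3 or any(c not in DIGITS for c in ddd):
                    raise ValueError("expected three decimal digits after backslash")
                value = int(ddd)
                if value > 255:
                    raise ValueError("escaped octet value out of range")
                current.append(value)
                i += 4
            else:
                current += nxt.encode("ascii")
                i += 2
        elif ch == ".":
            # An unescaped dot is a label separator.
            if not current:
                raise ValueError("empty label")
            labels.append(bytes(current))
            current = bytearray()
            i += 1
        else:
            current += ch.encode("ascii")
            i += 1
    if current:
        labels.append(bytes(current))  # name written without a trailing dot
    return labels


def to_wire(labels: list[bytes]) -> bytes:
    """Encode labels as length-prefixed octet strings ending with the root label."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("label length must be between 1 and 63 octets")
        out.append(len(label))
        out += label
    out.append(0)  # zero-length root label terminates the name
    return bytes(out)
```

With this sketch, `parse_presentation_name(r"\000\..example.com.")` yields `[b"\x00.", b"example", b"com"]`. Because each label is length-prefixed in `to_wire`, nothing in the encoding itself stops a label from containing `NUL` or `.`.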

We could argue that URL should be concerned only with **host names** (as opposed to **domains**), and then the quote might be more fitting, but that ignores IDNA completely. RFC 5890 defines a stricter subset of permissible names in ASCII encoding...
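
As a rough illustration of that stricter subset, the sketch below classifies a single ASCII label using the LDH terminology as I read it in RFC 5890 (letters, digits, hyphen; no leading or trailing hyphen; at most 63 octets; `--` in the third and fourth positions marks a reserved label). The function name `classify_label` and the returned category strings are invented for this example, and it does not check whether an `xn--` label is actually a valid A-label.

```python
import string

LDH_CHARS = set(string.ascii_letters + string.digits + "-")


def classify_label(label: str) -> str:
    """Classify one label by its ASCII form only (no Punycode validation)."""
    if not 1 <= len(label) <= 63:
        return "not-ldh"
    if any(c not in LDH_CHARS for c in label):
        return "not-ldh"
    if label[0] == "-" or label[-1] == "-":
        return "not-ldh"
    if label[2:4] == "--":
        # Reserved LDH label; the "xn" prefix marks an A-label candidate.
        return "xn-label" if label[:2].lower() == "xn" else "reserved-ldh"
    return "nr-ldh"
```

For example, `classify_label("example")` returns `"nr-ldh"`, `classify_label("xn--bcher-kva")` returns `"xn-label"`, and a label containing `NUL` or `.` returns `"not-ldh"` even though DNS itself can carry it.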

I'm happy to discuss further if there's interest!

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/245#issuecomment-2662868049

Message ID: <whatwg/url/issues/245/2662868049@github.com>

Received on Monday, 17 February 2025 11:41:53 UTC