[whatwg/url] Punycode behavior for labels exceeding DNS length is ill-defined (Issue #824)

### What is the issue with the URL Standard?

The URL Standard, UTS 46, and RFC 3492 don’t specify interoperable behavior for Punycode encode and decode failures when a label is longer than what actually makes sense for DNS purposes.

If the input is too long, at some point an integer internal to the Punycode algorithm overflows. See https://datatracker.ietf.org/doc/html/rfc3492.html#section-6.4
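To make the overflow points concrete, here is a minimal sketch of a decoder that follows the decoding procedure in RFC 3492 section 6.2 and adds the overflow checks suggested in section 6.4. The `MAXINT` value (unsigned 32-bit) and the names `punycode_decode_checked`, `_adapt`, and `_digit` are assumptions for illustration; this is not the code of any particular browser or library.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128
MAXINT = 2**32 - 1  # assumption: unsigned 32-bit internal integers


def _adapt(delta: int, numpoints: int, firsttime: bool) -> int:
    """Bias adaptation from RFC 3492 section 6.1."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def _digit(ch: str) -> int:
    """Map a basic code point to its digit value (a-z: 0-25, 0-9: 26-35)."""
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError(f"invalid Punycode digit: {ch!r}")


def punycode_decode_checked(payload: str) -> str:
    """Decode a Punycode payload (without the xn-- prefix), failing on overflow."""
    b = payload.rfind("-")
    output = list(payload[:b]) if b > 0 else []
    if any(ord(c) >= 0x80 for c in output):
        raise ValueError("non-basic code point before the delimiter")
    pos = b + 1 if b > 0 else 0
    n, i, bias = INITIAL_N, 0, INITIAL_BIAS
    while pos < len(payload):
        oldi, w, k = i, 1, BASE
        while True:
            if pos >= len(payload):
                raise ValueError("truncated variable-length integer")
            digit = _digit(payload[pos])
            pos += 1
            # Overflow check before i += digit * w
            if digit > (MAXINT - i) // w:
                raise OverflowError("overflow while accumulating i")
            i += digit * w
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if digit < t:
                break
            # Overflow check before w *= (BASE - t)
            if w > MAXINT // (BASE - t):
                raise OverflowError("overflow while scaling w")
            w *= BASE - t
            k += BASE
        out = len(output) + 1
        bias = _adapt(i - oldi, out, oldi == 0)
        # Overflow check before n += i // out
        if i // out > MAXINT - n:
            raise OverflowError("overflow while advancing n")
        n += i // out
        if n > 0x10FFFF:
            raise ValueError("decoded value is not a Unicode code point")
        i %= out
        output.insert(i, chr(n))
        i += 1
    return "".join(output)
```

With these checks, `punycode_decode_checked("bcher-kva")` returns `"bücher"`, while a run of large digits such as `"9" * 20` raises `OverflowError` instead of wrapping around. Where exactly an implementation fails depends on its integer width, which is the part the specs leave open.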

One option would be to specify that the internal integer size be 32 bits, but that can lead to denial-of-service attacks with unreasonably long inputs. (Apparently Chrome’s fuzzers managed to time out when fuzzing Punycode.) For this reason, ICU4C has somewhat arbitrary length limits for the inputs to Punycode decode and encode. https://unicode-org.atlassian.net/browse/ICU-13727 https://searchfox.org/mozilla-central/rev/6bc0f370cc459bf79e1330ef74b21009a9848c91/intl/icu/source/common/punycode.cpp#173-176

The rationale from the issue is:

> A well-formed label is limited to 63 bytes, which means at most 59 bytes after "xn--". However, we don't have any limit so far, and people sometimes use libraries and protocols with non-standard inputs.
>
> Something 1000-ish seems like a reasonable compromise to keep n^2 tame and users happy even with somewhat unusual inputs.

The non-arbitrary tight bound would be to fail before decoding Punycode if the decoder input (not counting the `xn--` prefix) exceeds 59 ASCII characters, and to fail during encoding if the encoder is about to output a 60th ASCII character (again not counting the `xn--` prefix).
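A minimal sketch of that tight bound, using Python's built-in `punycode` codec. The function names and constants are hypothetical, and the encoder side is simplified: it checks the length after encoding rather than stopping as the 60th character is about to be emitted, so it shows the accept/reject outcome but not the early exit.

```python
ACE_PREFIX = "xn--"
MAX_LABEL_LEN = 63                                 # DNS label limit
MAX_PAYLOAD_LEN = MAX_LABEL_LEN - len(ACE_PREFIX)  # 59


def decode_label_tight(label: str) -> str:
    """Decode an xn-- label, rejecting over-long payloads before decoding."""
    if not label.lower().startswith(ACE_PREFIX):
        return label  # ASCII-only labels are not limited by this check
    payload = label[len(ACE_PREFIX):]
    if len(payload) > MAX_PAYLOAD_LEN:
        raise ValueError("Punycode decoder input exceeds 59 characters")
    return payload.encode("ascii").decode("punycode")


def encode_label_tight(label: str) -> str:
    """Encode a label, rejecting results whose payload exceeds 59 characters."""
    if label.isascii():
        return label  # nothing to encode
    # Simplification: a real encoder would fail while emitting output.
    payload = label.encode("punycode").decode("ascii")
    if len(payload) > MAX_PAYLOAD_LEN:
        raise ValueError("Punycode encoder output exceeds 59 characters")
    return ACE_PREFIX + payload
```

For example, `encode_label_tight("bücher")` returns `"xn--bcher-kva"`, and `decode_label_tight("xn--" + "a" * 60)` raises `ValueError` without running the decoder at all.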

Using the tight bound would come pretty close to setting `VerifyDNSLength` to true (close, but not exactly: it would still not place a limit on ASCII-only labels or on the domain name as a whole). Yet, the URL Standard sets `VerifyDNSLength` to `false`. This comes from https://github.com/whatwg/url/commit/3bec3b89c4deb10842ba6c464c700df47c268f17 , which does not state a motivation.
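For comparison, here is a rough sketch of the kind of check a true `VerifyDNSLength` implies, as I understand UTS 46: each label is 1 to 63 characters and the whole name, ignoring a trailing root dot, is 1 to 253 characters. The function name is hypothetical and the input is assumed to be the already-converted ASCII form.

```python
def verify_dns_length(ascii_domain: str) -> bool:
    """Return True if the ASCII domain satisfies the DNS length limits."""
    # Ignore a single trailing dot (the root label).
    name = ascii_domain[:-1] if ascii_domain.endswith(".") else ascii_domain
    if not 1 <= len(name) <= 253:
        return False
    # Every label, ASCII-only or xn--, must be 1 to 63 characters long.
    return all(1 <= len(label) <= 63 for label in name.split("."))
```

Unlike the Punycode-only bound, this also rejects an over-long ASCII-only label such as `"a" * 64 + ".com"` and a name whose labels are individually fine but which exceeds 253 characters in total.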

Without knowing the motivation for setting `VerifyDNSLength` to `false`, it’s hard to assess if placing the tight bounds on Punycode would work.

I think the specs should make the behavior here well defined even if it’s not a particularly pressing issue, since it only concerns labels that are too long for DNS anyway. (This probably belongs in UTS 46, but filing this here for discussion before sending UTS 46 feedback.)

Message ID: <whatwg/url/issues/824@github.com>

Received on Friday, 1 March 2024 11:43:19 UTC