- From: Domenic Denicola <notifications@github.com>
- Date: Mon, 29 Jun 2026 06:44:11 -0700
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/pull/905/c4833262285@github.com>
domenic left a comment (whatwg/url#905)

Haven't yet looked at the changes in detail; will try to do that soon. But according to my agents, even after these changes we have the following divergences:

From Codex:

> - Predicate false, parser succeeds clean: `https://example.org//`.
> - Predicate true, parser fails: IPv4-looking domains such as `https://example.255/`, `https://1.2.3.4.5/`, `https://256.1.1.1/`, plus `foo://:80`.
> - Predicate true, parser succeeds with validation errors: `foo://h@x` is valid by the new non-special opaque-body grammar, while the parser reports `invalid-credentials`.

From Claude (so wordy...):

> After the change, a broad bidirectional fuzz (~1.1M inputs) leaves exactly three divergence classes, **all pre-existing and untouched by #905**:
>
> 1. **IPv4 reinterpretation of numeric-looking domains** *(valid string → parser fails or warns).* A host that is a valid *domain string* (passes strict ToASCII) but "ends in a number" is re-parsed as IPv4 by the parser, which then fails or warns. The writing grammar never does IPv4 *parsing* — it only checks "valid IPv4-address string" syntactically and "valid domain string" via ToASCII — so it accepts these.
>    - `https://example.255/` → `IPv4-non-numeric-part` (fail); `https://1.2.3.4.5/` → `IPv4-too-many-parts` (fail); `https://256.1.1.1/` → `IPv4-out-of-range-part` (fail); `https://09/` → fail; `https://0x1.2/` → `IPv4-non-decimal-part` (**parses**, just warns).
>
> 2. **Empty path segments / empty authority** *(invalid string → parser accepts cleanly, no error).* A `path-absolute-URL string` can't start with `//` (its `path-relative` part may not start with `/`), so the grammar rejects a leading empty segment, but the parser produces it silently.
>    - `https://example.org//`, `https://example.org//p`, and the related empty-authority `file://`.
>
> 3. **Relative reference against a cannot-be-a-base (opaque-path) base** *(valid string → parser fails with `missing-scheme-non-relative-URL`).* `relative-URL string` validity switches only on the base's *scheme*, so the predicate calls e.g. `a/b`, `/p`, `_dmarc.x` valid against base `foo:opaque` — but the parser can't resolve any non-fragment relative reference against an opaque-path base. This one only appears when a cannot-be-a-base URL is supplied as the base.
>
> Classes 1 and 3 are the writing grammar being *looser* than the parser (host IPv4 semantics; cannot-be-a-base bases); class 2 is the grammar being *stricter* (empty path segments). None are addressed by #905's scope.

Notably, this item from Codex seems like it might be a regression:

> - Predicate true, parser succeeds with validation errors: `foo://h@x` is valid by the new non-special opaque-body grammar, while the parser reports `invalid-credentials`.

Claude, by contrast, thinks this PR is a strict progression. The divergence seems to come down to how each agent implemented this part of the grammar:

> a [URL-scheme string](https://whatpr.org/url/905.html#url-scheme-string) that is not an [ASCII case-insensitive](https://infra.spec.whatwg.org/#ascii-case-insensitive) match for a [special scheme](https://whatpr.org/url/905.html#special-scheme), followed by U+003A (:) and one of: a [scheme-relative-URL string](https://whatpr.org/url/905.html#scheme-relative-url-string), a [path-absolute-URL string](https://whatpr.org/url/905.html#path-absolute-url-string), or zero or more [URL units](https://whatpr.org/url/905.html#url-units)

For example, `foo://:80` is valid per a literal reading, because `//:80` on its own is zero or more URL units.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/pull/905#issuecomment-4833262285
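The mechanism behind Claude's class 1 can be made concrete. The host parser runs the "ends in a number" check on the ASCII domain and, if it passes, hands the whole host to the IPv4 parser. Below is a rough Python sketch of those steps, hand-transcribed from the URL Standard as I understand it. It is not either agent's code. The function names are hypothetical, and it assumes the input has already been through domain-to-ASCII.

```python
# Simplified sketch of the host-parser path behind "class 1": the
# ends-in-a-number checker, IPv4 number parser, and IPv4 parser.
# Hypothetical function names; input assumed to be a domain-to-ASCII result.


def parse_ipv4_number(part):
    """Return (value, was_non_decimal), or None on failure."""
    if part == "":
        return None
    non_decimal, radix = False, 10
    if len(part) >= 2 and part[:2] in ("0x", "0X"):
        non_decimal, part, radix = True, part[2:], 16
    elif len(part) >= 2 and part[0] == "0":
        non_decimal, part, radix = True, part[1:], 8
    if part == "":
        return (0, True)
    allowed = {8: "01234567", 10: "0123456789", 16: "0123456789abcdefABCDEF"}[radix]
    if any(c not in allowed for c in part):
        return None
    return (int(part, radix), non_decimal)


def ends_in_a_number(domain):
    parts = domain.split(".")  # strict split on "."
    if parts[-1] == "":
        if len(parts) == 1:
            return False
        parts.pop()
    last = parts[-1]
    if last != "" and all(c in "0123456789" for c in last):
        return True
    # Otherwise only "0x"/"0X" followed by hex digits can still succeed.
    return parse_ipv4_number(last) is not None


def parse_ipv4(domain):
    """Return (address_or_None, validation_errors)."""
    errors = []
    parts = domain.split(".")
    if parts[-1] == "":
        errors.append("IPv4-empty-part")
        if len(parts) > 1:
            parts.pop()
    if len(parts) > 4:
        errors.append("IPv4-too-many-parts")
        return None, errors
    numbers = []
    for part in parts:
        result = parse_ipv4_number(part)
        if result is None:
            errors.append("IPv4-non-numeric-part")
            return None, errors
        value, non_decimal = result
        if non_decimal:
            errors.append("IPv4-non-decimal-part")
        numbers.append(value)
    if any(n > 255 for n in numbers):
        errors.append("IPv4-out-of-range-part")
    if any(n > 255 for n in numbers[:-1]):
        return None, errors
    if numbers[-1] >= 256 ** (5 - len(numbers)):
        return None, errors
    ipv4 = numbers[-1]
    for counter, n in enumerate(numbers[:-1]):
        ipv4 += n * 256 ** (3 - counter)
    return ipv4, errors


def host_outcome(ascii_domain):
    """Classify what the parser does with a host that passed domain-to-ASCII."""
    if not ends_in_a_number(ascii_domain):
        return ("domain", ascii_domain, [])
    address, errors = parse_ipv4(ascii_domain)
    return ("ipv4" if address is not None else "failure", address, errors)
```

Run on the class 1 examples, this should reproduce the listed outcomes. `example.255`, `1.2.3.4.5`, `256.1.1.1`, and `09` fail. `0x1.2` parses to 1.0.0.2 with an `IPv4-non-decimal-part` warning. None of this is visible to a grammar that only checks "valid domain string" via ToASCII.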
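Claude's class 2 comes from a mismatch in how a leading empty path segment is treated. The grammar only forbids a *leading* empty segment, because the path-relative part may not start with `/`, while the parser splits on `/` and keeps empty segments. A minimal sketch of that mismatch follows, with hypothetical helper names. It deliberately ignores segment contents, `.`/`..` handling, backslashes, percent-encoding, and the `file://` empty-authority case.

```python
# Simplified contrast for "class 2". Hypothetical helpers; only the handling
# of empty segments is modeled, not the full path grammar or path state.


def grammar_accepts_path_absolute(path):
    """'/' optionally followed by a path-relative-URL string, which may not
    itself start with '/'. Segment contents are not checked here."""
    return path.startswith("/") and not path.startswith("//")


def parser_path_segments(path):
    """Rough stand-in for the special-URL path state: split on '/' after the
    leading slash, keeping empty segments."""
    return path[1:].split("/")
```

For `https://example.org//`, the path `//` is rejected by the grammar sketch, while the parser sketch yields two empty segments. An interior empty segment, as in `/a//b`, passes both.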
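Claude's class 3 is easiest to see side by side. The relative-URL string production is chosen from the base's scheme alone, while the parser's "no scheme state" also checks whether the base has an opaque path. A minimal sketch follows. `BaseURL`, `relative_grammar_branch`, and `no_scheme_state` are hypothetical names, and the sketch assumes the parser has already reached the no scheme state. That is where `a/b`, `/p`, and `_dmarc.x` all end up.

```python
from dataclasses import dataclass
from typing import Optional

SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}


@dataclass
class BaseURL:
    scheme: str
    has_opaque_path: bool  # True for e.g. foo:opaque


def relative_grammar_branch(base: BaseURL) -> str:
    """Which relative-URL string production applies; only the scheme matters."""
    if base.scheme == "file":
        return "file"
    return "special" if base.scheme in SPECIAL_SCHEMES else "other"


def no_scheme_state(base: Optional[BaseURL], c: str) -> str:
    """Simplified parser no scheme state, given the first code point c."""
    if base is None or (base.has_opaque_path and c != "#"):
        return "failure: missing-scheme-non-relative-URL"
    if base.has_opaque_path:
        return "fragment state"  # only "#..." resolves against an opaque path
    return "file state" if base.scheme == "file" else "relative state"
```

Against base `foo:opaque`, the grammar picks the same production as for a hierarchical `foo:` base. The parser, by contrast, rejects every input that does not start with `#`.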
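The `foo://h@x` / `foo://:80` question hinges on the last alternative of the quoted production. Under a literal reading, "zero or more URL units" already matches any `//`-prefixed body made of URL code points, whether or not it is also a valid scheme-relative-URL string. A minimal Python sketch of only that third alternative follows. The function names are hypothetical. The URL code point set is transcribed by hand from the URL Standard as I understand it, and the scheme is assumed to have been validated as a URL-scheme string elsewhere.

```python
import re

SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}
_ASCII_URL_CODE_POINTS = set(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    "!$&'()*+,-./:;=?@_~"
)


def is_url_code_point(c):
    if c in _ASCII_URL_CODE_POINTS:
        return True
    cp = ord(c)
    if not 0xA0 <= cp <= 0x10FFFD:
        return False
    if 0xD800 <= cp <= 0xDFFF:  # surrogates
        return False
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFF) in (0xFFFE, 0xFFFF):  # noncharacters
        return False
    return True


def is_url_units(s):
    """Zero or more URL units: URL code points or percent-encoded bytes."""
    i = 0
    while i < len(s):
        if s[i] == "%" and re.fullmatch(r"[0-9A-Fa-f]{2}", s[i + 1 : i + 3]):
            i += 3
        elif is_url_code_point(s[i]):
            i += 1
        else:
            return False
    return True


def third_alternative_matches(url):
    """Literal reading: non-special scheme, ':', then zero or more URL units."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() in SPECIAL_SCHEMES:
        return False
    return is_url_units(rest)
```

Both `foo://h@x` and `foo://:80` match here, which is consistent with Codex's "predicate true" results. An implementation that instead routes every `//`-prefixed body through the scheme-relative-URL string production would presumably disagree.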
Received on Monday, 29 June 2026 13:44:15 UTC