Re: [whatwg/url] Close gap between URL writing and parser sections (PR #905)

domenic left a comment (whatwg/url#905)

Haven't looked at the changes in detail yet; will try to do that soon. But according to my agents, even after these changes the URL-writing section and the parser still diverge in the following ways:

From Codex:

> - Predicate false, parser succeeds clean: `https://example.org//`.
> - Predicate true, parser fails: IPv4-looking domains such as `https://example.255/`, `https://1.2.3.4.5/`, `https://256.1.1.1/`, plus `foo://:80`.
> - Predicate true, parser succeeds with validation errors: `foo://h@x` is valid by the new non-special opaque-body grammar, while the parser reports `invalid-credentials`.
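
A quick way to check the parser side of these. This is a sketch that assumes a runtime whose global `URL` follows the current URL Standard (e.g. recent Node.js or an up-to-date browser); `parses` is just a hypothetical helper. `URL` doesn't expose validation errors, so this can only distinguish success from failure and can't show the `invalid-credentials` error on `foo://h@x`:

```ts
// Hypothetical helper: true if the WHATWG URL parser accepts `input`
// (optionally resolved against `base`). Validation errors are not observable.
function parses(input: string, base?: string): boolean {
  try {
    new URL(input, base);
    return true;
  } catch {
    return false;
  }
}

// Predicate false, parser succeeds: the leading empty path segment is kept.
console.log(new URL("https://example.org//").pathname); // logs: //

// Predicate true, parser fails: numeric-looking hosts, and an empty host with a port.
const shouldFail = [
  "https://example.255/",
  "https://1.2.3.4.5/",
  "https://256.1.1.1/",
  "foo://:80",
];
for (const input of shouldFail) {
  console.log(input, parses(input)); // false for each
}

// Predicate true, parser succeeds; invalid-credentials is only a validation error.
const creds = new URL("foo://h@x");
console.log(creds.username, creds.host); // logs: h x
```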

From Claude (so wordy...):

> After the change, a broad bidirectional fuzz (~1.1M inputs) leaves exactly three divergence classes, **all pre-existing and untouched by #905**:
> 
> 1. **IPv4 reinterpretation of numeric-looking domains** *(valid string → parser fails or warns).* A host that is a valid *domain string* (passes strict ToASCII) but "ends in a number" is re-parsed as IPv4 by the parser, which then fails or warns. The writing grammar never does IPv4 *parsing* — it only checks "valid IPv4-address string" syntactically and "valid domain string" via ToASCII — so it accepts these.
>    - `https://example.255/` → `IPv4-non-numeric-part` (fail); `https://1.2.3.4.5/` → `IPv4-too-many-parts` (fail); `https://256.1.1.1/` → `IPv4-out-of-range-part` (fail); `https://09/` → fail; `https://0x1.2/` → `IPv4-non-decimal-part` (**parses**, just warns).
> 
> 2. **Empty path segments / empty authority** *(invalid string → parser accepts cleanly, no error).* A `path-absolute-URL string` can't start with `//` (its `path-relative` part may not start with `/`), so the grammar rejects a leading empty segment, but the parser produces it silently.
>    - `https://example.org//`, `https://example.org//p`, and the related empty-authority `file://`.
> 
> 3. **Relative reference against a cannot-be-a-base (opaque-path) base** *(valid string → parser fails with `missing-scheme-non-relative-URL`).* `relative-URL string` validity switches only on the base's *scheme*, so the predicate calls e.g. `a/b`, `/p`, `_dmarc.x` valid against base `foo:opaque` — but the parser can't resolve any non-fragment relative reference against an opaque-path base. This one only appears when a cannot-be-a-base URL is supplied as the base.
> 
> Classes 1 and 3 are the writing grammar being *looser* than the parser (host IPv4 semantics; cannot-be-a-base bases); class 2 is the grammar being *stricter* (empty path segments). None are addressed by #905's scope.
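
For class 1, it may help to see why a valid domain string like `example.255` ends up in the IPv4 parser at all. Below is a rough TypeScript transcription of the standard's "ends in a number" checker and IPv4 parser; the function names are my own and it returns the address as a plain number rather than serializing it:

```ts
type IPv4Number = { value: number; nonDecimal: boolean } | null;
type IPv4Result =
  | { ok: true; address: number; errors: string[] }
  | { ok: false; error: string };

// IPv4 number parser: a "0x"/"0X" prefix means hex, a leading "0" means octal.
function parseIPv4Number(input: string): IPv4Number {
  if (input === "") return null;
  let radix = 10;
  let nonDecimal = false;
  if (input.length >= 2 && input.slice(0, 2).toLowerCase() === "0x") {
    input = input.slice(2);
    radix = 16;
    nonDecimal = true;
  } else if (input.length >= 2 && input.charAt(0) === "0") {
    input = input.slice(1);
    radix = 8;
    nonDecimal = true;
  }
  if (input === "") return { value: 0, nonDecimal };
  const digits = radix === 16 ? /^[0-9A-Fa-f]+$/ : radix === 8 ? /^[0-7]+$/ : /^[0-9]+$/;
  if (!digits.test(input)) return null;
  return { value: parseInt(input, radix), nonDecimal };
}

// "Ends in a number": decides whether a domain is handed to the IPv4 parser.
function endsInANumber(domain: string): boolean {
  const parts = domain.split(".");
  if (parts[parts.length - 1] === "") {
    if (parts.length === 1) return false;
    parts.pop();
  }
  const last = parts[parts.length - 1];
  if (/^[0-9]+$/.test(last)) return true;
  return parseIPv4Number(last) !== null;
}

// IPv4 parser: a failure carries the validation error the parser reports.
function parseIPv4(domain: string): IPv4Result {
  const errors: string[] = [];
  const parts = domain.split(".");
  if (parts[parts.length - 1] === "") {
    errors.push("IPv4-empty-part");
    if (parts.length > 1) parts.pop();
  }
  if (parts.length > 4) return { ok: false, error: "IPv4-too-many-parts" };
  const numbers: number[] = [];
  for (const part of parts) {
    const result = parseIPv4Number(part);
    if (result === null) return { ok: false, error: "IPv4-non-numeric-part" };
    if (result.nonDecimal) errors.push("IPv4-non-decimal-part");
    numbers.push(result.value);
  }
  if (numbers.some((n) => n > 255)) errors.push("IPv4-out-of-range-part");
  const init = numbers.slice(0, -1);
  const last = numbers[numbers.length - 1];
  if (init.some((n) => n > 255) || last >= Math.pow(256, 5 - numbers.length)) {
    return { ok: false, error: "IPv4-out-of-range-part" };
  }
  let address = last;
  init.forEach((n, i) => {
    address += n * Math.pow(256, 3 - i);
  });
  return { ok: true, address, errors };
}
```

With this, `endsInANumber("example.255")` is true and `parseIPv4("example.255")` fails with `IPv4-non-numeric-part`, while `parseIPv4("0x1.2")` succeeds (address 16777218, i.e. 1.0.0.2) and records `IPv4-non-decimal-part`.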

Notably, Codex's finding

> - Predicate true, parser succeeds with validation errors: `foo://h@x` is valid by the new non-special opaque-body grammar, while the parser reports `invalid-credentials`.

seems like it might be a regression? Claude, by contrast, concludes that this PR is a strict improvement.

The difference seems to come down to how each agent implemented this part of the new grammar:

> a [URL-scheme string](https://whatpr.org/url/905.html#url-scheme-string) that is not an [ASCII case-insensitive](https://infra.spec.whatwg.org/#ascii-case-insensitive) match for a [special scheme](https://whatpr.org/url/905.html#special-scheme), followed by U+003A (:) and one of: a [scheme-relative-URL string](https://whatpr.org/url/905.html#scheme-relative-url-string), a [path-absolute-URL string](https://whatpr.org/url/905.html#path-absolute-url-string), or zero or more [URL units](https://whatpr.org/url/905.html#url-units)

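For example, `foo://:80` is valid under a literal reading, since the final "zero or more URL units" alternative matches the body `//:80` on its own, regardless of the other two alternatives.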
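
To make the two readings concrete, here is a rough sketch of just that third alternative. `isUrlUnits` is a simplified stand-in (it accepts every non-ASCII UTF-16 code unit, which over-approximates URL code points), and the "strict" variant is only a guess at what the other implementation, or the PR's intent, might be; it is not text from the PR:

```ts
// Simplified "zero or more URL units": ASCII URL code points, any non-ASCII
// code unit (an over-approximation), or a percent-encoded byte.
function isUrlUnits(s: string): boolean {
  return /^(?:[A-Za-z0-9!$&'()*+,\-.\/:;=?@_~\u00A0-\uFFFF]|%[0-9A-Fa-f]{2})*$/.test(s);
}

// Literal reading: the third alternative accepts any body after "scheme:".
function literalThirdAlternative(body: string): boolean {
  return isUrlUnits(body);
}

// Hypothetical stricter reading: a body starting with "//" has to go through
// the scheme-relative-URL string alternative instead.
function strictThirdAlternative(body: string): boolean {
  return body.slice(0, 2) !== "//" && isUrlUnits(body);
}

for (const body of ["//:80", "//h@x", "opaque"]) {
  // logs: "//:80 true false", "//h@x true false", "opaque true true"
  console.log(body, literalThirdAlternative(body), strictThirdAlternative(body));
}
```

Under the stricter reading, and assuming the scheme-relative-URL string alternative still has no room for credentials, `//h@x` would be invalid because `@` is a forbidden host code point, which would line up with the parser's `invalid-credentials` validation error.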

https://github.com/whatwg/url/pull/905#issuecomment-4833262285

Received on Monday, 29 June 2026 13:44:15 UTC