[whatwg/url] Proposal: Add a normalization interface (Issue #729) from Richard Gibson on 2022-12-16 (public-webapps-github@w3.org from December 2022)

From: Richard Gibson <notifications@github.com>
Date: Fri, 16 Dec 2022 09:31:19 -0800
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/729@github.com>

As noted in #606 and elsewhere, the URL APIs lean strongly towards preserving input, and therefore differentiate URIs that are equivalent per, e.g., https://www.rfc-editor.org/rfc/rfc9110#section-4.2.3. But users need to compare such URIs and/or map them to resources, and doing so robustly requires normalization. I therefore think it makes sense to provide a normalization interface, probably one that is configurable (or can become so in the future) to account for the various rungs of the "[comparison ladder](https://www.rfc-editor.org/rfc/rfc3986#section-6.2)": generic percent-decoding (with case normalization of the percent-encodings that survive), dot-segment removal, component-sensitive percent-decoding, and scheme-based rules, and possibly even higher-order considerations such as full case normalization and/or query parameter ordering/combining/value normalization.
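
To illustrate the input-preserving behavior, here is a minimal sketch, assuming a runtime that exposes the WHATWG `URL` global (a browser or Node.js). The two URLs are taken from the equivalence examples in RFC 9110; the variable names are only illustrative.

```javascript
// Both URLs are equivalent per RFC 9110 section 4.2.3, but the URL API
// preserves the percent-encoding of "~" in the path.
const a = new URL("http://example.com:80/~smith/home.html");
const b = new URL("http://EXAMPLE.com/%7Esmith/home.html");

// The parser already lowercases the host and drops the default port...
const sameOrigin = a.origin === b.origin;

// ...but the serializations still differ ("~" versus "%7E").
const sameHref = a.href === b.href;
```

Here `sameOrigin` is true while `sameHref` is false, so comparing `href` values alone would treat the two URLs as different.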

One possibility would be adding a `normalize` method to the [URL class](https://url.spec.whatwg.org/#url-class) with reasonable behavior in the absence of any argument (e.g., as much normalization as possible without conflating URIs that implementations supporting the scheme are permitted to differentiate), such that e.g. `new URL("http://example.com:80/~smith/home.html").normalize() === new URL("http://EXAMPLE.com/%7Esmith/home.html").normalize()` is true while `new URL("http://example.com/data").normalize() === new URL("http://example.com/data/").normalize()` is false.
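
A rough userland approximation of the no-argument behavior might look like the sketch below. It is not a proposed specification: `normalizeUrl` is a hypothetical helper name, it assumes `http`/`https` input, it returns a string so that `===` comparison is meaningful, and it covers only the path (generic percent-decoding of unreserved characters plus case normalization of the percent-encodings that survive). Query handling and scheme-specific rules are left out.

```javascript
// Hypothetical helper, not part of the URL Standard.
function normalizeUrl(input) {
  // The URL parser already lowercases the host, drops default ports,
  // and removes dot segments for special schemes such as http.
  const url = new URL(input);

  // Decode percent-encoded unreserved characters (RFC 3986 section 2.3)
  // and uppercase the hex digits of any percent-encoding that survives.
  url.pathname = url.pathname.replace(/%[0-9a-f]{2}/gi, (triplet) => {
    const char = String.fromCharCode(parseInt(triplet.slice(1), 16));
    return /[A-Za-z0-9\-._~]/.test(char) ? char : triplet.toUpperCase();
  });

  return url.href;
}
```

With this sketch, `normalizeUrl("http://example.com:80/~smith/home.html") === normalizeUrl("http://EXAMPLE.com/%7Esmith/home.html")` holds, while the `/data` and `/data/` URLs remain distinct because the trailing slash is never touched.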

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/729

Received on Friday, 16 December 2022 17:31:32 UTC