Re: [whatwg/url] How should parser handle percent-encoded characters like `%66` U+0066 (f) in path segments? (#565)

> With the exception of Chromium, it seems like the current spec describes the consensus behavior.

Still, this is a huge change in behavior from [RFC 3986](https://tools.ietf.org/html/rfc3986):
> 2.3.  Unreserved Characters
> ... URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource.
>
> 6.2.2.1.  Case Normalization
> For all URIs, the hexadecimal digits within a percent-encoding triplet (e.g., "%3a" versus "%3A") are case-insensitive and therefore should be normalized to use uppercase letters for the digits A-F.
>
> 6.2.2.2.  Percent-Encoding Normalization
> The percent-encoding mechanism (Section 2.1) is a frequent source of variance among otherwise identical URIs.  In addition to the case normalization issue noted above, some URI producers percent-encode octets that do not require percent-encoding, resulting in URIs that are equivalent to their non-encoded counterparts.  These URIs should be normalized by decoding any percent-encoded octet that corresponds to an unreserved character, as described in Section 2.3.

According to RFC 3986, the following URLs would all be equivalent, but according to the current spec they are all non-equivalent:
- `https://example.com/%66%6f%6f`
- `https://example.com/%66%6F%6F`
- `https://example.com/foo`
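
To make the RFC 3986 side concrete, here is a minimal sketch of the two normalization steps quoted above (6.2.2.1 and 6.2.2.2). The function name `normalize_percent_encoding` is my own and comes from neither spec; it only touches percent-encoded triplets and is not a full URI normalizer.

```python
import re
import string

# Unreserved characters per RFC 3986 section 2.3:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# Matches one percent-encoded triplet such as "%3a" or "%6F".
_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(url: str) -> str:
    """Hypothetical helper: apply RFC 3986 6.2.2.1 and 6.2.2.2 to a URL string."""

    def fix(match: re.Match) -> str:
        hex_digits = match.group(1)
        char = chr(int(hex_digits, 16))
        if char in UNRESERVED:
            # 6.2.2.2: decode triplets that encode an unreserved character.
            return char
        # 6.2.2.1: otherwise keep the triplet, with uppercase hex digits.
        return "%" + hex_digits.upper()

    # A single pass, so a decoded result is never decoded a second time.
    return _TRIPLET.sub(fix, url)


urls = [
    "https://example.com/%66%6f%6f",
    "https://example.com/%66%6F%6F",
    "https://example.com/foo",
]
print({normalize_percent_encoding(u) for u in urls})
# prints {'https://example.com/foo'}
```

Under these rules all three URLs collapse to the same string, while reserved characters such as `%2f` stay encoded and are only uppercased to `%2F`.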

So if the intention really is for the current spec to differ so drastically from RFC 3986, I think this difference should be clearly documented.

https://github.com/whatwg/url/issues/565#issuecomment-753295046

Received on Friday, 1 January 2021 09:54:19 UTC