Re: [whatwg/url] Should we unescape characters in path? (#606) from Karl on 2021-05-21 (public-webapps-github@w3.org from May 2021)

From: Karl <notifications@github.com>
Date: Fri, 21 May 2021 04:15:23 -0700
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/606/845877662@github.com>

It sounds like a good idea to decode them, IMO. The [latest HTTP semantics draft spec](https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#rfc.section.4.2.3) says:

> Scheme-based normalization (Section 6.2.3 of [RFC3986]) of "http" and "https" URIs involves the following additional rules:
> ...
> Characters other than those in the "reserved" set are equivalent to their percent-encoded octets: the normal form is to not encode them (see Sections 2.1 and 2.2 of [RFC3986]).

We already do the other HTTP-specific normalisations (removing default ports, using the root path in place of an empty path, lowercasing the host name), as well as more general ones (e.g. canonicalising exotic IP address forms), so I think it makes sense to do this one, too. Some part of the system will have to decode these characters eventually, so it is best to do it as early as possible, at the URL level, to avoid mismatches like those you’ve described.
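
To make the idea concrete, here is a rough sketch of what such a normalisation could look like. It is not the URL Standard's algorithm: `decode_unreserved` is a hypothetical helper, and limiting decoding to the RFC 3986 "unreserved" set (letters, digits, `-`, `.`, `_`, `~`) is an assumption made for illustration. Every other escape is left untouched.

```python
import re

# Unreserved characters per RFC 3986 section 2.3:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one percent-encoded octet, e.g. "%7E" or "%7e".
_PCT_OCTET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(path):
    """Hypothetical helper: decode only escapes of unreserved characters.

    Escapes of reserved characters (such as "%2F"), of "%" itself, and of
    non-ASCII octets are returned unchanged.
    """
    def replace(match):
        char = chr(int(match.group(1), 16))
        # Octets >= 0x80 map to characters outside UNRESERVED, so they stay encoded.
        return char if char in UNRESERVED else match.group(0)

    # A single left-to-right pass, so "%2541" is not decoded twice into "A".
    return _PCT_OCTET.sub(replace, path)
```

With this sketch, `decode_unreserved("/%7Efoo/%61%2Fb")` yields `"/~foo/a%2Fb"`: the tilde and the letter are decoded, while the encoded slash keeps its escape so the path structure does not change.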


View it on GitHub:
https://github.com/whatwg/url/issues/606#issuecomment-845877662

Received on Friday, 21 May 2021 11:15:36 UTC