[whatwg/url] How should parser handle percent-encoded characters like `%66` U+0066 (f) in path segments? (#565) from Markus Laire on 2020-12-26 (public-webapps-github@w3.org from December 2020)

From: Markus Laire <notifications@github.com>
Date: Sat, 26 Dec 2020 05:31:55 -0800
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/565@github.com>

The **path state** of the URL parser in section 4.4 ends with:

https://url.spec.whatwg.org/commit-snapshots/2ce49383db3506a3c1a527a775693af1100198ef/#url-parsing

> 2. Otherwise, run these steps:
>     1. If c is not a URL code point and not U+0025 (%), validation error.
>     2. If c is U+0025 (%) and remaining does not start with two ASCII hex digits, validation error.
>     3. UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.

It is unclear how the parser should handle a character here that is already percent-encoded but **is not** in the path percent-encode set.
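
To make the question concrete, here is a minimal Python sketch of step 3 read literally. The function names are hypothetical, and the contents of the path percent-encode set are my reading of the linked snapshot (C0 controls, code points above U+007E, and space, `"`, `#`, `<`, `>`, `?`, `` ` ``, `{`, `}`), so treat them as an assumption rather than the spec's definition.

```python
# Assumed members of the path percent-encode set, besides C0 controls
# and code points greater than U+007E (based on my reading of the snapshot).
PATH_ENCODE_EXTRA = set(' "#<>?`{}')


def in_path_percent_encode_set(c: str) -> bool:
    """Return True if the single character c would be percent-encoded."""
    cp = ord(c)
    return cp <= 0x1F or cp > 0x7E or c in PATH_ENCODE_EXTRA


def utf8_percent_encode_path(c: str) -> str:
    """Sketch of step 3: encode c only if it is in the set, else return it unchanged."""
    if not in_path_percent_encode_set(c):
        return c
    return "".join(f"%{byte:02X}" for byte in c.encode("utf-8"))


def encode_path_segment(segment: str) -> str:
    """Apply the per-character step to a whole segment (hypothetical helper)."""
    return "".join(utf8_percent_encode_path(c) for c in segment)


result = encode_path_segment("%66%6f%6f")
```

Under this reading, `%` (U+0025) and the hex digits are not in the set, so each is appended to the buffer unchanged and an existing `%66` is never decoded.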

For example, when parsing `http://example.com/%66%6f%6f`, should the result be `http://example.com/%66%6f%6f` or `http://example.com/foo`?

If the result should be `http://example.com/%66%6f%6f`, i.e. the parser should not percent-decode, then those two URLs are **not equal**, because their serialized forms differ.
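
A small Python illustration of that consequence, assuming the parser leaves the percent-encoded sequences untouched so the two strings below are the serialized forms. `urllib.parse.unquote` is Python's standard-library decoder, used here only for contrast; it is not an algorithm from the URL Standard.

```python
from urllib.parse import unquote

# Assumed serializations if the parser performs no percent-decoding.
a = "http://example.com/%66%6f%6f"
b = "http://example.com/foo"

# Comparing serialized forms treats the two URLs as different.
serialized_equal = a == b

# Comparing after percent-decoding treats them as the same resource path.
decoded_equal = unquote(a) == unquote(b)
```

Here `serialized_equal` is `False` while `decoded_equal` is `True`, which is the mismatch the question is about.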

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/565

Received on Saturday, 26 December 2020 13:32:07 UTC