[whatwg/url] How should parser handle percent-encoded characters like `%66` U+0066 (f) in path segments? (#565)

Parsing of **path state** in section 4.4 ends with:

https://url.spec.whatwg.org/commit-snapshots/2ce49383db3506a3c1a527a775693af1100198ef/#url-parsing

> 2. Otherwise, run these steps:
>     1. If c is not a URL code point and not U+0025 (%), validation error.
>     2. If c is U+0025 (%) and remaining does not start with two ASCII hex digits, validation error.
>     3. UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.

It is unclear how the parser should handle a character that is already percent-encoded but **is not** in the path percent-encode set.

For example, when parsing `http://example.com/%66%6f%6f`, should the result be `http://example.com/%66%6f%6f` or `http://example.com/foo`?
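
A minimal sketch of a literal reading of step 3, assuming the path percent-encode set in the linked snapshot consists of the C0 controls, all code points above U+007E, and space, `"`, `#`, `<`, `>`, `?`, `` ` ``, `{`, `}`. The names `PATH_EXTRA`, `utf8_percent_encode` and `encode_path_segment` are hypothetical and not from the spec, and the validation errors of steps 1 and 2 are omitted because they do not change the output. Under that reading U+0025 (%) is not in the set, so bytes that are already percent-encoded would pass through unchanged:

```python
# Assumed path percent-encode set (my reading of the snapshot, not authoritative):
# C0 controls, code points above U+007E, plus these ASCII characters.
PATH_EXTRA = set(' "#<>?`{}')


def in_path_percent_encode_set(c: str) -> bool:
    cp = ord(c)
    return cp <= 0x1F or cp > 0x7E or c in PATH_EXTRA


def utf8_percent_encode(c: str) -> str:
    # Code points outside the set are appended as-is; nothing is ever decoded.
    if not in_path_percent_encode_set(c):
        return c
    return "".join(f"%{b:02X}" for b in c.encode("utf-8"))


def encode_path_segment(segment: str) -> str:
    # Hypothetical helper: applies step 3 to every code point of one segment.
    return "".join(utf8_percent_encode(c) for c in segment)


print(encode_path_segment("%66%6f%6f"))
```

If this reading is right, the path of the example URL would stay `%66%6f%6f` and would not become `foo`.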

If the result should be `http://example.com/%66%6f%6f`, i.e. the parser should not percent-decode, then those two URLs are **not equal**, because their serialized forms differ.
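
To make the consequence concrete, here is a small sketch that assumes equality is decided by comparing serializations and that the parser leaves both inputs unchanged. It uses Python's `urllib.parse.unquote` only to illustrate what a decoding comparison would see; decoding a whole URL string like this is not a safe normalization in general (it would also decode `%2F`, for instance).

```python
from urllib.parse import unquote

# Assumed serializations if the parser does not percent-decode path segments.
a = "http://example.com/%66%6f%6f"
b = "http://example.com/foo"

# Comparison of serialized forms: the two URLs differ.
serialized_equal = a == b

# Comparison after percent-decoding (illustration only): they coincide.
decoded_equal = unquote(a) == unquote(b)

print(serialized_equal, decoded_equal)
```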

Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/565

Received on Saturday, 26 December 2020 13:32:07 UTC