Re: [whatwg/url] URL standard ignores section 7.3 of RFC 3986 (#658)

Expanding on Anne's comment, we do follow the section of RFC 3986 quoted:
> Applications must split the URI into its components and subcomponents prior to decoding the octets, as otherwise the decoded octets might be mistaken for delimiters.

We first split the URL into scheme/hostname/pathname components, where the pathname is `/foo/%2e%2e/bar`. Only after that do we decode the octets. Indeed, when the RFC says "delimiters" it mostly means characters such as `?` or `/`, which can start a separate component of the URL, and which we do not decode.
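To make the ordering concrete, here is a rough Python sketch of why splitting has to come before decoding. It is only an illustration using `urllib.parse`, not the URL Standard's parser, and the example URL is made up:

```python
from urllib.parse import urlsplit, unquote

# Hypothetical URL whose path contains an encoded "/" (%2F) and an encoded "?" (%3F).
url = "https://www.example.com/a%2Fb/c%3Fd?x=1"

# Split into components first, then decode each path segment on its own.
parts = urlsplit(url)
segments = [unquote(s) for s in parts.path.split("/")]

# Decoding first turns %2F and %3F into real delimiters,
# so the path and query boundaries move.
wrong = urlsplit(unquote(url))

print(parts.path, parts.query)   # path and query as written in the URL
print(segments)                  # decoded segments, boundaries preserved
print(wrong.path, wrong.query)   # boundaries shifted by early decoding
```

In the second case the encoded `?` ends the path early and the encoded `/` creates an extra segment, which is the failure mode the quoted RFC sentence warns about.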

Python also isn't exactly a fair comparison, since `urllib.parse.urlparse` _never_ normalizes `..`, even when unescaped:
```
>>> import urllib.parse
>>> urllib.parse.urlparse('https://www.example.com/foo/../bar')
ParseResult(scheme='https', netloc='www.example.com', path='/foo/../bar', params='', query='', fragment='')
```
The only interesting case I could find is curl: they normalize `..` but not `%2e%2e`.

There are good reasons to treat `%2e%2e` the same as `..`, though, and all browsers do. As an example, the recently discovered Apache [vulnerability](https://blog.cloudflare.com/helping-apache-servers-stay-safe-from-zero-day-path-traversal-attacks/) would never have occurred had their URL parser normalized `%2e%2e`.
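For illustration, a minimal sketch of dot-segment removal that treats the percent-encoded forms like literal dots might look as follows. The function name `normalize_path` is hypothetical, the input is assumed to be an absolute path starting with `/`, and this is a simplification rather than the URL Standard's actual path state machine. The sets of encoded forms follow the standard's definitions of single-dot and double-dot path segments, matched case-insensitively:

```python
# Segment spellings treated as "." and "..", compared case-insensitively.
SINGLE_DOT = {".", "%2e"}
DOUBLE_DOT = {"..", ".%2e", "%2e.", "%2e%2e"}


def normalize_path(path: str) -> str:
    """Collapse dot segments in an absolute path (hypothetical helper).

    Assumes `path` starts with "/". Percent-encoded dots are handled
    the same way as literal dots.
    """
    output = []
    segments = path.split("/")[1:]  # drop the empty string before the leading "/"
    for i, seg in enumerate(segments):
        is_last = i == len(segments) - 1
        lowered = seg.lower()
        if lowered in DOUBLE_DOT:
            if output:
                output.pop()       # step up one level; never above the root
            if is_last:
                output.append("")  # keep a trailing slash
        elif lowered in SINGLE_DOT:
            if is_last:
                output.append("")  # keep a trailing slash
        else:
            output.append(seg)
    return "/" + "/".join(output)


print(normalize_path("/foo/%2e%2e/bar"))
print(normalize_path("/cgi-bin/.%2e/.%2e/etc/passwd"))
```

With this kind of normalization, `/foo/%2e%2e/bar` and `/foo/../bar` collapse to the same path, so a later access check sees the path that will actually be used.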


Received on Friday, 15 October 2021 17:52:35 UTC