Re: [whatwg/url] URL path shortening for ../ creates problem with other URL parsers that do not follow the whatwg standard (Issue #810)

By the way, the treatment of `.` and `..` components in this standard is consistent with previous RFCs, such as RFC-3986.

When describing URL normalization, RFC-3986 is explicit about what `.` and `..` components are for:

> [6.2.2.3](https://datatracker.ietf.org/doc/html/rfc3986#section-6.2.2.3).  Path Segment Normalization
>
> The complete path segments "." and ".." are intended only for use
> within relative references ([Section 4.1](https://datatracker.ietf.org/doc/html/rfc3986#section-4.1)) and are removed as part of
> the reference resolution process ([Section 5.2](https://datatracker.ietf.org/doc/html/rfc3986#section-5.2)).  However, some
> deployed implementations incorrectly assume that reference resolution
> is not necessary when the reference is already a URI and thus fail to
> remove dot-segments when they occur in non-relative paths.  URI
> normalizers should remove dot-segments by applying the
> remove_dot_segments algorithm to the path, as described in
> [Section 5.2.4](https://datatracker.ietf.org/doc/html/rfc3986#section-5.2.4).
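
For anyone who hasn't looked at it, here is a rough sketch of the `remove_dot_segments` steps from Section 5.2.4. The function name and the array-based output buffer are my own choices for illustration, not something taken from the RFC text or from any particular library:

```javascript
// Sketch of RFC 3986, Section 5.2.4 (remove_dot_segments).
// Each output entry is one path segment, including its leading "/" if it has one.
function removeDotSegments(path) {
  let input = path;
  const output = [];
  while (input.length > 0) {
    if (input.startsWith("../")) {
      input = input.slice(3);            // step A: drop a leading "../"
    } else if (input.startsWith("./")) {
      input = input.slice(2);            // step A: drop a leading "./"
    } else if (input.startsWith("/./")) {
      input = input.slice(2);            // step B: "/./x" becomes "/x"
    } else if (input === "/.") {
      input = "/";                       // step B: "/." becomes "/"
    } else if (input.startsWith("/../")) {
      input = input.slice(3);            // step C: "/../x" becomes "/x" ...
      output.pop();                      // ... and the previous segment is removed
    } else if (input === "/..") {
      input = "/";                       // step C: "/.." becomes "/"
      output.pop();
    } else if (input === "." || input === "..") {
      input = "";                        // step D: a bare "." or ".." disappears
    } else {
      // Step E: move the first segment (up to, not including, the next "/").
      const next = input.indexOf("/", 1);
      const end = next === -1 ? input.length : next;
      output.push(input.slice(0, end));
      input = input.slice(end);
    }
  }
  return output.join("");
}

console.log(removeDotSegments("/a/b/c/./../../g")); // "/a/g"
console.log(removeDotSegments("mid/content=5/../6")); // "mid/6"
```

The two inputs at the end are the examples the RFC itself gives for this algorithm.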

It's been a while since I read the HTTP spec, but if I recall correctly, the idea is that the server takes the path and query (together known as the "request target") and the host (supplied via the `Host:` header), and reconstructs the URL using the processing in RFC-3986. That means the server would be expected to remove dot segments, just as this standard does.
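
To make that concrete, here is a minimal sketch of what such a reconstruction could look like. It reflects my reading above rather than anything the HTTP spec mandates. `reconstructTargetUrl` is a hypothetical helper, the scheme is assumed to be known from the connection, and JavaScript's `URL` class stands in for the RFC-3986 processing:

```javascript
// Hypothetical helper: rebuild an absolute URL from the pieces of a request.
// Assumes the request target is a path plus optional query, and that the
// scheme is already known (for example, from whether the connection uses TLS).
function reconstructTargetUrl(scheme, host, requestTarget) {
  if (!requestTarget.startsWith("/")) {
    throw new TypeError("expected a request target starting with '/'");
  }
  // Concatenate instead of resolving against a base URL, so a target that
  // starts with "//" cannot be read as a scheme-relative reference.
  return new URL(`${scheme}://${host}${requestTarget}`);
}

const url = reconstructTargetUrl("http", "example.com", "/a/b/../c/./d?x=1");
console.log(url.href); // http://example.com/a/c/d?x=1
```

The dot segments are gone by the time the server-side code sees `url.pathname`, which is the behaviour described above.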

Furthermore, I think RFC-3986 is clear that `.` and `..` components are intended only for hierarchical traversal within the path, and any servers which give them a different interpretation could reasonably be described as "incorrect". It would be very difficult for clients to robustly interact with such a server because the RFC gives tools and libraries explicit license and even encouragement to remove them ("URI normalizers should remove dot-segments").

So IMO, that part of the standard, and its embodiment in JavaScript's `URL` class, is consistent with RFC-3986, which is nearly 20 years old and widely used even outside of browser contexts.
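
As a quick illustration of the `URL` class behaviour, this can be run in a browser console or in Node.js. The sample URLs are made up for the example:

```javascript
// Made-up sample URLs containing "." and ".." path segments.
const samples = [
  "https://example.com/a/b/../c",
  "https://example.com/a/./b/./c",
  "https://example.com/a/b/%2e%2e/c", // percent-encoded dots count as ".." too
  "https://example.com/../../x",      // ".." cannot climb above the root
];

// The parser removes dot segments while parsing, before pathname is read.
const paths = samples.map((s) => new URL(s).pathname);
console.log(paths); // [ '/a/c', '/a/b/c', '/a/c', '/x' ]
```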


Received on Friday, 29 December 2023 21:14:09 UTC