[whatwg/url] URL standard ignores section 7.3 of RFC 3986 (#658)

The WHATWG URL Standard treats percent-encoded dot segments such as `%2e%2e` as equivalent to `..`, i.e. it expects path control characters to be URL-decoded: https://url.spec.whatwg.org/#double-dot-path-segment

This violates section 7.3 of RFC 3986, the URI specification: https://datatracker.ietf.org/doc/html/rfc3986#section-7.3

Relevant section, emphasis mine:

> Applications **must** split the URI into its
components and subcomponents prior to decoding the octets, as
otherwise the decoded octets might be mistaken for delimiters.

Per RFC 2119 (https://www.ietf.org/rfc/rfc2119.txt), the keyword "must" marks an absolute requirement of a specification.
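
To illustrate why the order matters, here is a minimal sketch using Python's `urllib.parse.unquote`. The helper names `split_then_decode` and `decode_then_split` and the sample path are made up for this example; they are not taken from either specification.

```python
from urllib.parse import unquote

def split_then_decode(path):
    # Order required by RFC 3986 section 7.3: split on the literal "/"
    # delimiter first, then percent-decode each segment.
    return [unquote(segment) for segment in path.split("/")]

def decode_then_split(path):
    # Reverse order: the decoded "%2F" becomes "/" and is then
    # mistaken for a path delimiter.
    return unquote(path).split("/")

path = "/a%2Fb/c"  # hypothetical path with an encoded slash
print(split_then_decode(path))  # ['', 'a/b', 'c']
print(decode_then_split(path))  # ['', 'a', 'b', 'c']
```

Splitting first keeps `a/b` as a single segment, while decoding first yields an extra segment.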

Is this an oversight, or is the WG intentionally deprecating this requirement?

This results in interoperability errors between Node (which follows this standard) and languages that do not, e.g. Python.

Node:
```
> var string = "https://www.example.com/foo/%2e%2e/bar";
undefined
> var url = new URL(string);
undefined
> console.log(url);
URL {
  href: 'https://www.example.com/bar',
  origin: 'https://www.example.com',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'www.example.com',
  hostname: 'www.example.com',
  port: '',
  pathname: '/bar',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}
```

Python:
```
>>> from urllib.parse import urlparse
>>> string = "https://www.example.com/foo/%2e%2e/bar"
>>> url = urlparse(string)
>>> print(url)
ParseResult(scheme='https', netloc='www.example.com', path='/foo/%2e%2e/bar', params='', query='', fragment='')
```
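
The difference between the two outputs can be reproduced with a simplified sketch of dot-segment removal. The function `collapse_dot_segments` and its `encoded_dots` flag are hypothetical, written only for this illustration. The segment sets follow the single-dot and double-dot path segment definitions linked above. This is not the standard's actual parser state machine, and it ignores details such as trailing-slash handling.

```python
def collapse_dot_segments(path, encoded_dots):
    # With encoded_dots=True, percent-encoded dots count as dot segments
    # (case-insensitively), as in the WHATWG definitions. With
    # encoded_dots=False, only literal "." and ".." are recognized.
    if encoded_dots:
        single = {".", "%2e"}
        double = {"..", ".%2e", "%2e.", "%2e%2e"}
    else:
        single = {"."}
        double = {".."}

    output = []
    for segment in path.split("/")[1:]:
        key = segment.lower()
        if key in double:
            # Drop the previous segment, if there is one.
            if output:
                output.pop()
        elif key in single:
            continue
        else:
            output.append(segment)
    return "/" + "/".join(output)

path = "/foo/%2e%2e/bar"
print(collapse_dot_segments(path, encoded_dots=True))   # /bar
print(collapse_dot_segments(path, encoded_dots=False))  # /foo/%2e%2e/bar
```

With `encoded_dots=True` the result matches the `pathname` reported by Node; with `encoded_dots=False` the path is left as `urlparse` reports it.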

See also:

From whatwg/url: https://github.com/whatwg/url/issues/565 which raises additional concerns around URL decoding while parsing.
From nodejs/node: https://github.com/nodejs/node/issues/40431 original report

https://github.com/whatwg/url/issues/658

Received on Wednesday, 13 October 2021 15:34:38 UTC