[whatwg/url] Confusion about pipe characters in paths (Issue #852)

### What is the issue with the URL Standard?

I'm debugging a software incompatibility around pipe characters (`|`, U+007C) in URL paths. Neither RFC 2396 nor RFC 3986 permits pipes in URIs according to the ABNF. The WHATWG URL spec also does not seem to permit them (pipes are not part of `URL units`). However, the `path percent-encode set` does *not* include the pipe character.
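To make the ABNF point concrete, here is a minimal sketch of the RFC 3986 `pchar` rule (section 3.3) as a character check. The helper name `isLiteralPchar` is made up for illustration. It only covers characters that may appear literally and ignores `pct-encoded` triplets.

```typescript
// RFC 3986, section 3.3:
//   pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
//   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
//   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
// This pattern matches exactly one character that may appear unencoded in a path segment.
const LITERAL_PCHAR = /^[A-Za-z0-9\-._~!$&'()*+,;=:@]$/;

// Hypothetical helper: true if `ch` is a single character allowed literally by pchar.
function isLiteralPchar(ch: string): boolean {
  return LITERAL_PCHAR.test(ch);
}

console.log(isLiteralPchar("|")); // false
console.log(isLiteralPchar("~")); // true
```

Under this rule, `|` can only appear in a path as `%7C`.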

A quick survey of what browsers do shows that:

- `encodeURI` and `encodeURIComponent` do encode the pipe character, which makes sense because they use different percent-encode sets, not the path one.
- In Firefox, a link `<a href="foo|barä">` sends `GET /foo|bar%C3%A4`, so it *does* use `path percent-encode set`. The httpwg HTTP spec seems to forbid this since it references RFC 3986, but Firefox sends it anyway.
- In Chromium, the same link sends `GET /foo%7Cbar%C3%A4`, so it encodes both (but only displays the decoded `ä` in the URL bar, interestingly).
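A rough way to reproduce the comparison from script, assuming a runtime that exposes the `URL` constructor globally (browsers, Node.js). The base `http://example.com/` is a placeholder. The `pathname` result depends on the runtime's URL implementation, so it may not match what a given browser puts on the wire for a link.

```typescript
const segment = "foo|barä";

// encodeURI and encodeURIComponent use their own sets of unescaped characters,
// which do not include "|".
const viaEncodeURI = encodeURI(segment);
const viaEncodeURIComponent = encodeURIComponent(segment);

// The URL parser percent-encodes the path while parsing; whether "|" survives
// depends on the implementation.
const viaUrlParser = new URL(segment, "http://example.com/").pathname;

console.log(viaEncodeURI);          // foo%7Cbar%C3%A4
console.log(viaEncodeURIComponent); // foo%7Cbar%C3%A4
console.log(viaUrlParser);          // implementation-dependent for "|"
```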

The WHATWG URL spec itself does not seem to be inconsistent: `path percent-encode set` is only used in the URL *decoding* logic, and nowhere does the spec actually say that path components should be encoded with this set, even if that is implied.

So, what is actually the right behavior here? Should pipes in path segments be percent-encoded or not, and if so, why doesn't Firefox do it? And should the `path percent-encode set` be adjusted to include `|`?

https://github.com/whatwg/url/issues/852

Message ID: <whatwg/url/issues/852@github.com>

Received on Thursday, 23 January 2025 07:45:35 UTC