- From: Alwin Blok <notifications@github.com>
- Date: Tue, 03 Dec 2024 13:08:22 -0800
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/issues/675/2515553465@github.com>
I would like to leave a note. It seems that this decision (whether or not to encode the `\`) is now based purely on the new behaviour seen in Chrome Canary and the present behaviour of the other browsers. I think that is a bit short-sighted, or let’s say: it does not take other aspects into account, and it does not weigh the trade-offs being made.

The `\` is a “problem code point”. There are special cases for it scattered throughout the parser, and its meaning in the current standard is context dependent. Without scheme information, its meaning in paths is ambiguous. Changing the scheme of a URL that has `\` in its pathname can cause the path to radically change shape and (due to the normalisation of dotted segments) content. It is a very unpleasant property that parsing e.g. `http+git://example.com/path\to\src\..` and then changing the scheme to `http` using the API results in something completely different from what directly parsing `http://example.com/path\to\src\..` produces. It’s a congruence bug, closely related to what you have been calling a reparse bug here.

Of course, when changing the scheme of e.g. an http URL to one that has traditionally been used with opaque paths, such as `mailto:` or `javascript:`, this is necessary in one form or another (if you really want to allow that..). But when changing the scheme between special and non-special-but-hierarchical URLs, it is unexpected and weird. It makes the semantics of URLs non-compositional: you can no longer replace equals for equals, and you can no longer use algebraic reasoning. I am not sure that this can be fully recovered, but preventing the parser/normaliser from producing URLs with such a problem code point as much as possible seems very important to me.

Single and double quotes and angle brackets are also required to be percent-encoded. IIRC this has not always been consistent across browsers either. They are used as delimiters in HTML, XML, JSON, JavaScript and other contexts that URLs may appear in, so not escaping them can cause issues, and this was changed without problems. It doesn’t make sense to me NOT to require the same for `\`, which is far more problematic than the quotation marks, since unlike the others it really messes with the internal structure of a URL!

--
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/675#issuecomment-2515553465
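The scheme dependence of `\` described in the message above can be seen with a minimal sketch like the following. It assumes a runtime whose global `URL` follows the WHATWG URL Standard (a current browser or Node.js). The helper `pathSegments` is a hypothetical name introduced only for this illustration. The sketch just parses the same input under both schemes and does not go through a scheme-changing API. Whether the non-special result keeps a literal `\` or a percent-encoded one may differ between implementations, so the code only counts segments.

```typescript
// Hypothetical helper (not part of any API): parse the input and return
// the segments of the resulting pathname.
function pathSegments(input: string): string[] {
  return new URL(input).pathname.split("/").slice(1);
}

// Same authority and path text from the message; only the scheme differs.
const rest = "://example.com/path\\to\\src\\..";

// Special scheme: each `\` is treated like `/`, and the trailing `..`
// segment removes `src`, leaving the segments "path", "to" and "".
const special = pathSegments("http" + rest);

// Non-special scheme: `\` is not a path separator, so everything after
// the first `/` stays in a single segment and `..` is not resolved.
const nonSpecial = pathSegments("http+git" + rest);

console.log(special.length);    // 3
console.log(nonSpecial.length); // 1
```

The two results differ in both the number of segments and their content, which is the shape change the message objects to.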
Received on Tuesday, 3 December 2024 21:08:25 UTC