- From: Karl <notifications@github.com>
- Date: Tue, 09 Apr 2024 13:26:10 -0700
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/issues/815/2045992637@github.com>
From reading those previous discussions:

`^`

#458 seems to indicate that WebKit used to allow it. If I'm reading the [Gecko bug report](https://bugzilla.mozilla.org/show_bug.cgi?id=1548306) correctly, their implementation of origins included a separator character for internal flags (which just so happened to be `^`). This is a rather strange design and the bug report even notes that it wasn't the first time it was found to be problematic:

> For backwards compatibility reasons, when the origin string was first given access to originAttributes, it was designed such that the trailing attributes block is optional. Namely, if no attributes are non-default, the origin will be written as it is in the spec. The separator character to distinguish between the core origin and originAttributes is ^, so an origin might look like `https://twitter.com^userContextId=1`, or like `https://twitter.com` if there were no origin attributes set.
>
> This leads to the spoofing issue. If it was possible for a site to include a `^` in its bare origin string, that could cause issues with the origin logic, as it could be possible to imitate originAttribute from a non-attributed origin. It used to be that the separator used was `!`, but that was found to be spoofable in [bug 1172080](https://bugzilla.mozilla.org/show_bug.cgi?id=1172080). That bug is also where the `^` character was chosen.

In my opinion, this seems like rather weak justification for disallowing this character in all URLs. We could at least allow it in non-special URLs (i.e. opaque hostnames), since they do not have defined origins.

---

`|`

Okay, for file URLs it's fair enough, because this standard does actually define a meaning for this character in the hostname of a file URL. But it shouldn't apply to non-file URLs. I think we can at least allow it in opaque hostnames, to solve the `ed2k` compatibility issue.
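To make the `^` spoofing concern from the quoted Gecko bug concrete, here is a minimal sketch of an origin string with an optional attribute suffix. It is not Gecko's code: `serialize_origin` and `parse_origin` are hypothetical names, and the `key=value&...` suffix layout is an assumption based only on the `https://twitter.com^userContextId=1` example in the quote.

```python
# Hypothetical illustration of an origin string with an optional "^attrs" suffix.
# Function names and the suffix layout are assumptions, not Gecko's implementation.

def serialize_origin(origin: str, attrs: dict[str, str]) -> str:
    """Return the bare origin, or origin + '^k=v&...' when attributes are set."""
    if not attrs:
        return origin
    suffix = "&".join(f"{key}={value}" for key, value in sorted(attrs.items()))
    return f"{origin}^{suffix}"


def parse_origin(serialized: str) -> tuple[str, dict[str, str]]:
    """Split on the first '^'; everything after it is treated as attributes."""
    origin, sep, suffix = serialized.partition("^")
    if not sep:
        return serialized, {}
    attrs: dict[str, str] = {}
    for pair in suffix.split("&"):
        key, _, value = pair.partition("=")
        if key:
            attrs[key] = value
    return origin, attrs


# An attributed origin and a bare origin whose host contains '^' collide.
attributed = serialize_origin("https://twitter.com", {"userContextId": "1"})
spoofed = serialize_origin("https://twitter.com^userContextId=1", {})
print(attributed == spoofed)
print(parse_origin(spoofed))
```

Because the suffix is optional, the parser cannot tell the two cases apart once `^` is allowed in a host. That ambiguity only arises for URLs that have such an origin string in the first place.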

In general, it usually doesn't matter if we're overly restrictive for domains/special URLs (which is what browsers tend to care about), because those special characters often won't be registered to any actual domains. But when it comes to opaque hostnames (which browsers have had very spotty support for), it _does_ matter a great deal, because they contain arbitrary content that will be processed in an arbitrary way. The changes which forbade these characters strike me as being overly broad.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/815#issuecomment-2045992637
Received on Tuesday, 9 April 2024 20:26:14 UTC