- From: Boris Zbarsky <notifications@github.com>
- Date: Wed, 05 Apr 2017 14:04:48 -0700
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/issues/291@github.com>
The standard way, going back to at least the mid-90s, to mark up URLs in text is `<url>`. This, of course, relies on unescaped `>` not being allowed in URLs. This is clearly stated, with exactly this rationale, in RFC 1738 section 2.2. The URL standard should have similar provisions.
I don't know what that should mean for URL _parsing_, but in terms of serialization '>' should always be escaped in URLs, imo.
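To illustrate why an unescaped `>` breaks the `<url>` convention, here is a minimal sketch. The helper names `escapeGreaterThan` and `extractDelimitedUrl` are made up for this example and come from no spec or library, and the extraction regex is only a rough stand-in for what a mail client or linkifier might do.

```javascript
// Hypothetical helper: percent-encode every '>' so the result is safe inside <...>.
function escapeGreaterThan(url) {
  return url.replace(/>/g, "%3E");
}

// Hypothetical helper: pull the first <...>-delimited URL out of free text.
// Assumption: the URL ends at the first '>' after the opening '<'.
function extractDelimitedUrl(text) {
  const match = /<([^<>\s]+)>/.exec(text);
  return match ? match[1] : null;
}

const raw = "http://test/foo>bar";

// With the raw URL, extraction stops at the unescaped '>' and loses "bar".
const broken = extractDelimitedUrl(`See <${raw}> for details.`);

// With '>' escaped on serialization, the whole URL survives.
const intact = extractDelimitedUrl(`See <${escapeGreaterThan(raw)}> for details.`);

console.log(broken);
console.log(intact);
```

If serializers always emit `%3E`, consumers of `<url>` markup never need to guess where the URL ends.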
I just tested browser behavior, and:
* Firefox consistently escapes '>' in path, userinfo, query, fragment. '>' in host or port causes parsing failure.
* Safari escapes '>' in path, userinfo, query. It allows '>' unchanged in host and fragment. '>' in port causes parsing failure.
* Chrome escapes '>' in path, userinfo, query, host. It allows '>' unchanged in fragment. '>' in port causes parsing failure.
* Edge escapes '>' in path and host. It allows '>' unchanged in fragment and query. '>' in port causes parsing failure. Presence of userinfo causes parsing failure no matter what.
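One way to tabulate results like these is a small classifier around the `URL` constructor. This is only a sketch: `classify` and `samples` are names invented here, it exercises the constructor only (not the `href` getter), and the outcome for each component depends on the engine it runs in.

```javascript
// Hypothetical helper: report what an engine does with '>' in a given input.
function classify(input) {
  let href;
  try {
    href = new URL(input).href;
  } catch (e) {
    return "parse failure";
  }
  // If '>' is still present after serialization, it was left unchanged.
  return href.includes(">") ? "unchanged" : "escaped";
}

// One sample per URL component, mirroring the components compared above.
const samples = {
  host: "http://test>test/foo",
  userinfo: "http://a>b@test/foo",
  fragment: "http://test/foo#a>b",
  query: "http://test/foo?a=c>d",
  port: "http://test:2>3/foo",
  path: "http://test/foo>bar",
};

for (const [component, input] of Object.entries(samples)) {
  console.log(component, "--", classify(input));
}
```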
Testcase used:
<pre><script>
// Each string puts '>' in a different URL component.
var strs = [
  "http://test>test/foo\\bar",   // host
  "http://a>b@test/foo\\bar",    // userinfo
  "http://test/foo\\bar/#a>b",   // fragment
  "http://test/foo\\bar/?a=c>d", // query
  "http://test:2>3/foo\\bar",    // port
  "http://test/foo>bar\\baz",    // path
];
for (var str of strs) {
  // Parse via the href getter of an <a> element.
  var a = document.createElement("a");
  a.setAttribute("href", str);
  var href;
  try {
    href = a.href;
  } catch(e) {
    href = "href getter threw";
  }
  // Parse via the URL constructor, which throws on failure.
  var url;
  try {
    url = (new URL(str).href);
  } catch(e) {
    url = "constructor threw";
  }
  document.writeln(str, " -- ", href, " -- ", url);
}
</script>
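The `\\` bits are in there as a way to tell whether parsing failed in the href case: a successful parse of an `http` URL turns the backslash into `/`, whereas on failure the href getter returns the attribute value unchanged, backslash included.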
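That check can be written down as a tiny predicate. This is a sketch under stated assumptions: `hrefParseFailed` is a hypothetical name, and it only works for inputs like the ones in the testcase, where the backslash sits in the path of an `http` URL.

```javascript
// Hypothetical helper: infer from the value returned by a.href whether parsing failed.
// Assumption: the input had a backslash in the path of an http URL, so a
// successful parse would have normalized it to '/'.
function hrefParseFailed(href) {
  return href.includes("\\");
}

// Illustrative values only, not results from any particular browser:
console.log(hrefParseFailed("http://test/foo/bar"));      // backslash normalized
console.log(hrefParseFailed("http://test:2>3/foo\\bar")); // input echoed back as-is
```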
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/291
Received on Wednesday, 5 April 2017 21:05:22 UTC