[whatwg/url] Unescaped '>' should probably not be allowed in URLs (#291) from Boris Zbarsky on 2017-04-05 (public-webapps-github@w3.org from April 2017)

From: Boris Zbarsky <notifications@github.com>
Date: Wed, 05 Apr 2017 14:04:48 -0700
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/291@github.com>

The standard way, going back to at least the mid-90s, to mark up URLs in text is `<url>`.  This, of course, relies on unescaped `>` not being allowed in URLs.  This is clearly stated, with exactly this rationale, in RFC 1738 section 2.2.   The URL standard should have similar provisions.

I don't know what that should mean for URL _parsing_, but in terms of serialization '>' should always be escaped in URLs, imo.
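To make the delimiter problem concrete, here is a minimal sketch of a consumer that pulls URLs out of text using the `<url>` convention, plus a helper that percent-encodes `>` as `%3E`. Both `extractBracketedUrls` and `escapeGreaterThan` are hypothetical names made up for illustration; they are not part of any spec or library, and the helper is a blunt string replacement rather than a proposal for how the serializer should do it.

```javascript
// Hypothetical consumer of the <url> convention: a URL is everything
// between "<" and the next ">".
function extractBracketedUrls(text) {
  return Array.from(text.matchAll(/<([^>]*)>/g), (match) => match[1]);
}

// Hypothetical helper: percent-encode every ">" (0x3E) as "%3E".
function escapeGreaterThan(serializedUrl) {
  return serializedUrl.replace(/>/g, "%3E");
}

const raw = "http://test/foo>bar";

// With an unescaped ">", the scan stops early and the URL is truncated.
console.log(extractBracketedUrls("See <" + raw + "> for details."));

// With ">" escaped, the whole URL survives between the delimiters.
console.log(extractBracketedUrls("See <" + escapeGreaterThan(raw) + "> for details."));
```

The first call yields only `http://test/foo`, the second yields `http://test/foo%3Ebar`.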

I just tested browser behavior, and:

* Firefox consistently escapes '>' in path, userinfo, query, fragment.  '>' in host or port causes parsing failure.
* Safari escapes '>' in path, userinfo, query.  It allows '>' unchanged in host and fragment.  '>' in port causes parsing failure.
* Chrome escapes '>' in path, userinfo, query, host.  It allows '>' unchanged in fragment.  '>' in port causes parsing failure.
* Edge escapes '>' in path and host.  It allows '>' unchanged in fragment and query.  '>' in port causes parsing failure.  Presence of userinfo causes parsing failure no matter what.

Testcase used:

    <pre><script>
      var strs = [
        "http://test>test/foo\\bar",
        "http://a>b@test/foo\\bar",
        "http://test/foo\\bar/#a>b",
        "http://test/foo\\bar/?a=c>d",
        "http://test:2>3/foo\\bar",
        "http://test/foo>bar\\baz",
      ];
      for (var str of strs) {
        var a = document.createElement("a");
        a.setAttribute("href", str);
        var href;
        try {
          href = a.href;
        } catch(e) {
          href = "href getter threw";
        }
        var url;
        try {
          url = (new URL(str).href);
        } catch(e) {
          url = "constructor threw";
        }
        document.writeln(str, " -- ", href, " -- ", url);
      }
    </script>

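with the `\\` bits in there as a way to tell whether parsing failed in the href case: a successful parse turns the backslash into `/`, while on failure the `href` getter returns the attribute value unchanged.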
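As a rough sketch of how one line of that output can be read, the function below sorts a serialized result into the three outcomes listed above. `classifyResult` is a hypothetical name, passing `null` for "the constructor threw" is an assumption of this sketch (the testcase prints a message string instead), and the sample strings are hand-written illustrations rather than recorded browser output.

```javascript
// Hypothetical classifier for one serialized result from the testcase.
// Assumes null stands for "the URL constructor threw".
function classifyResult(serialized) {
  if (serialized === null) return "parse failure";
  // A surviving backslash means the input came back unparsed.
  if (serialized.includes("\\")) return "parse failure";
  if (serialized.includes(">")) return "unchanged";
  if (serialized.toUpperCase().includes("%3E")) return "escaped";
  return "unknown";
}

// Hand-written sample strings, not recorded browser output.
console.log(classifyResult("http://test/foo%3Ebar/baz"));  // escaped
console.log(classifyResult("http://test/foo/bar/#a>b"));   // unchanged
console.log(classifyResult("http://test:2>3/foo\\bar"));   // parse failure
console.log(classifyResult(null));                         // parse failure
```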

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/291

Received on Wednesday, 5 April 2017 21:05:22 UTC