[whatwg/url] Parsing square brackets ([]) in path, query, and fragment (#595)

It seems that URL parsers in the wild allow square brackets ([]) in the path, query, and fragment. On the other hand, the URL spec appears to say that square brackets in the path, query, and fragment cause a validation error.

My question is which one is correct:

*   URL parsers are correct, and the spec should be tweaked
*   the spec is correct, and URL parsers should be tweaked
*   both are correct (I'm wrong)

My opinion is that the URL parsers are correct, though I'm not too sure. Please let me know if I missed something.

-----

URL parsers in the wild allow square brackets in path, query, and fragment:

```js
new URL('https://example.com/[]?[]#[]'); // doesn't throw
// URL {
//   href: 'https://example.com/[]?[]#[]',
//   origin: 'https://example.com',
//   protocol: 'https:',
//   username: '',
//   password: '',
//   host: 'example.com',
//   hostname: 'example.com',
//   port: '',
//   pathname: '/[]',
//   search: '?[]',
//   searchParams: URLSearchParams { '[]' => '' },
//   hash: '#[]'
// }
```

I tested with Node.js 16 (stable), Firefox 90 (nightly) and Chrome 90 (stable).
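The same observation can be checked programmatically. This is a minimal sketch for Node.js, where `URL` is a global; the `withSpace` case is my own addition for contrast: a space is also not a URL code point, but unlike the brackets it does get percent-encoded in the path.

```javascript
// Brackets survive parsing unencoded in path, query, and fragment.
const url = new URL('https://example.com/[]?[]#[]');
console.log(url.pathname); // /[]
console.log(url.search);   // ?[]
console.log(url.hash);     // #[]
console.log(url.searchParams.get('[]')); // '' (empty value for the key '[]')

// Contrast (my own example): a space is percent-encoded, the brackets are not.
const withSpace = new URL('https://example.com/a b[]');
console.log(withSpace.pathname); // /a%20b[]
```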

-----

The URL spec says square brackets in the path, query, and fragment cause a validation error.

In the basic URL parser's [path state, step 2](https://url.spec.whatwg.org/#path-state), [query state, step 3](https://url.spec.whatwg.org/#query-state), and [fragment state, step 1](https://url.spec.whatwg.org/#fragment-state):

> *   If c is not a URL code point and not U+0025 (%), validation error.
> *   If c is U+0025 (%) and remaining does not start with two ASCII hex digits, validation error.
> *   UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.
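The third bullet is what decides what ends up in the output. Below is a sketch of the percent-encode sets involved, based on my reading of the spec's definitions (the special-query set used for special schemes is omitted, and the function names are mine). Neither U+005B nor U+005D is in any of these sets, so the brackets would be appended unchanged, which matches what the parsers above do.

```javascript
// Sketch (my reading of the spec): percent-encode sets for path/query/fragment.
const inC0ControlSet = (cp) => cp <= 0x1f || cp > 0x7e;

const inFragmentSet = (cp) =>
  inC0ControlSet(cp) || [0x20, 0x22, 0x3c, 0x3e, 0x60].includes(cp);

const inQuerySet = (cp) =>
  inC0ControlSet(cp) || [0x20, 0x22, 0x23, 0x3c, 0x3e].includes(cp);

const inPathSet = (cp) =>
  inQuerySet(cp) || [0x3f, 0x60, 0x7b, 0x7d].includes(cp);

// UTF-8 percent-encode one code point: code points outside the set are
// returned as-is; code points inside it become %XX per UTF-8 byte.
function utf8PercentEncode(cp, inSet) {
  const ch = String.fromCodePoint(cp);
  if (!inSet(cp)) return ch;
  return Array.from(new TextEncoder().encode(ch))
    .map((b) => '%' + b.toString(16).toUpperCase().padStart(2, '0'))
    .join('');
}

const sets = { path: inPathSet, query: inQuerySet, fragment: inFragmentSet };
for (const [name, inSet] of Object.entries(sets)) {
  const out = Array.from('[] ', (c) => utf8PercentEncode(c.codePointAt(0), inSet)).join('');
  console.log(name, out); // brackets unchanged, space encoded: []%20
}
```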

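(The third bullet is the path state's wording; the query and fragment states use their own percent-encode sets instead.) And the [URL code points](https://url.spec.whatwg.org/#url-code-points) do not include square brackets, U+005B ([) and U+005D (]):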

> The URL code points are ASCII alphanumeric, U+0021 (!), U+0024 ($), U+0026 (&), U+0027 ('), U+0028 LEFT PARENTHESIS, U+0029 RIGHT PARENTHESIS, U+002A (*), U+002B (+), U+002C (,), U+002D (-), U+002E (.), U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+003F (?), U+0040 (@), U+005F (_), U+007E (~), and code points in the range U+00A0 to U+10FFFD, inclusive, excluding surrogates and noncharacters. 
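The first two bullets, combined with the definition above, can be sketched as a check that only reports problems. In the spec, a validation error does not by itself make the parser terminate, so the hypothetical `validationErrors` helper below (my name, not the spec's) records errors and keeps going.

```javascript
// ASCII punctuation allowed by the "URL code points" definition quoted above.
const URL_PUNCTUATION = new Set("!$&'()*+,-./:;=?@_~");

const isAsciiAlphanumeric = (cp) =>
  (cp >= 0x30 && cp <= 0x39) || (cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a);

// Noncharacters: U+FDD0..U+FDEF and every code point ending in FFFE or FFFF.
const isNoncharacter = (cp) =>
  (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;

function isURLCodePoint(cp) {
  if (isAsciiAlphanumeric(cp)) return true;
  if (cp < 0x80) return URL_PUNCTUATION.has(String.fromCodePoint(cp));
  return cp >= 0xa0 && cp <= 0x10fffd &&
    !(cp >= 0xd800 && cp <= 0xdfff) && // surrogates
    !isNoncharacter(cp);
}

// Hypothetical helper mirroring the first two bullets: it collects
// validation errors and never stops early.
function validationErrors(input) {
  const errors = [];
  let i = 0; // UTF-16 index of the current code point
  for (const ch of input) {
    const cp = ch.codePointAt(0);
    const remaining = input.slice(i + ch.length);
    if (!isURLCodePoint(cp) && cp !== 0x25) {
      errors.push(`'${ch}' is not a URL code point`);
    } else if (cp === 0x25 && !/^[0-9A-Fa-f]{2}/.test(remaining)) {
      errors.push("'%' is not followed by two ASCII hex digits");
    }
    i += ch.length;
  }
  return errors;
}

console.log(isURLCodePoint(0x5b), isURLCodePoint(0x5d)); // false false
console.log(validationErrors('[]')); // one error for '[' and one for ']'
```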


https://github.com/whatwg/url/issues/595

Received on Wednesday, 28 April 2021 02:29:40 UTC