[whatwg/url] Allow `[` and `]` as URL code-points (Issue #753)

### Background

[RFC-3986](https://www.rfc-editor.org/rfc/rfc3986#section-2.2) reserves two kinds of delimiters:
- `gen-delims`, which are used by the URL syntax itself (things like `?` and `#`), and
- `sub-delims`, which are available for use within a URL component to mark out subcomponents. For example, `&` and `=` are in the `sub-delims` set, and they are used by query strings to encode key-value pairs.

It's important that we have a set of known subcomponent delimiters, because clients need the assurance that these characters can be used without escaping. An escaped and an unescaped subcomponent delimiter must not be equivalent. For example, percent-escaping the `&` between two key-value pairs in a query string turns it into data, which merges the two pairs into one and changes the meaning of the query.
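As a quick illustration of that non-equivalence, this is how the WHATWG `URLSearchParams` parser (a global in browsers and Node.js) treats an unescaped versus an escaped `&`. The query strings are made-up examples.

```javascript
// An unescaped '&' separates two key-value pairs...
const delimited = new URLSearchParams('a=1&b=2');
// ...while an escaped '&' (%26) is ordinary data inside a single value.
const escaped = new URLSearchParams('a=1%26b=2');

console.log(JSON.stringify([...delimited])); // [["a","1"],["b","2"]]
console.log(JSON.stringify([...escaped]));   // [["a","1&b=2"]]
```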

RFC-3986 also includes `[` and `]` in the `gen-delims` set, and does not allow their use anywhere except around IP literals (such as IPv6 addresses) in the host. The RFC does not explain why these particular characters are forbidden elsewhere.
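For reference, the two RFC 3986 delimiter sets can be written out as plain lookup tables. The helper name `delimiterKind` is made up for this sketch. Note that parentheses are `sub-delims` while brackets are `gen-delims`, and that the RFC's grammar only uses brackets in its `IP-literal` host rule.

```javascript
// Delimiter sets from RFC 3986, section 2.2.
const GEN_DELIMS = new Set([':', '/', '?', '#', '[', ']', '@']);
const SUB_DELIMS = new Set(['!', '$', '&', "'", '(', ')', '*', '+', ',', ';', '=']);

// Hypothetical helper: classify a single character against those sets.
function delimiterKind(ch) {
  if (GEN_DELIMS.has(ch)) return 'gen-delims';
  if (SUB_DELIMS.has(ch)) return 'sub-delims';
  return 'neither';
}

console.log(delimiterKind('['), delimiterKind('&'), delimiterKind('(')); // gen-delims sub-delims sub-delims
```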

Its predecessor, [RFC-2396](https://www.rfc-editor.org/rfc/rfc2396#section-2.4.3), includes these characters in the `unwise` character set, and possibly provides more insight as to why they must be escaped in URLs:

> Other characters are excluded because gateways and other transport
> agents are known to sometimes modify such characters, or they are
> used as delimiters.
>
> `` unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`" ``
>
> Data corresponding to excluded characters **must be escaped** in order to
> be properly represented within a URI.

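This escaping means they cannot be used as subcomponent delimiters: if every bracket must be escaped, an escaped bracket is necessarily equivalent to an unescaped one.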
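JavaScript's built-in encoders, which predate RFC 3986, still treat brackets as characters that must always be escaped. A minimal illustration (the URL is a placeholder): once every bracket is escaped on the way out, nothing distinguishes a bracket meant as a delimiter from one that is part of the data.

```javascript
// encodeURIComponent escapes brackets inside a single component...
const encodedKey = encodeURIComponent('a[]');
// ...and encodeURI, which leaves ':', '/', '?' and '=' alone, escapes them too.
const encodedUrl = encodeURI('https://example.com/?a[]=b');

console.log(encodedKey); // a%5B%5D
console.log(encodedUrl); // https://example.com/?a%5B%5D=b
// Decoding cannot tell whether the original bracket was a delimiter or data.
console.log(decodeURIComponent(encodedKey)); // a[]
```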

### Problem

Despite the above, the URL standard today allows `[` and `]` to be used unescaped in URL query strings and fragments. They are not URL code-points, but they are tolerated, and have been for so long that an ecosystem has emerged which depends on them being available as subcomponent delimiters.
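This tolerance is easy to observe with the WHATWG `URL` class in Node.js or a browser (the URL below is a made-up example). Brackets pass through the parser untouched in the query and fragment. However, as soon as `searchParams` is modified, the query is re-serialized as `application/x-www-form-urlencoded`, which does escape them.

```javascript
const url = new URL('https://example.com/search?a[]=b&a[]=c#item[2]');

// Brackets survive parsing in the query and the fragment.
const parsedQuery = url.search;                // '?a[]=b&a[]=c'
const parsedHash = url.hash;                   // '#item[2]'
const values = url.searchParams.getAll('a[]'); // [ 'b', 'c' ]

// Mutating searchParams re-serializes the whole query, escaping the brackets.
url.searchParams.append('a[]', 'd');
const reserialized = url.search;               // '?a%5B%5D=b&a%5B%5D=c&a%5B%5D=d'

console.log(parsedQuery, parsedHash, values, reserialized);
```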

An example of this is the JavaScript [`qs` library](https://github.com/ljharb/qs) (over 250 million downloads per month), used by popular frameworks such as Express.js. It uses square brackets in key names to denote nested objects:

```javascript
const assert = require('assert');
const qs = require('qs'); // npm install qs

assert.deepEqual(qs.parse('foo[bar]=baz'), {
    foo: {
        bar: 'baz'
    }
});
```

and arrays:

```javascript
const assert = require('assert');
const qs = require('qs');

var withArray = qs.parse('a[]=b&a[]=c');
assert.deepEqual(withArray, { a: ['b', 'c'] });
```
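To make the convention concrete without depending on `qs`, here is a hypothetical one-level sketch (`parseBrackets` is a made-up name, and this is not how `qs` is implemented). It handles only `key[sub]` and `key[]`, and relies on `URLSearchParams` to split the pairs and percent-decode them.

```javascript
// Hypothetical one-level sketch, NOT qs's implementation:
// 'foo[bar]=baz' -> { foo: { bar: 'baz' } }, 'a[]=b&a[]=c' -> { a: ['b', 'c'] }.
function parseBrackets(query) {
  const result = {};
  // URLSearchParams splits on '&' and '=' and percent-decodes names and values.
  for (const [key, value] of new URLSearchParams(query)) {
    const match = /^([^[\]]+)\[([^[\]]*)\]$/.exec(key);
    if (!match) {
      result[key] = value; // plain key without brackets
      continue;
    }
    const [, base, sub] = match;
    if (sub === '') {
      // 'a[]' appends to an array
      if (!Array.isArray(result[base])) result[base] = [];
      result[base].push(value);
    } else {
      // 'foo[bar]' sets a property on a nested object
      const current = result[base];
      if (typeof current !== 'object' || current === null || Array.isArray(current)) {
        result[base] = {};
      }
      result[base][sub] = value;
    }
  }
  return result;
}

console.log(JSON.stringify(parseBrackets('foo[bar]=baz')));  // {"foo":{"bar":"baz"}}
console.log(JSON.stringify(parseBrackets('a[]=b&a[]=c')));   // {"a":["b","c"]}
```

Because the keys are percent-decoded before the brackets are matched, `parseBrackets('foo%5Bbar%5D=baz')` produces the same result as `parseBrackets('foo[bar]=baz')`.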

By default, query strings created by this library use percent-encoded brackets. Users apparently find this undesirable, so the library added an option to skip percent-encoding key names, and users unhappy with the escaping are [encouraged](https://github.com/ljharb/qs/issues/388) to use it.
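The trade-off can be sketched with a small hypothetical serializer (`stringifyPairs` and its `rawBrackets` flag are made up for illustration and are not the `qs` API). By default it escapes everything. With the flag set, it un-escapes only the brackets in keys, while `&` and `=` stay escaped.

```javascript
// Hypothetical helper, NOT qs's API: serialize [key, value] pairs, optionally
// leaving brackets in keys unescaped (values are always fully escaped).
function stringifyPairs(pairs, { rawBrackets = false } = {}) {
  return pairs
    .map(([key, value]) => {
      let encodedKey = encodeURIComponent(key);
      if (rawBrackets) encodedKey = encodedKey.replace(/%5B/g, '[').replace(/%5D/g, ']');
      return `${encodedKey}=${encodeURIComponent(value)}`;
    })
    .join('&');
}

const pairs = [['a[]', 'b'], ['a[]', 'c']];
console.log(stringifyPairs(pairs));                        // a%5B%5D=b&a%5B%5D=c
console.log(stringifyPairs(pairs, { rawBrackets: true })); // a[]=b&a[]=c
```

With `rawBrackets`, a bracket that was literally part of a key name is also emitted unescaped, so it becomes indistinguishable from a nesting bracket.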

Moreover, the use of an unsanctioned character as a subcomponent delimiter means that brackets in key names [are ambiguous](https://github.com/ljharb/qs/issues/235):

> It uncovers the additional issue that `{ 'foo[bar]': 'baz' }` and `{ foo: { bar: 'baz' } }` both stringify to 'foo%5Bbar%5D=baz'.

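The only way to resolve this ambiguity would be to treat escaped and unescaped square brackets as non-equivalent, as is already the case with all other subcomponent delimiters. Then `{ 'foo[bar]': 'baz' }` could stringify to `foo%5Bbar%5D=baz` and `{ foo: { bar: 'baz' } }` to `foo[bar]=baz`.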
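In practice, that non-equivalence means splitting keys on literal brackets *before* percent-decoding. Here is a hypothetical sketch (`keyPath` is a made-up name, not part of `qs`), contrasted with `URLSearchParams`, which decodes first and therefore collapses both spellings into the same key.

```javascript
// URLSearchParams percent-decodes keys, so both spellings become 'foo[bar]'.
const decodedA = [...new URLSearchParams('foo[bar]=baz').keys()];
const decodedB = [...new URLSearchParams('foo%5Bbar%5D=baz').keys()];

// Hypothetical: split a RAW (still percent-encoded) key on literal brackets,
// then decode each segment, so '%5B' and '%5D' remain data.
function keyPath(rawKey) {
  const match = /^([^[\]]*)((?:\[[^[\]]*\])*)$/.exec(rawKey);
  // Keys with unbalanced literal brackets are treated as a single segment.
  const segments = match
    ? [match[1], ...[...match[2].matchAll(/\[([^[\]]*)\]/g)].map((m) => m[1])]
    : [rawKey];
  // '+' means space in form-encoded data; decodeURIComponent throws on a malformed '%'.
  return segments.map((s) => decodeURIComponent(s.replace(/\+/g, ' ')));
}

console.log(decodedA, decodedB);      // [ 'foo[bar]' ] [ 'foo[bar]' ]
console.log(keyPath('foo[bar]'));     // [ 'foo', 'bar' ]
console.log(keyPath('foo%5Bbar%5D')); // [ 'foo[bar]' ]
```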

### Proposed Resolution

I believe we should accept that unescaped `[` and `]` are a de facto part of the web at this point, and include them as valid URL code-points. We already allow them to be used without escaping, and developers have been using them unescaped for some time.

Historically, square brackets have never conflicted with other URL components (indeed, IPv6 addresses now use them). The only concern was about colliding delimiters when embedding URLs, but that concern seems to apply equally to regular parentheses `()`, which are allowed (and are actually used _specifically as URL delimiters_ in Markdown). Ultimately, colliding delimiters are a problem for the _embedding_ document to solve, not for the embedded content to attempt to second-guess.
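To illustrate handling collisions on the embedding side, here is a hypothetical helper (`markdownLink` is a made-up name) that places an http(s) URL into a CommonMark inline link. It escapes brackets only in the link text, where they are Markdown delimiters, and wraps the destination in angle brackets so that `(`, `)`, `[` and `]` in the URL can stay as they are. This is a sketch that assumes the CommonMark `<...>` link-destination form.

```javascript
// Hypothetical helper: embed an http(s) URL in a CommonMark inline link.
function markdownLink(text, href) {
  const url = new URL(href);
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new TypeError('this sketch only handles http(s) URLs');
  }
  // Brackets and backslashes ARE delimiters in Markdown link text, so escape them there.
  const safeText = text.replace(/[[\]\\]/g, '\\$&');
  // For http(s) URLs the WHATWG serializer never emits spaces, '<' or '>',
  // so a <...> destination is safe without touching '(', ')', '[' or ']'.
  return `[${safeText}](<${url.href}>)`;
}

console.log(markdownLink('see [1]', 'https://example.com/wiki/Foo_(bar)?a[]=1'));
// [see \[1\]](<https://example.com/wiki/Foo_(bar)?a[]=1>)
```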

Therefore, IMO, the presence of an unescaped square bracket should not be grounds to call the URL non-valid.
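As a rough sketch of what the change would mean for validity checking, the URL Standard's "URL code points" definition can be written out as below, with a hypothetical `allowBrackets` switch modelling this proposal. `isUrlCodePoint` and `invalidCodePoints` are made-up names. The check only looks at individual code points and ignores the parser's other validation errors.

```javascript
// ASCII punctuation that the URL Standard lists as URL code points.
const ASCII_URL_PUNCTUATION = new Set("!$&'()*+,-./:;=?@_~");

// Noncharacters: U+FDD0..U+FDEF and every code point ending in FFFE or FFFF.
function isNoncharacter(cp) {
  return (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
}

// `allowBrackets` is a hypothetical switch modelling the proposal.
function isUrlCodePoint(cp, { allowBrackets = false } = {}) {
  if (cp < 0x80) {
    const ch = String.fromCodePoint(cp);
    if (/[A-Za-z0-9]/.test(ch) || ASCII_URL_PUNCTUATION.has(ch)) return true;
    return allowBrackets && (ch === '[' || ch === ']');
  }
  if (cp < 0xa0 || cp > 0x10fffd) return false;
  if (cp >= 0xd800 && cp <= 0xdfff) return false; // surrogates
  return !isNoncharacter(cp);
}

// Characters in `text` that would be flagged. '%' is skipped because the
// URL Standard validates percent-escapes separately.
function invalidCodePoints(text, options) {
  const flagged = [];
  for (const ch of text) {
    if (ch !== '%' && !isUrlCodePoint(ch.codePointAt(0), options)) flagged.push(ch);
  }
  return flagged;
}

console.log(invalidCodePoints('a[]=b&a[]=c'));                          // [ '[', ']', '[', ']' ]
console.log(invalidCodePoints('a[]=b&a[]=c', { allowBrackets: true })); // []
```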

--
https://github.com/whatwg/url/issues/753

Received on Saturday, 11 February 2023 07:23:57 UTC