[whatwg/url] Add a note that `searchParams` is not unicode-aware (#596) from Karl on 2021-04-29 (public-webapps-github@w3.org from April 2021)

From: Karl <notifications@github.com>
Date: Thu, 29 Apr 2021 12:10:22 -0700
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/596@github.com>

For example, in the URL:

```
http://example.com?ab\u006E\u0303cd=123
```

The key of the first query parameter is the string "abñcd", where the "ñ" is written as U+006E (LATIN SMALL LETTER N) followed by U+0303 (COMBINING TILDE). If you search for exactly those Unicode code points, you will find the value:

```
> const url = new URL("http://example.com?ab\u006E\u0303cd=123");
> url.searchParams.get("ab\u006E\u0303cd")
"123"
```

However, if you search for the canonically equivalent U+00F1 (LATIN SMALL LETTER N WITH TILDE), you won't find anything:

```
> url.searchParams.get("ab\u00F1cd")
null
```

This means that the sender and receiver of a request need to co-ordinate on how they normalise their Unicode strings before interacting with the URL APIs. Otherwise they may get surprising results on platforms whose native string types compare by canonical equivalence by default: `searchParams.get()` will report that no key matching "ab\u00F1cd" exists, yet iterating over all key-value pairs and comparing each key using the platform's native string type will find a match.
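As a rough sketch of what that co-ordination could look like in JavaScript, both sides might agree on NFC and normalise keys before comparing them. The `getNormalized` helper below is hypothetical and not part of the URL Standard; it only relies on `String.prototype.normalize` and on iterating `URLSearchParams`:

```javascript
// Hypothetical helper (not part of the URL Standard): look up a query
// parameter by comparing NFC-normalised keys instead of raw code points.
function getNormalized(searchParams, key) {
  const wanted = key.normalize("NFC");
  for (const [k, v] of searchParams) {
    // Normalise each stored key the same way before comparing.
    if (k.normalize("NFC") === wanted) return v;
  }
  return null;
}

const url = new URL("http://example.com?ab\u006E\u0303cd=123");

// Plain lookup compares code points, so the precomposed form misses.
console.log(url.searchParams.get("ab\u00F1cd"));            // null
// Normalised lookup treats both spellings of "ñ" as the same key.
console.log(getNormalized(url.searchParams, "ab\u00F1cd")); // "123"
```

NFC is only one possible choice here; what matters is that the sender and receiver pick the same normalisation form.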

Would an editorial note be welcome? If so, I'd be happy to add one.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/596

Received on Thursday, 29 April 2021 19:10:34 UTC