Re: [whatwg/url] ContextJ (RFC 5892) is Security Theater (Issue #776)

* There are 3K+ RGI emoji and 1/3 of them involve ZWJ sequences. CheckJoiners trades support for a few exotic character sequences (which could easily be enforced at the registrar level) against 1350 emoji sequences that are used internationally by billions of people.

* RFC 5892 is both outdated (2010) and misguided.  AFAICT it's trying to allow ZW(N)J for typographical reasons yet I don't think there's any ambiguity with or without a joiner.  
    * Are there any registrars that allow a virama both with and without ZWNJ as separate names? (No.)
    * How many actual domains benefit from this rule?

* If you look across the internet, there are thousands of developer hours wasted on debating these choices one way or another, but at the end of the day, CheckJoiners is just a convoluted way to disallow `200C` and `200D`.
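
For anyone who hasn't read the rule itself, here is a minimal Python sketch of the ZWJ half of ContextJ (RFC 5892, Appendix A.2): a `200D` passes only if the code point directly before it has Canonical_Combining_Class Virama. The function name `zwj_allowed` is made up for this example. It is not a full CheckJoiners implementation: the ZWNJ rule is left out because it also needs Joining_Type data, which Python's standard library doesn't expose.

```python
import unicodedata

ZWJ = "\u200D"
VIRAMA_CCC = 9  # Canonical_Combining_Class value named "Virama"


def zwj_allowed(label: str) -> bool:
    """Return True if every U+200D in `label` directly follows a virama."""
    for i, ch in enumerate(label):
        if ch != ZWJ:
            continue
        # A leading ZWJ has nothing before it, so it fails the rule.
        if i == 0 or unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
            return False
    return True


print(zwj_allowed("\u0915\u094D\u200D\u0937"))    # KA + VIRAMA + ZWJ + SSA -> True
print(zwj_allowed("\U0001F468\u200D\U0001F4BB"))  # 1F468 200D 1F4BB -> False
```

Every emoji ZWJ sequence fails this check, because the code point before the joiner is an emoji, not a virama.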

---

For a concrete example: `1F468 200D 1F4BB`
![image](https://github.com/whatwg/url/assets/225900/91845412-aebd-42ce-a9e9-c41e8549ff9b)

* This emoji was released in 2016 (7 years ago)
* Major browsers don't agree on its validity: compare Chrome/Brave vs Safari/Firefox
* The [punycode](https://adraffy.github.io/punycode.js/test/demo.html#u=%F0%9F%91%A8%E2%80%8D%F0%9F%92%BB) of this emoji is `xn--1ugz855pfha`
* This emoji is invalid with CheckJoiners.
* In some browsers, this encodes as `xn--qq8hgf` which is [wrong](https://adraffy.github.io/punycode.js/test/demo.html#p=xn--qq8hgf) — `1F468 1F4BB` is not the same as  `1F468 200D 1F4BB`
* NodeJS recently switched to [Ada](https://github.com/ada-url/ada), which implements the WHATWG URL Standard.  This means that even if you correctly punycode the domain, a WHATWG URL implementation will prevent its use, even though the punycode is valid and the domain is DNS compatible.
![image](https://github.com/whatwg/url/assets/225900/cf302aea-5df2-46d5-9fc1-3675762b4ef4)
* In general, the validity of URLs seems to change randomly between browser releases as libraries are periodically replaced and the standards aren't clear.  
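
The two encodings above are easy to reproduce. This is a minimal sketch using Python's built-in `punycode` codec, which only produces the part after `xn--`, so the prefix is added by hand. The helper name `to_a_label` is made up for this example, and it does no UTS-46 mapping or validation.

```python
WITH_ZWJ = "\U0001F468\u200D\U0001F4BB"  # 1F468 200D 1F4BB
WITHOUT_ZWJ = "\U0001F468\U0001F4BB"     # 1F468 1F4BB


def to_a_label(label: str) -> str:
    """Punycode-encode one label and prepend the ACE prefix (no validation)."""
    return "xn--" + label.encode("punycode").decode("ascii")


print(to_a_label(WITH_ZWJ))     # xn--1ugz855pfha
print(to_a_label(WITHOUT_ZWJ))  # xn--qq8hgf

# Decoding the shorter form shows the joiner is gone, not hidden.
decoded = b"qq8hgf".decode("punycode")
print([f"{ord(c):X}" for c in decoded])  # ['1F468', '1F4BB']
```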
---

**The simplest solution is that `CheckJoiners` should be `false`**  
* Any name with a joiner is already punycode.  
* UTS-46 provides poor guidance regarding spoofs and confusables and has forced developers to implement various parts of UAX-39 and their own logic to decide when to display punycode as Unicode.   
* UTS-46's advice about validating punycode is also strange because name validity is a registrar problem, not a resolution problem.  
* This is a disaster for the end-user because the rules are constantly changing, yet at the same time, there are thousands of confusables and mixed-script spoofs that slip right through the implemented standards.

---

For reference, I recently implemented a [normalization standard](https://github.com/adraffy/ensip-15/blob/master/ens-improvement-proposals/ensip-15-normalization-standard.md) for the [Ethereum Name Service](https://ens.domains/) ecosystem.  I used a combination of UTS-51 + UTS-46 + a significantly safer character set (banned punctuation, parens, brackets, vocalizations, obsolete, deprecated, ancient, reversed, turned, flipped, many ligatures, etc.) + an intelligent confusable system (that isn't just a warning system: e.g. `rn` is a footgun confusable).  [Demo](https://adraffy.github.io/ens-normalize.js/test/resolver.html) | [Github](https://github.com/adraffy/ens-normalize.js)

From my experience with the Unicode and RFC documentation, **the primary source of confusion and bugs is the documentation itself.**  Many of these rules should be deprecated, and the rest should be clarified and modernized.

I think WHATWG made the correct decision with `AllowHyphens` and finally broke away from archaic DNS rules.

I think they should do the same with `CheckJoiners`.  If the WHATWG really wants to protect end-users, it should recommend UTS-51 RGI pre-processing and outright disallow ZW(N)J outside of emoji.
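
To make that last suggestion concrete, here is a rough Python sketch of the idea: consume known RGI ZWJ sequences first, then reject any joiner that is left over. The names `RGI_ZWJ_SAMPLE` and `joiners_only_in_emoji` are made up for this example, and the two-entry set is only a placeholder. A real implementation would load the full RGI list from the Unicode emoji data files (`emoji-zwj-sequences.txt`).

```python
ZWNJ, ZWJ = "\u200C", "\u200D"

# Placeholder subset for illustration only, not the real RGI list.
RGI_ZWJ_SAMPLE = frozenset({
    "\U0001F468\u200D\U0001F4BB",  # man technologist
    "\U0001F469\u200D\U0001F4BB",  # woman technologist
})


def joiners_only_in_emoji(label: str, rgi=RGI_ZWJ_SAMPLE) -> bool:
    """Return True if every ZWJ/ZWNJ in `label` sits inside a listed sequence."""
    longest_first = sorted(rgi, key=len, reverse=True)
    i = 0
    while i < len(label):
        # Try to consume a whole emoji sequence starting at position i.
        match = next((s for s in longest_first if label.startswith(s, i)), None)
        if match is not None:
            i += len(match)
        elif label[i] in (ZWNJ, ZWJ):
            return False  # a joiner outside any listed sequence
        else:
            i += 1
    return True


print(joiners_only_in_emoji("\U0001F468\u200D\U0001F4BB"))  # True
print(joiners_only_in_emoji("a\u200Db"))                    # False
```

Note that under this approach a virama + ZWJ sequence is rejected too, which is the trade-off I'm arguing for.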


-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/776#issuecomment-1592565775

Message ID: <whatwg/url/issues/776/1592565775@github.com>

Received on Thursday, 15 June 2023 08:06:22 UTC