[whatwg/url] Should we forbid U+226E (≮) and U+226F (≯) in hosts? (Issue #733)

From https://www.unicode.org/reports/tr46/#UseSTD3ASCIIRules:

> There are a very small number of non-ASCII characters with the data file status disallowed_STD3_valid:
>
> U+2260 ( ≠ ) NOT EQUAL TO
> U+226E ( ≮ ) NOT LESS-THAN
> U+226F ( ≯ ) NOT GREATER-THAN
>
> Those characters are disallowed with UseSTD3ASCIIRules=true because the set of characters in their canonical decompositions are not entirely in the valid set ([Step 7](https://www.unicode.org/reports/tr46/#TableDerivationStep7) of the Table Derivation). However, they are allowed with UseSTD3ASCIIRules=false, because the base characters of their canonical decompositions, U+003D ( = ) EQUALS SIGN, U+003C ( < ) LESS-THAN SIGN, and U+003E ( > ) GREATER-THAN SIGN, are each valid under that option. If an implementation uses UseSTD3ASCIIRules=false but disallows any of these three ASCII characters, then it must also disallow the corresponding precomposed character for its negation.
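
To make the quoted rule concrete, here is a minimal sketch using Python's standard `unicodedata` module. It shows that each of the three code points canonically decomposes to an ASCII base plus U+0338 COMBINING LONG SOLIDUS OVERLAY, and that NFC recombines that pair. The helper names `decomposition` and `must_also_disallow` are hypothetical illustrations, not part of UTS46 or the URL Standard.

```python
import unicodedata

# The three code points UTS46 lists with status disallowed_STD3_valid.
NEGATED = ["\u2260", "\u226E", "\u226F"]


def decomposition(ch: str) -> list[str]:
    """Code points of ch's canonical decomposition (NFD), as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


def must_also_disallow(disallowed_ascii: set[str]) -> set[str]:
    """Precomposed negations whose NFD base character is in disallowed_ascii.

    Per the UTS46 note, an implementation using UseSTD3ASCIIRules=false that
    disallows one of '=', '<', '>' must also disallow the matching negation.
    """
    return {ch for ch in NEGATED if unicodedata.normalize("NFD", ch)[0] in disallowed_ascii}


for ch in NEGATED:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {decomposition(ch)}")

# NFC recombines the decomposed sequence: '<' + U+0338 becomes U+226E.
print(unicodedata.normalize("NFC", "<\u0338") == "\u226E")  # True

# The URL Standard forbids '<' and '>' in hosts, but not '='.
print(sorted(f"U+{ord(c):04X}" for c in must_also_disallow({"<", ">"})))
```

The NFC round trip is one way to see why UTS46 ties each ASCII character to its negation: normalization can turn an ASCII base plus a combining mark into the precomposed character, and the reverse holds under NFD.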

We allow `=`, but `<` and `>` are forbidden host code points. All three non-ASCII code points listed above work fine in WebKit, and I personally may not see the problem as strongly as UTS46 does. I added tests for them in https://github.com/web-platform-tests/wpt/pull/37907 (the tests reflect the status quo).
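
A rough sketch of the status quo versus one possible change. `FORBIDDEN_HOST` mirrors the URL Standard's forbidden host code point list as I understand it (check the current spec text). `has_forbidden` and its `extended` flag are hypothetical. This is also a simplification: the sketch checks the Unicode input directly, whereas the URL Standard applies its forbidden-code-point check to the result of domain to ASCII, by which point a host containing U+226E has already been Punycode-encoded.

```python
import unicodedata

# Forbidden host code points as listed in the URL Standard (all ASCII).
# '=' is absent, while '<' and '>' are present.
FORBIDDEN_HOST = set("\x00\t\n\r #/:<>?@[\\]^|")

# The precomposed negations UTS46 singles out.
NEGATED = {"\u2260", "\u226E", "\u226F"}


def has_forbidden(domain: str, extended: bool = False) -> bool:
    """Return True if domain contains a forbidden code point.

    extended=False models the status quo (ASCII list only).
    extended=True is a hypothetical variant that also rejects a negation whose
    NFD base character is itself forbidden, following the UTS46 note.
    """
    for ch in domain:
        if ch in FORBIDDEN_HOST:
            return True
        if extended and ch in NEGATED:
            base = unicodedata.normalize("NFD", ch)[0]
            if base in FORBIDDEN_HOST:
                return True
    return False


for host in ["a<b.example", "a\u226Eb.example", "a\u2260b.example"]:
    print(ascii(host), has_forbidden(host), has_forbidden(host, extended=True))
```

Under the hypothetical `extended` variant, U+226E and U+226F would be rejected while U+2260 would still be allowed, matching the fact that `=` is not forbidden.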

Thoughts?

cc @karwa @ricea @achristensen07 @valenting

https://github.com/whatwg/url/issues/733

Received on Thursday, 12 January 2023 16:29:47 UTC