- From: Markus Scherer <notifications@github.com>
- Date: Tue, 17 Feb 2026 16:41:15 -0800
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/issues/543/3917830943@github.com>
markusicu left a comment (whatwg/url#543)
Finally looking at this discussion here vs. UTS 46...
**First**, I agree that the wording of the [IDNA2008 bidi rule](https://datatracker.ietf.org/doc/html/rfc5893#section-2) likely resulted in the [UTS 46 criterion](https://unicode.org/reports/tr46/#Validity_Criteria):
1. “The following rule, consisting of six conditions, _applies to labels in Bidi domain names_.” (my emphasis)
2. “... if the domain name is a Bidi domain name, then the label must satisfy all six of the numbered conditions ...”
I see in this discussion that Harald didn't intend the rule to be interpreted this way for a whole domain name.
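To make the whole-domain reading concrete, here is a minimal Python sketch. Under that reading, one RTL label anywhere in the name puts every label under the Bidi Rule. The sketch makes a few assumptions: labels are already-mapped Unicode labels separated by ".", Bidi_Class comes from `unicodedata.bidirectional`, and "RTL label" follows the RFC 5893 definition (the label contains an R, AL, or AN character). The function names are hypothetical.

```python
import unicodedata

# Bidi_Class values allowed by RFC 5893 condition 2 (RTL) and condition 5 (LTR).
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def is_rtl_label(label: str) -> bool:
    """RFC 5893 definition: an RTL label contains an R, AL, or AN character."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL", "AN") for ch in label)


def satisfies_bidi_rule(label: str) -> bool:
    """Check the six numbered conditions of the RFC 5893 Bidi Rule."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    # Condition 1: the first character must be L, R, or AL.
    if not classes or classes[0] not in ("L", "R", "AL"):
        return False
    # Conditions 3 and 6 look at the last character before trailing NSMs.
    # The loop stops at index 0 at the latest, since classes[0] is not NSM.
    end = len(classes)
    while classes[end - 1] == "NSM":
        end -= 1
    last = classes[end - 1]
    if classes[0] in ("R", "AL"):
        # Conditions 2, 3, and 4 (RTL label).
        return (
            all(bc in RTL_ALLOWED for bc in classes)
            and last in ("R", "AL", "EN", "AN")
            and not ("EN" in classes and "AN" in classes)
        )
    # Conditions 5 and 6 (LTR label).
    return all(bc in LTR_ALLOWED for bc in classes) and last in ("L", "EN")


def check_whole_domain(domain: str) -> bool:
    """Whole-domain reading: in a Bidi domain name, every label must pass."""
    labels = domain.split(".")
    if not any(is_rtl_label(label) for label in labels):
        return True  # not a Bidi domain name, so the rule does not apply
    return all(satisfies_bidi_rule(label) for label in labels)


print(check_whole_domain("example.com"))  # True
print(check_whole_domain("مثال.com"))     # True
print(check_whole_domain("1.مثال"))       # False: "1" starts with bc=EN
```

In the last call, the all-digit label "1" contains no RTL text at all, yet it causes the whole name to be rejected. That is the whole-domain effect that, per this discussion, was not intended.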
**Second**, I agree that we need to apply the bidi rule on Unicode labels, that is, after Punycode decoding. In UTS 46, the Validity Criteria clearly operate on Unicode labels; I think that's why we apply CheckBidi there.
By contrast, one can read the UTS 46 main processing step 4 as having a Unicode label only temporarily while validating an xn--something label, so the input to an additional step 5 would be ambiguous. In a future UTS 46 update, I will try to clarify this step 4 and how it interacts with the ToASCII and ToUnicode operations.
Wherever we place it in the UTS 46 spec...
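The order of operations matters because an `xn--` label is pure ASCII until it has been decoded. The Python sketch below shows an RTL label that looks LTR until Punycode decoding. It relies on Python's built-in `punycode` codec, which performs only RFC 3492 decoding, with no IDNA mapping or validation. The helper names are hypothetical.

```python
import unicodedata


def is_rtl_label(label: str) -> bool:
    """RFC 5893 definition: an RTL label contains an R, AL, or AN character."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL", "AN") for ch in label)


def to_unicode_label(label: str) -> str:
    """Decode an ACE ("xn--") label to Unicode; return other labels unchanged."""
    if label[:4].lower() == "xn--":
        return label[4:].encode("ascii").decode("punycode")
    return label


# Build the ACE form of an Arabic label rather than hard-coding it.
ace = "xn--" + "مثال".encode("punycode").decode("ascii")
print(is_rtl_label(ace))                    # False: only ASCII letters, digits, "-"
print(is_rtl_label(to_unicode_label(ace)))  # True: the decoded label is Arabic
```

If the Bidi Rule ran on the ASCII form, the label would be treated as LTR and the RTL conditions would never be checked.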
**Third**, I think I more or less understand what you are trying to do with the proposed change. However: (a) Several of us find the proposed CheckBidi language convoluted, and thus error-prone for guiding implementers. And (b) because of the special case for LDH labels, it also seems narrower than it needs to be for some LTR labels. But I could be wrong.
I am trying to come up with what I hope is an easier-to-understand version that broadens the validity beyond ASCII characters (e.g., Persian digits, non-ASCII LTR letters).
Attempt:
1. Any RTL label must satisfy the bidi rule conditions. (Really the first four.)
2. Any label that immediately follows an RTL label must satisfy the bidi rule conditions.
   1. Note: Once we find an LTR label that satisfies the bidi rule, any following LTR label should be fine even if it does not satisfy the bidi rule. Right?
   2. Note: I think we could even weaken/broaden this further by saying that if an RTL label is followed by an LTR label, then the LTR label must start with Bidi_Class=L -- rather than having to satisfy all of the LTR bidi rules.
3. Labels that precede an RTL label must satisfy... something.
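Parts 1 and 2 of this attempt can be sketched in Python as follows. This is one hedged reading, not settled text. "RTL label" uses the RFC 5893 definition. `weak_follow=True` models the second note, under which an LTR label that follows an RTL label only has to start with Bidi_Class=L. Part 3 is left unchecked because its criteria are still open. For a label that starts with R or AL, only conditions 1 to 4 of the Bidi Rule can apply, which matches "really the first four". All function names are hypothetical.

```python
import unicodedata

# Bidi_Class values allowed by RFC 5893 condition 2 (RTL) and condition 5 (LTR).
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def is_rtl_label(label: str) -> bool:
    """RFC 5893 definition: an RTL label contains an R, AL, or AN character."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL", "AN") for ch in label)


def satisfies_bidi_rule(label: str) -> bool:
    """Check the six numbered conditions of the RFC 5893 Bidi Rule."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not classes or classes[0] not in ("L", "R", "AL"):  # condition 1
        return False
    end = len(classes)
    while classes[end - 1] == "NSM":  # skip trailing NSMs for conditions 3 and 6
        end -= 1
    last = classes[end - 1]
    if classes[0] in ("R", "AL"):  # conditions 2, 3, 4
        return (
            all(bc in RTL_ALLOWED for bc in classes)
            and last in ("R", "AL", "EN", "AN")
            and not ("EN" in classes and "AN" in classes)
        )
    return all(bc in LTR_ALLOWED for bc in classes) and last in ("L", "EN")  # 5, 6


def check_attempt(labels: list[str], weak_follow: bool = False) -> bool:
    """Hypothetical check for parts 1 and 2 of the attempt."""
    for i, label in enumerate(labels):
        if not is_rtl_label(label):
            continue
        # Part 1: every RTL label must satisfy the Bidi Rule.
        if not satisfies_bidi_rule(label):
            return False
        # Part 2: the label immediately after an RTL label is constrained too.
        if i + 1 < len(labels):
            nxt = labels[i + 1]
            if weak_follow and not is_rtl_label(nxt):
                # Second-note variant: an LTR successor only needs a leading bc=L.
                if not nxt or unicodedata.bidirectional(nxt[0]) != "L":
                    return False
            elif not satisfies_bidi_rule(nxt):
                return False
    # Part 3 (labels that precede an RTL label) is not checked here.
    return True


print(check_attempt(["www", "x-", "com"]))             # True: no RTL label
print(check_attempt(["مثال", "a-"]))                    # False: "a-" ends with ES
print(check_attempt(["مثال", "a-"], weak_follow=True))  # True: "a-" starts with L
```

The first call shows that, without an RTL label, the Bidi Rule places no constraint on labels like "x-". The last two calls show how much the second note would broaden what may follow an RTL label.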
I would like the last part, the criteria for _LTR labels before an RTL label_, to be fairly simple, and also fairly broad -- hopefully beyond LDH labels.
Idea for that last part:
- Any RTL label must be preceded by zero or more labels containing only bc=EN characters, and those all-digit labels must be
  - either at the start of the domain name
  - or preceded by a label that satisfies the bidi rule conditions
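This idea for the last part can be sketched the same way. Again the code is Python, with hypothetical names and the RFC 5893 definition of an RTL label. It implements only the criterion for preceding labels; parts 1 and 2 would be checked separately. The Persian-digit case relies on U+06F0..U+06F9 having Bidi_Class=EN, unlike the Arabic-Indic digits U+0660..U+0669, which have Bidi_Class=AN.

```python
import unicodedata

# Bidi_Class values allowed by RFC 5893 condition 2 (RTL) and condition 5 (LTR).
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def is_rtl_label(label: str) -> bool:
    """RFC 5893 definition: an RTL label contains an R, AL, or AN character."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL", "AN") for ch in label)


def satisfies_bidi_rule(label: str) -> bool:
    """Check the six numbered conditions of the RFC 5893 Bidi Rule."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not classes or classes[0] not in ("L", "R", "AL"):  # condition 1
        return False
    end = len(classes)
    while classes[end - 1] == "NSM":  # skip trailing NSMs for conditions 3 and 6
        end -= 1
    last = classes[end - 1]
    if classes[0] in ("R", "AL"):  # conditions 2, 3, 4
        return (
            all(bc in RTL_ALLOWED for bc in classes)
            and last in ("R", "AL", "EN", "AN")
            and not ("EN" in classes and "AN" in classes)
        )
    return all(bc in LTR_ALLOWED for bc in classes) and last in ("L", "EN")  # 5, 6


def is_all_en(label: str) -> bool:
    """True for a non-empty label made only of bc=EN characters."""
    return bool(label) and all(unicodedata.bidirectional(ch) == "EN" for ch in label)


def check_preceding(labels: list[str]) -> bool:
    """Hypothetical check for the labels that precede each RTL label."""
    for i, label in enumerate(labels):
        if not is_rtl_label(label):
            continue
        # Walk back over the run of all-digit (bc=EN) labels before the RTL label.
        j = i - 1
        while j >= 0 and is_all_en(labels[j]):
            j -= 1
        # Either the run reaches the start of the name (j == -1), or the label
        # just before the run must satisfy the Bidi Rule.
        if j >= 0 and not satisfies_bidi_rule(labels[j]):
            return False
    return True


print(check_preceding(["1", "مثال"]))             # True: digits at the start
print(check_preceding(["\u06f1\u06f2", "مثال"]))  # True: Persian digits are bc=EN
print(check_preceding(["a-", "1", "مثال"]))       # False: "a-" fails the Bidi Rule
print(check_preceding(["x-", "www", "مثال"]))     # True: "x-" is outside the sequence
```

The last call illustrates the claim that follows. Labels outside the sequence that leads up to an RTL label carry no Bidi Rule requirement.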
I think this is a superset of your proposal for labels surrounding an RTL label: it allows more than LDH in such a label sequence, and it removes bidi rule requirements from labels outside of such a sequence.
If we can agree on criteria, then we should be able to phrase them in a spiffy way, without questions/options/notes :-)
WDYT?
cc @macchiati @asmusf @roozbehp
--
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/543#issuecomment-3917830943
Received on Wednesday, 18 February 2026 00:41:19 UTC