Re: [whatwg/url] Refusing a mix of numeric-only and BIDI domains (#543)

markusicu left a comment (whatwg/url#543)

Finally looking at this discussion here vs. UTS 46...

**First**, I agree that the wording of the [IDNA2008 bidi rule](https://datatracker.ietf.org/doc/html/rfc5893#section-2) likely resulted in the [UTS 46 criterion](https://unicode.org/reports/tr46/#Validity_Criteria):
1. “The following rule, consisting of six conditions, _applies to labels in Bidi domain names_.” (my emphasis)
2. “... if the domain name is a Bidi domain name, then the label must satisfy all six of the numbered conditions ...”

I see in this discussion that Harald didn't intend to have the rule be interpreted like this for a whole domain name.
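
For reference, here is a minimal per-label sketch of the six RFC 5893 conditions. It is an illustration only, not the UTS 46 CheckBidi text. The function names `is_rtl_label` and `satisfies_bidi_rule` are made up for this example, and it relies on whatever Unicode version Python's `unicodedata` module ships with.

```python
import unicodedata

RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_classes(label: str) -> list:
    """Return the Bidi_Class of each character in a Unicode label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def is_rtl_label(label: str) -> bool:
    """RFC 5893: an RTL label contains at least one R, AL, or AN character."""
    return any(bc in ("R", "AL", "AN") for bc in bidi_classes(label))


def satisfies_bidi_rule(label: str) -> bool:
    """Check one Unicode label against the six RFC 5893 conditions."""
    bcs = bidi_classes(label)
    if not bcs:
        return False
    # Condition 1: the first character must be L, R, or AL.
    if bcs[0] not in ("L", "R", "AL"):
        return False
    # Conditions 3 and 6 ignore trailing NSM characters.
    core = list(bcs)
    while core and core[-1] == "NSM":
        core.pop()
    if bcs[0] in ("R", "AL"):
        # Condition 2: allowed classes in an RTL label.
        if any(bc not in RTL_ALLOWED for bc in bcs):
            return False
        # Condition 3: an RTL label ends in R, AL, EN, or AN (then NSM*).
        if core[-1] not in ("R", "AL", "EN", "AN"):
            return False
        # Condition 4: EN and AN must not both be present.
        if "EN" in bcs and "AN" in bcs:
            return False
        return True
    # Condition 5: allowed classes in an LTR label.
    if any(bc not in LTR_ALLOWED for bc in bcs):
        return False
    # Condition 6: an LTR label ends in L or EN (then NSM*).
    return core[-1] in ("L", "EN")
```

Note that an all-digit label such as `123` fails condition 1, which is why numeric-only labels need separate treatment in any domain-level criterion.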

**Second**, I agree that we need to apply the bidi rule on Unicode labels, that is, after Punycode decoding. In UTS 46, the Validity Criteria clearly operate on Unicode labels; I think that's why we apply CheckBidi there.

By contrast, one can read the UTS 46 main processing step 4 as having a Unicode label only temporarily while validating an xn--something label, so the input to an additional step 5 would be ambiguous. In a future UTS 46 update, I will try to clarify this step 4 and how it interacts with the ToASCII and ToUnicode operations.
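
To illustrate why the check has to run after Punycode decoding: an `xn--` label consists only of ASCII letters, digits, and hyphens, so it never looks RTL until it is decoded. The sketch below uses Python's built-in `punycode` codec. `decode_label` and `has_rtl_char` are hypothetical helper names, and the code deliberately skips the mapping and validation that real UTS 46 processing performs.

```python
import unicodedata

ACE_PREFIX = "xn--"


def decode_label(label: str) -> str:
    """Punycode-decode an xn-- label; return other labels unchanged."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def has_rtl_char(label: str) -> bool:
    """True if the label contains a character with Bidi_Class R, AL, or AN."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL", "AN") for ch in label)


# Build the ACE form from a Hebrew string rather than hard-coding it.
hebrew = "\u05d0\u05d1\u05d2"
ace = ACE_PREFIX + hebrew.encode("punycode").decode("ascii")

print(has_rtl_char(ace))                # False: the ASCII form hides the RTL characters
print(has_rtl_char(decode_label(ace)))  # True: visible only after decoding
```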

Wherever we place it in the UTS 46 spec...

**Third**, I think I more or less understand what you are trying to do with the proposed change. However, (a) several of us find the proposed CheckBidi language convoluted, and thus error-prone as guidance for implementers, and (b) because of the special case for LDH labels, it seems narrower than it needs to be for some LTR labels. But I could be wrong.

I am trying to come up with what I hope is an easier-to-understand version that broadens the validity beyond ASCII characters (e.g., Persian digits, non-ASCII LTR letters).

Attempt:
1. Any RTL label must satisfy the bidi rule conditions. (Really the first four.)
2. Any label that immediately follows an RTL label must satisfy the bidi rule conditions.
    1. Note: Once we find an LTR label that satisfies the bidi rule, any following LTR label should be fine even if it does not satisfy the bidi rule. Right?
    2. Note: I think we could even weaken/broaden this further by saying that if an RTL label is followed by an LTR label then the LTR label must start with Bidi_Class=L -- rather than having to satisfy all of the LTR bidi rules.
3. Labels that precede an RTL label must satisfy... something.

I would like the last part, the criteria for _LTR labels before an RTL label_, to be fairly simple, and also fairly broad -- hopefully beyond LDH labels.

Idea for that last part:
- Any RTL label must be preceded by zero or more labels containing only bc=EN characters, and those all-digit labels must be
    - either at the start of the domain name
    - or preceded by a label that satisfies the bidi rule conditions

I think this is a superset of your proposal for labels surrounding an RTL label, it allows more than LDH in such a label sequence, and it removes bidi rule requirements from labels outside of such a sequence.
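
One possible reading of the attempt and the idea above, as a domain-level sketch. The assumptions are mine. Each label is taken to be pre-classified into a hypothetical `LabelInfo` record: RTL or not, satisfies the bidi rule or not, all bc=EN or not. Labels are in the order they appear in the domain name. With zero all-digit labels in front of an RTL label, the sketch assumes the label directly before it must satisfy the bidi rule. The optional relaxations in the two notes are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabelInfo:
    rtl: bool      # contains at least one R, AL, or AN character
    bidi_ok: bool  # satisfies the RFC 5893 bidi rule conditions
    all_en: bool   # non-empty and every character has bc=EN


def domain_ok(labels: List[LabelInfo]) -> bool:
    """Apply the sketched criteria to a sequence of pre-classified labels."""
    for i, label in enumerate(labels):
        if not label.rtl:
            continue
        # 1. Any RTL label must satisfy the bidi rule conditions.
        if not label.bidi_ok:
            return False
        # 2. A label immediately following an RTL label must satisfy them too.
        if i + 1 < len(labels) and not labels[i + 1].bidi_ok:
            return False
        # 3. Skip back over the run of all-EN labels before the RTL label.
        j = i - 1
        while j >= 0 and labels[j].all_en:
            j -= 1
        # The run must start the domain name or follow a bidi-rule-conforming label.
        if j >= 0 and not labels[j].bidi_ok:
            return False
    return True


LTR = LabelInfo(rtl=False, bidi_ok=True, all_en=False)
BAD_LTR = LabelInfo(rtl=False, bidi_ok=False, all_en=False)
DIGITS = LabelInfo(rtl=False, bidi_ok=False, all_en=True)
RTL = LabelInfo(rtl=True, bidi_ok=True, all_en=False)

print(domain_ok([DIGITS, RTL, LTR]))       # True: digit run at the start
print(domain_ok([LTR, DIGITS, RTL]))       # True: digit run after a conforming label
print(domain_ok([BAD_LTR, DIGITS, RTL]))   # False: digit run after a non-conforming label
print(domain_ok([BAD_LTR, LTR, LTR]))      # True: no RTL label, so no requirements
```

Read literally, criterion 2 still rejects an all-digit label that immediately follows an RTL label, since such a label cannot satisfy the bidi rule.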

If we can agree on criteria, then we should be able to phrase them in a spiffy way, without questions/options/notes :-)

WDYT?

cc @macchiati @asmusf @roozbehp

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/543#issuecomment-3917830943

Message ID: <whatwg/url/issues/543/3917830943@github.com>

Received on Wednesday, 18 February 2026 00:41:19 UTC