Re: [whatwg/url] Explain why valid domain needs to run ToUnicode (Issue #817)

Glad I saw this, as I too am skeptical about the need to run the domain-to-Unicode algorithm. I tried to generate inputs that fail at step 3 with the Rust code below, which uses the [`idna`](https://docs.rs/idna/latest/idna/) crate, but I have not been able to find one:

```rust
use idna::Config;

fn main() {
    let mut input = String::with_capacity(8);
    // Try every Unicode scalar value, both on its own and behind the "xn--" prefix.
    for uni in '\0'..=char::MAX {
        input.clear();
        input.push(uni);
        if let Err(val) = idna_transform(input.as_str()) {
            println!("{val}");
            return;
        }
        input.clear();
        input.push_str("xn--");
        input.push(uni);
        if let Err(val) = idna_transform(input.as_str()) {
            println!("{val}");
            return;
        }
    }
}

/// Returns `Err(input)` only when domain-to-ASCII succeeds but the
/// subsequent domain-to-Unicode step reports an error.
fn idna_transform(input: &str) -> Result<(), &str> {
    idna::domain_to_ascii_strict(input).map_or_else(
        // Inputs rejected by domain-to-ASCII are not counterexamples.
        |_| Ok(()),
        |ascii| {
            Config::default()
                .use_std3_ascii_rules(true)
                .to_unicode(ascii.as_str())
                .1
                .map_or_else(|_| Err(input), Ok)
        },
    )
}
```

Consequently, I believe steps 3 and 4 can be removed, although I haven't proven that the domain-to-ASCII algorithm alone is sufficient. I've also tried [these examples](https://www.unicode.org/reports/tr46/#Table_Example_Processing).
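For anyone who wants to widen the search, here is a minimal sketch of the same idea as a differential check: it looks for the first candidate on which two validity predicates disagree. The names `first_disagreement` and `candidates` are hypothetical, and the two predicates are placeholders that the caller would back with the real `idna` calls (domain-to-ASCII only versus domain-to-ASCII followed by domain-to-Unicode); nothing here is part of the crate or the spec.

```rust
/// Returns the first candidate on which the two validity checks disagree,
/// or `None` if they agree on every candidate.
/// Both checks are hypothetical stand-ins supplied by the caller.
fn first_disagreement<I, A, B>(
    candidates: I,
    ascii_only: A,
    ascii_then_unicode: B,
) -> Option<String>
where
    I: IntoIterator<Item = String>,
    A: Fn(&str) -> bool,
    B: Fn(&str) -> bool,
{
    candidates
        .into_iter()
        .find(|c| ascii_only(c.as_str()) != ascii_then_unicode(c.as_str()))
}

/// Same candidate set as the search above: every Unicode scalar value,
/// first on its own and then behind the "xn--" prefix.
fn candidates() -> impl Iterator<Item = String> {
    ('\0'..=char::MAX).flat_map(|c| [c.to_string(), format!("xn--{c}")])
}
```

A `None` result from such a search is only evidence over the candidates tried, not a proof; longer or multi-label inputs would need a different generator.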

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/817#issuecomment-2078008742

Message ID: <whatwg/url/issues/817/2078008742@github.com>

Received on Thursday, 25 April 2024 19:17:32 UTC