Re: [whatwg/url] Unicode normalization could change the structure of a URL (#626)

> So OAuth2 accepts a URL string with Unicode in it and then:
>     1. Parses that URL string into a URL for its own purposes. (And it might terminate here if the URL is not adequate for some reason.)
>     2. Sends back that URL string for another consumer, but now NFKC/D normalized?
> How is that not a bug in OAuth2?

[The OAuth2 standard](https://datatracker.ietf.org/doc/html/rfc6749), as far as I can see, is silent on Unicode normalization forms. I would rather leave the judgment of whether OAuth2 is at fault to others.

> You cannot apply Unicode normalization to all inputs, certainly not URL strings. They should only go into the URL parser.

The I18nWG currently recommends applying NFC to _everything,_ which should include URLs. For example, [they say that _entire_ HTML and CSS files should be in NFC.](https://www.w3.org/International/questions/qa-html-css-normalization) If WHATWG disagrees with this, I suggest that WHATWG work with the I18nWG to reach a consensus.

> I could see trying to disallow `#` and similar code points, but pipelines that do this kind of (bogus) normalization on URL strings would still be susceptible to attacks, depending on when they perform the (bogus) normalization.

That's correct. Personally, I would be satisfied with a mechanism to detect URLs that would break incorrect pipelines: that is, a standardized, step-by-step procedure that security-sensitive applications can follow if, out of an abundance of caution, they want to reject all problematic URLs.
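
To make the idea concrete, here is a minimal Python sketch of what such a check could look like. It is not a proposed standard procedure. The names `DELIMITERS`, `delimiter_skeleton`, and `normalization_hazards` are hypothetical, and the delimiter set is an assumption chosen for illustration rather than taken from the URL Standard. The sketch flags a URL string when any of the four Unicode normalization forms would add, remove, or reorder one of those delimiters.

```python
import unicodedata

# Assumption: an illustrative set of structurally significant characters.
# It is NOT taken from the URL Standard; a real procedure would define its own.
DELIMITERS = frozenset(":/?#[]@!$&'()*+,;=%\\")

# The four normalization forms supported by unicodedata.normalize.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def delimiter_skeleton(s: str) -> str:
    """Return only the delimiter characters of s, in their original order."""
    return "".join(ch for ch in s if ch in DELIMITERS)


def normalization_hazards(url_string: str) -> list[str]:
    """List the normalization forms under which the delimiter sequence changes."""
    before = delimiter_skeleton(url_string)
    return [
        form
        for form in FORMS
        if delimiter_skeleton(unicodedata.normalize(form, url_string)) != before
    ]


if __name__ == "__main__":
    samples = [
        "https://example.com/path?q=1#frag",  # pure ASCII
        "https://example.com/\uff03frag",     # FULLWIDTH NUMBER SIGN
        "https://example.com/?a=\u0338b",     # "=" followed by COMBINING LONG SOLIDUS OVERLAY
        "https://example.com/a\u037eb",       # GREEK QUESTION MARK
    ]
    for sample in samples:
        print(ascii(sample), normalization_hazards(sample))
```

A cautious application could reject any input for which `normalization_hazards` returns a non-empty list. The check only compares delimiter sequences, so it says nothing about host-specific concerns such as dot look-alikes, which would need separate handling.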

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/626#issuecomment-892823617

Received on Wednesday, 4 August 2021 17:05:23 UTC