Re: [whatwg/url] Unicode normalization could change the structure of a URL (#626)

W3C-I18N discussed this in our [teleconference of 2022-08-18](https://www.w3.org/2022/08/18-i18n-minutes.html#t06) and I drew an [action item](https://www.w3.org/International/track/actions/1188) to reply to this thread.

In general we agree with the statements made by @annevk and @domenic in this thread. Applying Unicode normalization to a text file that represents an HTML or CSS document can alter code point sequences that were deliberately left unnormalized to start with.
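
As a minimal sketch of that risk (the CSS snippet and variable names here are hypothetical, chosen only for illustration), Python's standard `unicodedata` module shows NFC silently replacing a decomposed sequence that an author wrote on purpose:

```python
import unicodedata

# Hypothetical stylesheet text: the class name deliberately uses a decomposed
# sequence, "e" followed by U+0301 COMBINING ACUTE ACCENT.
css_source = ".cafe\u0301 { color: red; }"

# Normalizing the whole file to NFC composes the pair into U+00E9.
normalized = unicodedata.normalize("NFC", css_source)


def code_points(text):
    """Return the code points of `text` in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(css_source[:6]))   # selector before normalization
print(code_points(normalized[:5]))   # selector after normalization
print(normalized == css_source)      # the file content has changed
```

After this step the selector no longer contains the sequence the author typed, so any consumer that compares code points without normalizing would stop matching content that still uses the decomposed form.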

Neither our [Character Model document on string matching](https://www.w3.org/TR/charmod-norm/#normalizationChoice) nor our WG blindly recommends NFC for Web content, and we definitely do not recommend either of the `K` forms (NFKC or NFKD). We do recommend that content authors use NFC wherever practical for their language, since this promotes interoperability (and since it is what most, but not all, keyboards produce). Our recommendations today are subtly different from what they were, say, 10 years ago: we think NFC is good, but we tell spec writers and implementers not to change the normalization form of content unless the user specifically asks them to do so.
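
To illustrate why the `K` forms are a particular concern for URLs, here is a small sketch under stated assumptions: the URL is made up, and `path_segments` is a hypothetical helper, not anything defined by the URL Standard. It uses U+2100 ACCOUNT OF, a single code point whose compatibility decomposition is `a/c`:

```python
import unicodedata
from urllib.parse import urlsplit

# Hypothetical URL: one path segment is the single code point U+2100.
url = "https://example.com/files/\u2100/readme"

nfc = unicodedata.normalize("NFC", url)
nfkc = unicodedata.normalize("NFKC", url)


def path_segments(u):
    """Split the path of `u` on "/" and drop the empty leading piece."""
    return urlsplit(u).path.split("/")[1:]


print(path_segments(url))   # three segments
print(path_segments(nfc))   # unchanged: U+2100 has no canonical decomposition
print(path_segments(nfkc))  # four segments: NFKC introduced a new "/"
```

In this sketch NFC leaves the URL untouched, while NFKC adds a path delimiter that was not in the original string, so the normalized URL identifies a different resource.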

As a result of this issue, we intend to revise our article https://www.w3.org/International/questions/qa-html-css-normalization in the near future. 

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/626#issuecomment-1221626361

Message ID: <whatwg/url/issues/626/1221626361@github.com>

Received on Sunday, 21 August 2022 21:45:23 UTC