Re: [whatwg/url] Unicode normalization could change the structure of a URL (#626) from Addison Phillips on 2022-08-21 (public-webapps-github@w3.org from August 2022)

From: Addison Phillips <notifications@github.com>
Date: Sun, 21 Aug 2022 14:45:10 -0700
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/626/1221626361@github.com>

W3C-I18N discussed this in our [teleconference of 2022-08-18](https://www.w3.org/2022/08/18-i18n-minutes.html#t06) and I drew an [action item](https://www.w3.org/International/track/actions/1188) to reply to this thread.

In general we agree with @annevk and @domenic's statements on this thread. If one applies Unicode normalization to a text file representing an HTML or CSS document, that can spoil code point sequences that were deliberately not normalized to start with.
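As a minimal illustration of that point (a sketch using Python's standard `unicodedata` module, not something taken from the thread): the sample string and the helper name `code_points` are assumptions chosen for the example. It shows that applying NFC replaces a deliberately decomposed sequence with a different code point sequence, even though the rendered text looks the same.

```python
import unicodedata

def code_points(s):
    """Hypothetical helper: list the code points of a string as U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in s]

# An author deliberately wrote 'e' followed by COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

# Normalizing the whole document to NFC silently rewrites that sequence.
composed = unicodedata.normalize("NFC", decomposed)

print(code_points(decomposed))  # two code points: U+0065, U+0301
print(code_points(composed))    # one code point: U+00E9
print(decomposed == composed)   # False: the strings no longer compare equal
```

Any script, selector, or test that depended on the original two-code-point sequence would no longer match after such a rewrite.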

Our [Character Model document on string matching](https://www.w3.org/TR/charmod-norm/#normalizationChoice) and our WG do not blindly recommend NFC (and definitely do not recommend any of the `K` forms) for Web content. We do recommend that content authors use NFC wherever practical for their language, since this promotes interoperability (and since this is what most, but not all, keyboards produce). Our recommendations today are subtly different from what they were, say, 10 years ago: we think NFC is good, but we tell spec writers and implementers not to change the normalization form of content unless the user specifically asks them to do so.
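A rough sketch of what "check, but do not change" might look like in practice, again using Python's `unicodedata` (the function name `nfc_report` and the sample string are hypothetical, and this is not an API from any spec). It also shows why the `K` forms are riskier: under NFKC, U+FF0F FULLWIDTH SOLIDUS becomes an ASCII `/`, which is a structural delimiter in contexts such as URL paths, while NFC leaves it alone.

```python
import unicodedata

def nfc_report(text):
    """Hypothetical helper: report NFC status without modifying the text."""
    return {
        "is_nfc": unicodedata.is_normalized("NFC", text),
        "text": text,  # returned exactly as received
    }

# Sample input (an assumption for illustration): 'a', FULLWIDTH SOLIDUS, 'b'.
sample = "a\uFF0Fb"

nfc = unicodedata.normalize("NFC", sample)    # canonical form: unchanged here
nfkc = unicodedata.normalize("NFKC", sample)  # compatibility form: rewrites U+FF0F

print(nfc == sample)   # True
print(nfkc)            # a/b
print(nfc_report("e\u0301")["is_nfc"])  # False, and the text is left untouched
```

`unicodedata.is_normalized` requires Python 3.8 or later; on older versions, comparing the text against `unicodedata.normalize("NFC", text)` gives the same answer.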

As a result of this issue, we intend to revise our article https://www.w3.org/International/questions/qa-html-css-normalization in the near future.

--
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/626#issuecomment-1221626361

Received on Sunday, 21 August 2022 21:45:23 UTC