Re: [whatwg/url] Unicode normalization could change the structure of a URL (#626)

@r12a Thank you! I really like the improvement. Some nitpicking:

1. For the question "What are normalization forms?", the text does not say much about the K forms (NFKC and NFKD). Personally I feel we could explain what they are and why they are bad, or discourage them outright (as in the answer to the other question, maybe with a link for interested readers).

2. In the question "Converting the normalization form of a page", I would like to emphasize that the _structures_ of URLs, HTML documents, and many other formats (not just individual components in them) could change. Notably, the following three cases are problematic for conversion between NFC and NFD:
   - \u2260 (≠ as one code point in NFC) and =\u0338 (≠ as two code points in NFD)
   - \u226E (≮ as one code point in NFC) and <\u0338 (≮ as two code points in NFD)
   - \u226F (≯ as one code point in NFC) and >\u0338 (≯ as two code points in NFD)

   These cases mean that <, >, and = could appear or disappear due to conversion between NFC and NFD, and these characters play an important role in many web standards. An application that does any automatic conversion is subject to injection attacks, and such attacks have happened before. This is why I started this GitHub thread, and I personally still believe the standards should ban these problematic sequences instead of telling application authors to be careful. Given that we decided to update the FAQ, I hope the FAQ could at least emphasize the dire consequences of not following the advice.
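
A minimal sketch of the three cases, using Python's standard `unicodedata` module. The names `CASES` and `value` are only illustrative, and the `a≠b` string is a made-up input rather than a real attack payload:

```python
import unicodedata

# The three precomposed characters listed above, each paired with the
# ASCII character that shows up once the code point is decomposed.
CASES = {
    "\u2260": "=",  # NOT EQUAL TO
    "\u226E": "<",  # NOT LESS-THAN
    "\u226F": ">",  # NOT GREATER-THAN
}

for composed, ascii_char in CASES.items():
    decomposed = unicodedata.normalize("NFD", composed)
    # NFD splits the character into the ASCII base plus U+0338.
    assert decomposed == ascii_char + "\u0338"
    # NFC recombines the pair into the single code point.
    assert unicodedata.normalize("NFC", decomposed) == composed

# Hypothetical input: this string contains no "=" until it is
# converted to NFD, at which point one appears.
value = "a\u2260b"
print("=" in value)
print("=" in unicodedata.normalize("NFD", value))
```

A parser that splits on `=`, `<`, or `>` would see a different structure before and after the conversion, which is the point of the concern above.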

In any case, thanks for your great work!

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/626#issuecomment-1225023964

Message ID: <whatwg/url/issues/626/1225023964@github.com>

Received on Wednesday, 24 August 2022 00:30:12 UTC