- From: Larry Masinter <masinter@adobe.com>
- Date: Thu, 6 Sep 2018 04:29:50 +0000
- To: TAG List <www-tag@w3.org>
Received on Thursday, 6 September 2018 04:30:15 UTC
A lot of the problems with URLs have to do with i18n and the difficulty of defining canonical forms that capture the equivalence wanted.

Briefly, I thought it might be good to focus on retypeability: when a URL is displayed, can a user type it in and get the same string? If you expect a person to compare two strings, they are more likely to be able to do so if both are retypeable.

Retypeability handles lots of the Unicode problems (normalization of combining character sequences, Han unification, emoji, zero-width joiners, etc.). Strings that are not retypeable are "confusable". Confusable strings are generally NOT generated but chosen - a domain name or a path of a URL.

Happy to talk more if you like. An interesting approximation to retypeability is to render the string as an image and then OCR the result.

Larry
--
https://LarryMasinter.net
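
As a rough illustration of the kind of screening the retypeability idea suggests, here is a minimal Python sketch using only the standard library. The function name `retypeability_issues` and its three checks are assumptions made for illustration, not a definition of retypeability from this message. It is not the render-and-OCR approximation, it does not address Han unification or emoji, and its script guess (the first word of the Unicode character name) is deliberately crude.

```python
import unicodedata


def retypeability_issues(text: str) -> list:
    """Return heuristic reasons why `text` might not survive being retyped.

    An empty list means none of these (incomplete) checks fired; it does
    not prove the string is retypeable.
    """
    issues = []

    # A string that changes under NFC has another code point sequence
    # that is canonically equivalent and normally renders the same.
    if unicodedata.normalize("NFC", text) != text:
        issues.append("not NFC-normalized")

    # Format characters (general category Cf), such as zero-width
    # joiners, are invisible when the string is displayed.
    if any(unicodedata.category(ch) == "Cf" for ch in text):
        issues.append("contains invisible format characters")

    # Crude script guess for letters: first word of the character name,
    # e.g. "LATIN SMALL LETTER A" -> "LATIN".
    scripts = {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }
    if len(scripts) > 1:
        issues.append("mixes scripts: " + ", ".join(sorted(scripts)))

    return issues


if __name__ == "__main__":
    samples = ["example.com", "e\u0301", "a\u200db", "p\u0430ypal"]
    for sample in samples:
        print(repr(sample), retypeability_issues(sample))
```

The samples are a plain ASCII host name, an "e" followed by a combining acute accent, two letters separated by a zero-width joiner, and a word whose second letter is Cyrillic rather than Latin.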