- From: Larry Masinter <LMM@acm.org>
- Date: Thu, 6 Sep 2018 10:11:38 -0700
- To: "'Mark Birbeck'" <mark.birbeck@gmail.com>
- Cc: "'TAG List'" <www-tag@w3.org>
- Message-ID: <003001d44604$acfd4540$06f7cfc0$@acm.org>
(changing email address; masinter@adobe will stop working soon)

It’s somewhat irrelevant whether people actually type in URLs often. The problem is that there are many different things people use with URLs – something one person could “write on a napkin” and that someone could take the napkin home and type it in, and get the same result: that was originally one of the design requirements for URLs.

There are many other workflows besides retyping. For example “guess the affiliation of someone by the domain name of their email address”. (I have more than once been given an ID badge listing “ACM” as my affiliation when I use LMM@acm.org as my email address.) “Look at a URL and decide whether it matches URLs I’ve seen before.”

The problem is that each workflow has slightly different requirements and possible failure modes. Visual comparison is hard to characterize.

My claim is that “retyping” has the strongest set of requirements – it encompasses the requirement that the URL actually be displayable (the characters are in the font used), that the Unicode normalization used by the keyboard input mechanism is consistent with that used by the original URL, that there is no use of confusables or even 1 vs l (lower L) vs I (cap i).

From: Mark Birbeck <mark.birbeck@gmail.com>
Sent: Thursday, September 6, 2018 4:48 AM
To: Larry Masinter <masinter@adobe.com>
Cc: TAG List <www-tag@w3.org>
Subject: Re: Google Wants to Kill the URL

Do people really type in URLs? I'd wager that most people go via Google (sorry...I mean they go via "their favourite search engine"). Even your URL in your signature, Larry...I can't actually imagine myself typing that; if I didn't click on it I'd just go search for your name and click the top result!

So perhaps the issues raised by John amount to whether some fraudster could convince me that I've found your site:

* first by getting their site into your search results;
* and then second, after I've clicked the URL by making the site look ok, having certificates, etc.

Whichever way you look at it, it doesn't seem to be a problem with URLs per se (unless we think there is a problem with binary and electricity, too), but with the layers that are placed on top of them.

On Thu, 6 Sep 2018 at 05:33 Larry Masinter <masinter@adobe.com> wrote:

A lot of the problems with URLs have to do with i18n and the difficulty of defining canonical forms that capture the equivalence wanted.

Briefly, I thought it might be good to focus on retypeability – when displaying a URL can a user enter it and get the same string? If you expect a person to compare two strings, they are more likely to be able to do so if both are retypeable.

Retypeability handles lots of the Unicode problems (normalization of combining character substrings, han unification, emoji, zero-width joiners, etc etc.) Strings that are not retypeable are “confusable”. Confusable strings are generally NOT generated but chosen – a domain name or a path of a URL.

Happy to talk more if you like,

An interesting approximation to retypeability is to render the string as an image and then OCR the result.

Larry
--
https://LarryMasinter.net
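[Editorial sketch, not part of the original message.] The retypeability requirements named in this thread (consistent Unicode normalization, no zero-width joiners, no 1 vs l vs I ambiguity) can be roughly approximated in code. The function name `retypeability_issues` and the three-character `AMBIGUOUS_ASCII` set below are assumptions made for illustration. The sketch does not cover font coverage, han unification, or the general confusables problem, and it is not the render-and-OCR approach suggested above.

```python
import unicodedata
from typing import List

# Assumption: only the three glyphs named in the thread (1, l, I).
# Real confusable data is far larger than this.
AMBIGUOUS_ASCII = {"1", "l", "I"}


def retypeability_issues(url: str) -> List[str]:
    """Return reasons a displayed URL might not survive being retyped."""
    issues = []

    # A keyboard input method may emit a different normalization form
    # than the one stored in the original string.
    if unicodedata.normalize("NFC", url) != url:
        issues.append("not NFC-normalized")

    # Format characters (general category Cf), such as zero-width
    # joiners, are invisible when the URL is displayed.
    invisible = sorted(
        {"U+%04X" % ord(ch) for ch in url if unicodedata.category(ch) == "Cf"}
    )
    if invisible:
        issues.append("invisible characters: " + ", ".join(invisible))

    # Glyphs that look alike in many fonts.
    ambiguous = sorted(set(url) & AMBIGUOUS_ASCII)
    if ambiguous:
        issues.append("ambiguous glyphs: " + ", ".join(ambiguous))

    return issues
```

An empty list only means that none of these three checks fired. Because almost every real URL contains a lower-case l, the glyph check is best read as a prompt for closer inspection than as a rejection rule.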
Received on Thursday, 6 September 2018 17:12:02 UTC