- From: Gervase Markham <gerv@mozilla.org>
- Date: Thu, 22 Aug 2013 12:02:23 +0100
- To: Anne van Kesteren <annevk@annevk.nl>
- CC: Mark Davis ☕ <mark@macchiato.com>, Shawn Steele <Shawn.Steele@microsoft.com>, IDNA update work <idna-update@alvestrand.no>, "PUBLIC-IRI@W3.ORG" <public-iri@w3.org>, "uri@w3.org" <uri@w3.org>, John C Klensin <klensin@jck.com>, Peter Saint-Andre <stpeter@stpeter.im>, Marcos Sanz <sanz@denic.de>, Vint Cerf <vint@google.com>, "www-tag.w3.org" <www-tag@w3.org>
On 22/08/13 11:37, Anne van Kesteren wrote:
>> Shame for them. The writing has been on the wall here for long enough
>> that they should not be at all surprised when this stops working.
>
> I don't think that's at all true. I doubt anyone realizes this. I
> certainly didn't until I put long hours into investigating the IDNA
> situation.

It's not been possible to register names like ☺☺☺.com for some time now;
that's a big clue. The fact that Firefox (and other browsers, AFAIAA)
refuses to render such names as Unicode is another one. (Are your friends
really using http://xn--74h.example.com/ ?) Those two things, plus the
difficulty of typing such names, mean that their use is going to be pretty
limited. (Even the guy who is trying to flog http://xn--19g.com/ , on the
basis that this particular one is actually easy to type on some computers,
has not in the past few years managed to find a "Macintosh company with a
vision" to take it off his hands.)

> Furthermore, we generally preserve compatibility on the web so URLs
> and documents remain working.
> http://www.w3.org/Provider/Style/URI.html It's one of the more
> important parts of this platform.

(The domain name system is about more than just the web.)

IIRC, we must have broken a load of URLs when we decided that %-encoding
in URLs should always be interpreted as UTF-8 (in RFC 3986), whereas
beforehand it depended on the charset of the page or form producing the
link. Why did we do that? Because the new way was better for the future,
and some breakage was acceptable to attain that goal.

So what is the justification for removing non-letter characters? Reduction
of attack surface. When characters are divided into scripts, we can
enforce no-script-mixing rules, which keep the number of possible spoofs,
lookalikes and substitutions tractable for humans to reason about for a
particular TLD and its allowed characters. If we allowed 3,254 extra
random glyphs in every TLD, this would not be so.

Gerv
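[Editor's sketches, not part of the original message.]

The charset ambiguity that the RFC 3986 change removed can be seen by decoding
the same percent-encoded bytes under two charsets. A minimal Python sketch,
using only the standard library:

```python
# The bytes 0xC3 0xA9 are "é" under UTF-8 but "Ã©" under Latin-1, so a
# pre-RFC-3986 link could decode either way, depending on the charset of
# the page or form that produced it.
from urllib.parse import unquote

encoded = "caf%C3%A9"
print(unquote(encoded, encoding="utf-8"))    # café   (the single, fixed modern reading)
print(unquote(encoded, encoding="latin-1"))  # cafÃ©  (one possible legacy reading)
```

And a minimal sketch of the kind of no-script-mixing check described in the
final paragraph. Python's standard library exposes no Unicode Script
property, so this approximates a character's script from the first word of
its Unicode character name; real implementations (browsers, registries) use
the actual Script property and the restriction levels of Unicode TS #39:

```python
import unicodedata

def scripts_in_label(label: str) -> set[str]:
    """Approximate the set of scripts used in one domain label."""
    scripts = set()
    for ch in label:
        try:
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            first_word = unicodedata.name(ch).split()[0]
        except ValueError:
            first_word = "UNKNOWN"  # unnamed code point
        # Digits and hyphens are script-neutral in DNS labels.
        if first_word not in ("DIGIT", "HYPHEN-MINUS"):
            scripts.add(first_word)
    return scripts

def mixes_scripts(label: str) -> bool:
    """True if a label draws characters from more than one script."""
    return len(scripts_in_label(label)) > 1

# "p\u0430ypal" has a Cyrillic "а" where the Latin "a" should be --
# exactly the lookalike substitution such rules are meant to catch.
assert not mixes_scripts("paypal")
assert mixes_scripts("p\u0430ypal")
```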
Received on Thursday, 22 August 2013 11:02:58 UTC