- From: Andrew Sullivan <ajs@anvilwalrusden.com>
- Date: Tue, 20 Aug 2013 12:06:20 -0400
- To: Anne van Kesteren <annevk@annevk.nl>
- Cc: Mark Davis <mark@macchiato.com>, Shawn Steele <Shawn.Steele@microsoft.com>, Vint Cerf <vint@google.com>, "public-iri@w3.org" <public-iri@w3.org>, "uri@w3.org" <uri@w3.org>, "idna-update@alvestrand.no" <idna-update@alvestrand.no>, "www-tag.w3.org" <www-tag@w3.org>, Peter Saint-Andre <stpeter@stpeter.im>
I'm pretty sure I'm not on many of these lists, so I bet this mail won't go everywhere. Nevertheless,

On Tue, Aug 20, 2013 at 01:32:23PM +0100, Anne van Kesteren wrote:
> (Aside: ToASCII in IDNA2003 applies to domain labels. It applying to
> domain names in UTS #46 is somewhat confusing.)

Or "broken". It can't apply to domain names, of course, because that's not how the DNS works; but one might be forgiven for wondering whether not understanding the details of an underlying technical problem is a barrier to having an opinion in this space.

> I don't think the committee has carefully considered the compatibility
> impact. Deployed domains would become invalid.

The IDNABIS wg did not take that decision lightly. In my opinion, we concluded that some deployed domains were just _broken_, and that we were eventually going to endure this pain, and that it would be better to do it earlier rather than later.

> Long-standing practice
> of case folding (e.g. the idea that http://EXAMPLE.COM/ and
> http://example.com/ are identical) is suddenly something that is no
> longer decided upon by IDNA but needs to be decided somehow at the
> application-level.

Well, sort of. There's nothing in IDNA2008 that prevents the OS from providing a generic facility for this (which is apparently what the current generation of Windows does). The point was to take this mapping out of the _protocol_ and put it into local rules that could be made locale-sensitive.

The reason for this is that, while it is impossible in general to provide case folding rules where lower-case accented characters get mapped to upper case without accents and then get case folded again (thereby losing data), it _might_ be possible to do this in a locale-sensitive way if one knew enough about the environment.

For instance, in some writing systems for French, it is standard practice to fold LATIN SMALL LETTER E WITH ACUTE to LATIN CAPITAL LETTER E (not all French systems, of course; some fold to LATIN CAPITAL LETTER E WITH ACUTE). Now, if the LATIN CAPITAL LETTER E is next downcased, what should you get? The general rule will of course be LATIN SMALL LETTER E, but if you had a clever program that could do intelligent things with the string "ECOLE", the folding might be LATIN SMALL LETTER E WITH ACUTE, or the folding might try both and see what happens.

This example is a little contrived -- the French example seems silly -- but examples in other scripts and languages are in my view considerably more compelling. I don't think that UTS#46 is actually different in this regard, although it proposes uniform mapping rules in all cases. IDNA2003 doesn't handle this case real well, because it can't possibly. There's simply no room for locale in IDNA2003.

> And when the Unicode consortium provided such
> profiling for applications in the form of
> http://unicode.org/reports/tr46/ that was frowned upon.

I think the history is a little more complicated than that.

Best regards,

A

-- 
Andrew Sullivan
ajs@anvilwalrusden.com
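
For readers who want to see the data loss in the "ECOLE" example concretely, here is a minimal Python sketch. It is only an illustration under stated assumptions: the helper names `strip_accents` and `upcase_without_accents` are hypothetical, and the accent-dropping upcase stands in for the French convention described in the message rather than for any rule defined by IDNA2003, IDNA2008, or UTS #46.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def upcase_without_accents(text: str) -> str:
    """Assumed stand-in for the convention of writing capitals unaccented."""
    return strip_accents(text.upper())


original = "\u00e9cole"                      # LATIN SMALL LETTER E WITH ACUTE + "cole"
shouted = upcase_without_accents(original)   # accent is dropped while upcasing
round_trip = shouted.lower()                 # generic, locale-free downcasing

# The generic downcase cannot know that the first letter was accented,
# so the round trip does not return the original string.
lost_data = round_trip != original

# Plain Unicode upper/lower, by contrast, keeps the accent in both directions.
generic_round_trip = original.upper().lower()
```

After the accent-dropping upcase, nothing in the string records which E was accented, so only knowledge from outside the string (a locale, a dictionary, or a lookup that tries both forms) could recover it.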
Received on Tuesday, 20 August 2013 16:06:49 UTC