- From: John C Klensin <john+w3c@jck.com>
- Date: Thu, 06 Aug 2015 11:48:08 -0400
- To: Andrew Sullivan <ajs@anvilwalrusden.com>, www-international@w3.org
--On Wednesday, August 05, 2015 15:36 -0400 Andrew Sullivan <ajs@anvilwalrusden.com> wrote:

>> Long term, if the majority of text in Cherokee is in the
>> (new) lowercase, would it be awkward to force them to use the
>> uppercase for idns?
>
> Well, at the moment, IDNA2008 is frozen in pre-Unicode-7
> because of a different problem, so the issue will be academic
> until then.

It is perhaps worth remembering that we have been through almost the same thing before. It ended in a disagreement between the strong preference of the user community (who were more numerous than speakers of Cherokee) and advocates of what I hope I'm not mischaracterizing as "stability and forward and backward compatibility no matter what".

For those who don't know the example: the earlier version of IDNA (IDNA2003) supported, and required, case mapping based on the Unicode "language independent case-folding" algorithm. Because there was no uppercase version of the character "Sharp S" (Eszett, U+00DF), that mapping process turned it into the common basic Latin representation, "ss", effectively making Eszett unusable in IDNA domain names even though users could successfully type it in many contexts. While IDNA2008 was being developed, two things happened. Mandatory mappings were dropped, to guarantee a one-to-one mapping between the natural-character form ("U-label") and the ASCII-encoded form ("A-label", Punycode-encoded). Separately, a code point was assigned to an uppercase representation of Eszett. The latter could have been used to let U+00DF case-fold to itself, but it was not, for stability reasons. With considerable guidance (one might even say "pressure") from the German-speaking community, including both users and DNS registrars and registries in Germany, the IDNA WG decided to allow Eszett as a permitted character in IDN labels. That created an incompatibility with strings that apparently contained Eszett but in which it had been mapped to "ss" under IDNA2003, and a consequent transition problem.
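As a hedged illustration of the mapping behaviour described above (a sketch added for readers, not part of the original discussion): Python's built-in "idna" codec implements IDNA2003 (RFC 3490 with Nameprep), so it reproduces the Eszett-to-"ss" mapping, while str.casefold() applies Unicode full case folding. The variable names are illustrative only, and the Cherokee line assumes a Python build whose Unicode database is version 8.0 or later.

```python
# Illustrative sketch only: Python's built-in "idna" codec implements
# IDNA2003 (RFC 3490 with Nameprep), not IDNA2008.

label = "stra\u00dfe"  # "straße", containing Eszett (U+00DF)

# IDNA2003: Nameprep case-folds U+00DF to "ss", so the encoded label is plain
# ASCII and identical to the one produced for "strasse".
a_label_2003 = label.encode("idna")
print(a_label_2003)                               # b'strasse'
print(a_label_2003 == "strasse".encode("idna"))   # True

# Unicode full case folding gives the same result, even for the later-added
# capital sharp s (U+1E9E), which lowercases to U+00DF but folds to "ss".
print("\u1e9e".lower() == "\u00df")   # True
print("\u1e9e".casefold())            # ss

# Cherokee (Unicode 8.0 and later): case folding maps the new lowercase
# letters to the older uppercase ones, e.g. U+AB70 -> U+13A0.
print("\uab70".casefold() == "\u13a0")  # True
```

An IDNA2008 implementation (not shown here; Python's standard library does not provide one) would instead treat U+00DF as a permitted character and produce a distinct Punycode A-label for "straße".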
At least in part because one of the recommendations about how to handle that transition has been widely interpreted as "just don't do it, continue to map Eszett to 'ss' forever", describing that change as "awkward" or "disruptive" would probably be an understatement.

> But a major change to case folding behaviour between Unicode
> versions would be pretty disruptive to any identifier system,
> yeah.

Given the difficulties above, caused by a change to a single character, the consequences of repeating the same pattern for an entire script are hard to contemplate. While I hope we can do better, the odds are that the same process would play out. The Cherokee user community would want lowercase, for consistency with familiar patterns and with everyone else. The DNS community would be likely to listen to the demands of their likely customers (the Cherokee user community and those trying to appeal to them), and would note that the number of present registrations in Cherokee is quite low relative to their projections and expectations. And the parts of the web browser and developer communities who believe in absolute stability would apply that view.

So, "pretty disruptive" indeed.

john
Received on Thursday, 6 August 2015 15:48:39 UTC