- From: John C Klensin <klensin@jck.com>
- Date: Tue, 20 Aug 2013 15:33:45 -0400
- To: Marcos Sanz <sanz@denic.de>, Anne van Kesteren <annevk@annevk.nl>
- cc: Shawn Steele <Shawn.Steele@microsoft.com>, public-iri@w3.org, uri@w3.org, Peter Saint-Andre <stpeter@stpeter.im>, Mark Davis ☕ <mark@macchiato.com>, idna-update@alvestrand.no, Vint Cerf <vint@google.com>, "www-tag.w3.org" <www-tag@w3.org>
--On Tuesday, August 20, 2013 15:55 +0200 Marcos Sanz <sanz@denic.de> wrote:

> idna-update-bounces@alvestrand.no wrote on 20/08/2013 14:32:23:
>
>> On Mon, Aug 19, 2013 at 9:32 PM, Shawn Steele
>> <Shawn.Steele@microsoft.com> wrote:
>> > I concur. We use the IDNA2008 + TR46 behavior.
>>
>> Interesting. Last I checked Internet Explorer that was not
>> the case.
>
> At this side of the keyboard, ß is still not supported in
> IE10/Win7-SP1

But that is completely consistent with IDNA2008 + UTR46 when the most IDNA2003-like profile (or, if you prefer, stage of transition) of UTR46 is used. One can debate endlessly whether UTR46 is a good idea (and the IDNABIS WG did), but ultimately [1] it was intended to provide an environment as much like that of IDNA2003 as possible. That includes:

-- strict backward compatibility with the interpretation of strings that are valid with either IDNA2003 or IDNA2008 and

-- continued support for strings that were valid in IDNA2003 but that mapped into other strings before being converted to ASCII strings using Punycode, where those target strings are valid under IDNA2008

If one accepts that kind of compatibility as a primary goal, then the fact that "ß" was mapped to "ss" in IDNA2003 means that mapping must be preserved forever and one will never [2] actually be able to store an Eszett in the DNS.

The bottom line, at least IMO, is that one can adopt either of two philosophical models. In one, whatever decisions were made in building the IDNA2003 standard and the name strings those decisions allowed are inviolable. Arguments that errors were made, that those strings create risks, or that the rules prohibit orthographically-reasonable strings are simply irrelevant if they conflict with absolute compatibility. The other (at the risk of showing my biases) is to assume that we are human, that mistakes will get made, and that, if they are significant, we should figure out how to correct them and move on.
As others have suggested, the latter includes realizing that some labels and practices that were allowed under IDNA2003 were simply a bad idea and we should move away from them as soon as possible rather than encouraging their use in even more contexts. Coming back to the comment that started this note, it also means that, if the relevant language communities decide, for example, that Eszett is important as a character or that zero-width joiners and non-joiners are critical, we need to figure out how to accommodate them even if the accommodation is not perfect and doesn't solve all problems. And, in each case, we need to remember that the Internet is growing and reaching more communities and more people within almost every community, making transition now, even if painful, much less painful than transition in the future.

FWIW, without at least some measure of the latter model, we would be stuck with HTTP 1.0, HTML 1 (or at least 3), and ISO 8859-1 forever. The decision to interpret a string of non-ASCII octets in content as, by default, a good candidate for UTF-8 rather than Latin-1 is, at least IMO, ultimately an incompatible change of far more sweeping impact and consequences than this IDNA2003 -> IDNA2008 transition.

In an odd way, while I would have preferred to see a much more rapid transition, I think that exactly what should be happening is happening. The various registries --both the ICANN-supervised ones and many others at the root and various other levels-- are prohibiting (and not renewing) strings that do not conform with IDNA2008. Registries that want to support labels that are problematic from a transition standpoint have devised, or are devising, procedures to lower the odds of strings that pose difficulties falling into hostile hands, just as many of them do for potentially-confusing strings. The right time to transition systems that look up names involves tricky questions including the "pain now or more pain later" considerations mentioned above.
And where UTR 46 and/or RFC 5895 fit into transition strategies (as distinct from localized mapping strategies), or not, is obviously part of that transition question.

Anne, coming back to your original question, I don't know what question you and your colleagues asked that got the "everyone is still on IDNA2003" answer. Especially given the information from Microsoft, I suspect it was close to "are you fully supporting IDNA2008", for which a "no" answer might lead to a "using IDNA2003" answer despite their telling us that they are running IDNA2008 with UTR 46. Others have pointed out that "IDNA2003 with the version restriction eliminated" may be a sensible statement in individual cases but, because the Nameprep profile of Stringprep is not simply Unicode Case Folding plus NFKC, it leaves enough open to local interpretation that it is not a plausible candidate for a statement in a standard that is intended to promote interoperability.

Against that backdrop, I believe you should interpret what you are seeing, not as "everyone is committed to IDNA2003" (obviously not true as soon as exceptions are introduced) and "IDNA2003 with exceptions forever" but as slow transition. If you want a standard that works going forward, make the assumption that the folks who designed IDNA2008 were not fools and that browsers should be moving, and eventually will move (unless you discourage them) in the IDNA2008 direction.

Whether you want to discuss transition or not is up to you. If you want to follow Mark's recommendation (and Microsoft's lead) and suggest IDNA2008 plus UTR 46, I suggest you do so in a way that really constitutes a transition strategy rather than an "IDNA 2003 forever" one, i.e., that you address the issues of when "transition processing" gets turned off and the localization issues (especially about case folding) mentioned by others.
If not, you and your working group put us all at risk of many internationalized email applications working differently than web browsers do, in a fork between IETF and W3C i18n standards, divergence between assumptions and norms used by those who create DNS names and those who look them up, and so on. I hope we can agree that those would be bad outcomes.

regards,
   john

-----------

[1] I hope Mark will more or less agree with this characterization; it is as accurate and neutral as I know how to make it.

[2] This is associated with one of the key criticisms of UTR 46 that has not been discussed so far: It has been described as a transition strategy, but there is really no mechanism in it for deciding when to adopt the IDNA2008 model and rules in favor of strict backward-compatibility with as many names that were valid under IDNA2003 as possible. In reality, saying "we use UTR 46" or "we conform to UTR 46" is somewhat underspecified because UTR 46 can be used strictly for local mapping, with what it calls "transition processing" (which is where Eszett disappears), and/or with other optional features such as flagging, but continuing to look up, strings that contain punctuation or symbol characters. Either of those latter options makes a so-called "IDNA2008 + UTR46" implementation non-conforming with IDNA2008.
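
To make the Eszett point in footnote [2] concrete, here is a minimal sketch in Python. It assumes only the standard library, whose built-in "idna" codec implements IDNA2003 (RFC 3490 with Nameprep). The standard library has no IDNA2008 codec, so the second form is built by hand from raw Punycode plus the ACE prefix; that skips every IDNA2008 validity check and is an illustration only, not a conforming implementation. The label "straße" and the variable names are chosen purely for the example.

```python
# Example label containing Eszett (U+00DF): "straße".
label = "stra\u00dfe"

# IDNA2003 path: Python's built-in "idna" codec runs Nameprep first,
# which maps U+00DF to "ss", so the Eszett never reaches the DNS.
idna2003_form = label.encode("idna")

# IDNA2008-style path (illustrative assumption, no validity checks):
# U+00DF is kept as is, so the label is Punycode-encoded directly and
# given the "xn--" ACE prefix.
idna2008_form = b"xn--" + label.encode("punycode")

print(idna2003_form)  # b'strasse'
print(idna2008_form)  # b'xn--strae-oqa'
```

The two results are different DNS names, which is why a lookup client using "transition processing" and a registry following IDNA2008 can disagree about where "straße" points.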
Received on Tuesday, 20 August 2013 19:34:25 UTC