- From: John C Klensin <john-ietf@jck.com>
- Date: Thu, 30 Sep 2010 09:31:57 -0400
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
- cc: Wil Tan <wil@dready.org>, Julian Reschke <julian.reschke@gmx.de>, public-iri@w3.org
--On Tuesday, September 28, 2010 20:27 +0900 "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp> wrote:

>> Actually, 5890/91/92/93 and arguably the still unpublished RFC
>> 5895. RFC 5894 is not normative, but contains the
>> explanations that might be more useful to some people as well
>> as a discussion of the transition issues.
>
> I have added (at first unused) references to RFC 5890 and
> 5891. I have referenced RFC 5890 here. I think it should be
> obvious to the reader of that document that they have to look
> at the others, too. I don't think we want to have a whole list
> of RFCs every time there is something about IDNA, but of
> course if there is something specific regarding one of the
> other documents (e.g. bidi,...), we'll also add a direct
> reference to that.

FWIW, this works for me.

>>>> What's the right reference for ToASCII now?
>>
>>> The closest thing would be sections 5.1 to 5.5 of RFC 5891,
>
> Again, we are not looking for the actual operation, but for
> the validity check it provides. I think that therefore
> U-Labels at
> http://tools.ietf.org/html/rfc5890#section-2.3.2.1
> are the right point to reference.

Ok. I suggested the operation only because ToASCII was used, it
is definitely an operation, and the question was for a reference
for ToASCII.

>>> but simply referencing them will lead to incompatibility
>>> (e.g. producing different A-labels from the IDNA2003
>>> version.)
>
> Does it produce different A-labels? My understanding is that
> it produces either the same A-label or no A-label, with the
> very specific exceptions of the ς (final sigma) and ß
> (sharp-s) only.

That is correct unless we made a very serious mistake somewhere.
In some sense, the difference between IDNA2003 and IDNA2008 is
that the number of strings that can be processed to produce what
we now call A-labels has decreased significantly. But that is
consistent with your "no A-label" case above.

>...

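To make the ß / ς exception concrete, here is a minimal Python sketch. It relies on Python's built-in "idna" codec, which implements RFC 3490 (IDNA2003), and on the built-in "punycode" codec. The helper bare_punycode_alabel is hypothetical: it only prefixes "xn--" to the Punycode form and performs none of the validity checks that RFC 5891 requires, so it stands in for the IDNA2008 result only for labels that are already valid U-labels.

```python
# Sketch only: contrasts IDNA2003 mapping with an unmapped Punycode encoding.

def idna2003_alabel(label: str) -> str:
    """Label as produced by Python's 'idna' codec (RFC 3490 / IDNA2003)."""
    return label.encode("idna").decode("ascii")


def bare_punycode_alabel(label: str) -> str:
    """Hypothetical helper: 'xn--' + Punycode, with NO mapping and NO
    IDNA2008 validity checks. Assumes the input is already a valid U-label."""
    return "xn--" + label.encode("punycode").decode("ascii")


for label in ("straße", "βόλος"):
    print(label, idna2003_alabel(label), bare_punycode_alabel(label))

# IDNA2003 maps sharp-s away, so the name collapses to plain ASCII:
#   idna2003_alabel("straße")      -> "strasse"
# Encoding the character as-is yields a different label:
#   bare_punycode_alabel("straße") -> "xn--strae-oqa"
```

For final sigma the effect is the same in kind: under IDNA2003 the labels "βόλος" and "βόλοσ" yield one and the same A-label, while encoding them unmapped yields two different ones.
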
>>> http://unicode.org/reports/tr46/ details a good transition
>>> strategy, but I wonder how one could work that into iri-bis.
>>
>> TR46 (which is not yet a stable reference since the text is
>> still under review and may change yet again), details a
>> transition strategy. But it is one that does not have IETF
>> consensus, partially because it posits a much slower
>> transition to allow for circumstances that are either very
>> low frequency or that represented abuses even under
>> pre-IDNA2008 standards and best practices. Let's not make
>> things more confusing by trying to reference it as if it were
>> the only reasonable approach to the situation.
>
> See Michel's mail for some details. I think we have to look
> into whether and how we can use TR46 for describing additional
> normalization at least in the normalization section (some
> applications such as spiders prefer to normalize as
> aggressively as possible to reduce the possibility of fetching
> the same thing twice).

Well, RFC 5895 certainly permits normalizing as aggressively as
one likes, it is just quite deliberately not normative. The
difficulty is that, once one moves beyond canonical
normalization (NFC or NFD), and becomes more aggressive, one
starts running into edge cases in which some names that users
and registrants believe are different become the same,
effectively making one of them completely inaccessible. The two
cases that the IDNABIS WG quite deliberately created (final
sigma and sharp-s) after long debate are examples of this, but
so are the notorious dotless-i problem, a number of Han
characters that are safe to map away except when they are used
in personal names (the latter are not PVALID today, but it is
easy to imagine a strong case being made in the future for
reclassifying them), the Arabic and Farsi Yeh character, a
number of characters that represent numerals, and so on.

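The collapse described above can be sketched with nothing but Python's standard unicodedata module. The function aggressive below is a hypothetical stand-in for "more aggressive" mapping (compatibility normalization plus full case folding); it is not the mapping of RFC 5895 or of TR46, only an illustration of how names that stay distinct under NFC can become identical once mapping goes further.

```python
import unicodedata


def canonical(label: str) -> str:
    """Canonical normalization only (NFC)."""
    return unicodedata.normalize("NFC", label)


def aggressive(label: str) -> str:
    """Hypothetical aggressive mapping: NFKC, full case folding, NFKC again.
    An assumption for illustration, not a mapping taken from any spec."""
    folded = unicodedata.normalize("NFKC", label).casefold()
    return unicodedata.normalize("NFKC", folded)


# Pairs that users or registrants may regard as different names.
pairs = [
    ("straße", "strasse"),   # sharp-s
    ("βόλος", "βόλοσ"),      # final sigma vs. medial sigma
    ("１２３", "123"),        # fullwidth digits vs. ASCII digits
]

for a, b in pairs:
    same_nfc = canonical(a) == canonical(b)
    same_aggressive = aggressive(a) == aggressive(b)
    print(f"{a!r} / {b!r}: same under NFC={same_nfc}, "
          f"same under aggressive mapping={same_aggressive}")
```

Each pair is distinct under NFC and identical under the aggressive mapping, which is exactly the situation in which one of the two names becomes unreachable.
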
For the IRI spec to assume or require aggressive mapping that
goes well beyond the very conservative assumptions of RFC 5895
(you will recall that much of the relevant descriptive text was
moved into what is now RFC 5894) risks creating disconnects and
inappropriate restrictions on user and registrant behavior and
on the future evolution of IDNA.

    john

Received on Thursday, 30 September 2010 13:33:12 UTC