- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Sun, 17 Oct 2010 12:58:55 +0900
- To: John C Klensin <john-ietf@jck.com>
- CC: Wil Tan <wil@dready.org>, Julian Reschke <julian.reschke@gmx.de>, public-iri@w3.org
I have applied the patch that I put out around the end of September, because I have received positive feedback from John, and no negative feedback. I have committed this to subversion as http://trac.tools.ietf.org/wg/iri/trac/changeset/20/.

I will produce a -02 draft based on the small but steady changes that have accumulated since draft -01.

Any comments appreciated, as always.

Regards,   Martin.

On 2010/09/30 22:31, John C Klensin wrote:
>
>
> --On Tuesday, September 28, 2010 20:27 +0900 "Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:
>
>>> Actually, 5890/91/92/93 and arguably the still unpublished RFC
>>> 5895. RFC 5894 is not normative, but contains the
>>> explanations that might be more useful to some people as well
>>> as a discussion of the transition issues.
>>
>> I have added (at first unused) references to RFC 5890 and
>> 5891. I have referenced RFC 5890 here. I think it should be
>> obvious to the reader of that document that they have to look
>> at the others, too. I don't think we want to have a whole list
>> of RFCs every time there is something about IDNA, but of
>> course if there is something specific regarding one of the
>> other documents (e.g. bidi,...), we'll also add a direct
>> reference to that.
>
> FWIW, this works for me.
>
>>>>> What's the right reference for ToASCII now?
>>>
>>>> The closest thing would be sections 5.1 to 5.5 of RFC 5891,
>>
>> Again, we are not looking for the actual operation, but for
>> the validity check it provides. I think that therefore
>> U-Labels at
>> http://tools.ietf.org/html/rfc5890#section-2.3.2.1
>> are the right point to reference.
>
> Ok. I suggested the operation only because ToASCII was used,
> it is definitely an operation, and the question was for a
> reference for ToASCII.
>
>>>> but simply referencing them will lead to incompatibility
>>>> (e.g. producing different A-labels from the IDNA2003
>>>> version.)
>>
>> Does it produce different A-labels?
>> My understanding is that
>> it produces either the same A-label or no A-label, with the
>> very specific exceptions of the ς (final sigma) and ß
>> (sharp-s) only.
>
> That is correct unless we made a very serious mistake somewhere.
> In some sense, the difference between IDNA2003 and IDNA2008 is
> that the number of strings that can be processed to produce what
> we now call A-labels has decreased significantly. But that is
> consistent with your "no A-label" case above.
>
>> ...
>>>> http://unicode.org/reports/tr46/ details a good transition
>>>> strategy, but I wonder how one could work that into iri-bis.
>>>
>>> TR46 (which is not yet a stable reference since the text is
>>> still under review and may change yet again), details a
>>> transition strategy. But it is one that does not have IETF
>>> consensus, partially because it posits a much slower
>>> transition to allow for circumstances that are either very
>>> low frequency or that represented abuses even under
>>> pre-IDNA2008 standards and best practices. Let's not make
>>> things more confusing by trying to reference it as if it were
>>> the only reasonable approach to the situation.
>>
>> See Michel's mail for some details. I think we have to look
>> into whether and how we can use TR46 for describing additional
>> normalization at least in the normalization section (some
>> applications such as spiders prefer to normalize as
>> aggressively as possible to reduce the possibility of fetching
>> the same thing twice).
>
> Well, RFC 5895 certainly permits normalizing as aggressively as
> one likes, it is just quite deliberately not normative. The
> difficulty is that, once one moves beyond canonical
> normalization (NFC or NFD), and becomes more aggressive, one
> starts running into edge cases in which some names that users
> and registrants believe are different become the same,
> effectively making one of them completely inaccessible.
> The two
> cases that the IDNABIS WG quite deliberately created (final
> sigma and sharp-s) after long debate are examples of this, but
> so are the notorious dotless-i problem, a number of Han
> characters that are safe to map away except when they are used
> in personal names (the latter are not PVALID today, but it is
> easy to imagine a strong case being made in the future for
> reclassifying them), the Arabic and Farsi Yeh character, a
> number of characters that represent numerals, and so on.
>
> For the IRI spec to assume or require aggressive mapping that
> goes well beyond the very conservative assumptions of RFC 5895
> (you will recall that much of the relevant descriptive text was
> moved into what is now RFC 5894) risks creating disconnects and
> inappropriate restrictions on user and registrant behavior and
> on the future evolution of IDNA.
>
> john
>
>
>

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
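
Illustration (not part of the original message): the "same A-label or no A-label, except for final sigma and sharp-s" point discussed above can be seen in a rough sketch. It assumes Python, whose built-in "idna" codec follows IDNA2003 (Nameprep maps ß to "ss" and ς to σ before Punycode encoding). The helper idna2008_style is hypothetical: it only normalizes to NFC and prefixes raw Punycode, and it performs none of the validity checks that RFC 5890/5891 require, so it approximates an IDNA2008 A-label only for labels that are already valid.

```python
import unicodedata


def idna2003(label: str) -> bytes:
    """ToASCII as implemented by Python's built-in IDNA2003 codec."""
    return label.encode("idna")


def idna2008_style(label: str) -> bytes:
    """Hypothetical helper: NFC plus raw Punycode, with no IDNA2008 validation."""
    nfc = unicodedata.normalize("NFC", label)
    return b"xn--" + nfc.encode("punycode")


# Sharp-s and final sigma are mapped away under IDNA2003 but kept by the
# unmapped encoding; an ordinary lowercase label encodes identically.
for label in ("straße", "βόλος", "münchen"):
    print(label, idna2003(label), idna2008_style(label))
```

For "straße" the IDNA2003 codec returns b'strasse', while the unmapped form is an "xn--" label; for "münchen" both functions return the same bytes. A real implementation would rely on a library that implements the full IDNA2008 rules rather than this shortcut.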
Received on Sunday, 17 October 2010 03:59:44 UTC