- From: John C Klensin <klensin@jck.com>
- Date: Thu, 17 Sep 2009 09:57:50 -0400
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
- cc: idna-update@alvestrand.no, dthaler@microsoft.com, public-iri@w3.org, Stuart Cheshire <cheshire@apple.com>
Martin,

First of all, please understand that I'm much more agnostic on this issue than I think you assume. I'm trying to reflect what I believe I've been told by the WG and by various other communities on the subject but, if the WG says "change it", I will do so as editor and lose very little sleep about the subject.

I'll let Dave and Stuart address the API and eventual migration to pure UTF-8 issues. I've been told that the ability to convert to length-value form (with a six-bit length) _before_ Punycode conversion (or in an IDNA-unaware, "octets only" implementation) is critical for the DNS community and for some security-related applications which store DNS-based identifiers in that form. But I have no personal implementation experience in either area, so perhaps Andrew and Paul can either speak to those issues or point us to someone who can.

As a sometime-implementer, I'm nervous about unlimited-length strings (as, based on recent interactions, are Stuart and Vint). But it seems to me that the string length here is bounded in any event -- with 59 characters of Punycode in an A-label, the upper limit on a UTF-8 or UTF-32 string cannot be over 236 octets and, I assume, would be considerably smaller. Especially if we can pin that number down (Adam?), I'd be a lot happier with text that said, essentially, "the limit is on the A-label string, but implementations should be aware that a maximum-length A-label can convert to a U-label of up to NNN characters" than with text saying "unlimited", and I think some others would be too.

All of that said, I'm not persuaded by the "there have been no issues raised, therefore there is no problem" argument. The reality is that, for mnemonic and typing convenience, people generally prefer shorter labels to longer ones. Other than in test demonstrations and as part of efforts to encode other types of information in DNS labels, I don't believe I've ever seen a 60+ character ASCII label in the wild.
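For readers less familiar with the DNS side, here is a minimal Python sketch (not from the thread) of the length-value form mentioned above and of the arithmetic behind the 236 figure. It assumes the RFC 1035 wire format, in which a label is one length octet followed by that many octets and the two high-order bits of the length octet must be zero, leaving six bits for the length. The constant and function names are hypothetical.

```python
from typing import Sequence

# Assumption (RFC 1035 wire format): the two high-order bits of a label's
# length octet are 00, so only six bits remain for the length.
MAX_LABEL_OCTETS = 0b111111  # 63

ACE_PREFIX = "xn--"
MAX_PUNYCODE_CHARS = MAX_LABEL_OCTETS - len(ACE_PREFIX)  # 59

# Each decoded code point consumes at least one Punycode character, and no
# code point needs more than 4 octets in UTF-8 (UTF-32 always uses 4), so
# this is a loose upper bound on the size of the decoded label.
U_LABEL_OCTET_BOUND = MAX_PUNYCODE_CHARS * 4  # 236


def encode_label(label: bytes) -> bytes:
    """Return the length-value form: one length octet, then the label octets."""
    if not 1 <= len(label) <= MAX_LABEL_OCTETS:
        raise ValueError(
            f"label is {len(label)} octets; must be 1..{MAX_LABEL_OCTETS}"
        )
    return bytes([len(label)]) + label


def encode_name(labels: Sequence[bytes]) -> bytes:
    """Concatenate length-value labels and terminate with the zero-length root."""
    return b"".join(encode_label(label) for label in labels) + b"\x00"
```

An IDNA-unaware, "octets only" caller would hand the UTF-8 bytes of a U-label straight to `encode_label`, which is exactly where a U-label longer than 63 octets would be rejected even if its A-label fits.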
Regardless of script, a few such labels in the same FQDN would not only be nearly impossible for most people to enter correctly but also would guarantee line-wrapping of DNS names in most screen-layout and documentation arrangements... never an ideal situation. That isn't an argument for banning labels of that length or longer; it does suggest a reason why no problems have been identified other than "people have been using this for years with no difficulty".

regards,
john

--On Saturday, September 12, 2009 12:14 +0900 "Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:

> Hello John,
>
> [Dave, this is Cc'ed to you because of some discussion
> relating to draft-iab-idn-encoding-00.txt.]
>
> [I'm also cc'ing public-iri@w3.org because of the IRI-related
> issue at the end.]
>
> [Everybody, please remove the Cc fields when they are
> unnecessary.]
>
>
> Overall, I'm afraid that on this issue, more convoluted
> explanations won't convince me nor anybody else, but I'll
> nevertheless try to answer your discussion below
> point-by-point.
>
> What I (and I guess others on this list) really would like to
> know is whether you have any CONCRETE reports or evidence
> regarding problems with IDN labels that are longer than 63
> octets when expressed in UTF-8.
>
> Otherwise, Michel has put it much better than me: "given the
> lack of issues with IDNA2003 on that specific topic there are
> no reasons to introduce an incompatible change".
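To make Martin's question concrete, the following hypothetical Python sketch builds a deliberately artificial label (40 repetitions of U+00FC) whose UTF-8 form is longer than 63 octets while its A-label still fits. It relies only on Python's built-in `punycode` codec, which performs the bare RFC 3492 conversion without any IDNA2003 or IDNA2008 mapping or validity checks, so it illustrates the length arithmetic rather than full label validation.

```python
from typing import Tuple

ACE_PREFIX = "xn--"
MAX_LABEL_OCTETS = 63


def a_label_for(u_label: str) -> str:
    """Build an A-label with Python's raw 'punycode' codec.

    Assumption: no IDNA mapping or validation is applied; this is only the
    RFC 3492 Punycode step with the ACE prefix prepended.
    """
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def label_sizes(u_label: str) -> Tuple[int, int]:
    """Return (UTF-8 octets of the U-label, octets of the A-label)."""
    return len(u_label.encode("utf-8")), len(a_label_for(u_label))


u_label = "\u00fc" * 40  # hypothetical, deliberately artificial label
utf8_len, ace_len = label_sizes(u_label)
# True when the A-label fits in 63 octets but the UTF-8 form does not.
print(utf8_len, ace_len, ace_len <= MAX_LABEL_OCTETS < utf8_len)
```

The repeated character keeps the Punycode form short because, after the first code point, each identical insertion encodes as a small delta; realistic labels will differ, but the asymmetry between the two length measures is the same.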
Received on Thursday, 17 September 2009 13:58:05 UTC