Re: Definitions limit on label length in UTF-8

From: John C Klensin <klensin@jck.com>
Date: Thu, 17 Sep 2009 09:57:50 -0400
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
cc: idna-update@alvestrand.no, dthaler@microsoft.com, public-iri@w3.org, Stuart Cheshire <cheshire@apple.com>
Message-ID: <3277690B814B78255FBC0AA6@PST.JCK.COM>

Martin,

First of all, please understand that I'm much more agnostic on
this issue than I think you assume.  I'm trying to reflect what
I believe I've been told by the WG and by various other
communities on the subject but, if the WG says "change it", I
will do so as editor and lose very little sleep about the
subject.

I'll let Dave and Stuart address the API and eventual migration
to pure UTF-8 issues.  I've been told that the ability to
convert to length-value form (with a six-bit length) _before_
Punycode conversion (or in an IDNA-unaware, "octets only"
implementation) is critical for the DNS community and for some
security-related applications which store DNS-based identifiers
in that form.  But I have no personal implementation experience
in either area, so perhaps Andrew and Paul can either speak to
those issues or point us to someone who can.
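
[Editorial sketch, not part of the original message: the
"length-value form (with a six-bit length)" mentioned above is the
DNS wire layout, in which each label is stored as one length octet
followed by the label's octets.  The function name to_wire and the
constant MAX_LABEL_OCTETS are illustrative assumptions, and the
overall 255-octet limit on a full name is deliberately not checked.]

```python
MAX_LABEL_OCTETS = 63  # largest value a six-bit length field can hold


def to_wire(labels):
    """Pack labels (bytes objects) into length-value form.

    Each label becomes <length octet><label octets>; the name is
    terminated by the zero-length root label.
    """
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL_OCTETS:
            raise ValueError(
                f"label is {len(label)} octets; must be 1..{MAX_LABEL_OCTETS}"
            )
        out.append(len(label))  # the length octet
        out += label            # the label octets, stored as-is
    out.append(0)               # root label ends the name
    return bytes(out)


if __name__ == "__main__":
    print(to_wire([b"www", b"example", b"com"]))
    # A label of fifty U+00E4 characters is 100 octets in UTF-8,
    # so an "octets only" implementation cannot store it in this form.
    try:
        to_wire([("\u00e4" * 50).encode("utf-8")])
    except ValueError as err:
        print("rejected:", err)
```

[The second call shows why the order of operations matters: the
same label may fit once converted to an A-label, but not if it is
packed as UTF-8 before Punycode conversion.]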

As a sometime-implementer, I'm nervous about unlimited-length
strings (as, based on recent interactions, are Stuart and Vint).
But it seems to me that the string length here is bounded in any
event -- with 59 characters of Punycode in an A-label, the upper
limit on a UTF-8 or UTF-32 string cannot be over 236 characters
and, I assume, would be considerably smaller.  Especially if we
can pin that number down (Adam?), I'd be a lot happier with text
that said, essentially, "the limit is on the A-label string, but
implementations should be aware that a maximum-length A-label
can convert to a U-label of up to NNN characters" than saying
"unlimited" and I think some others would be too.
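
[Editorial sketch, not part of the original message: one way to
compare the lengths under discussion, using the "punycode" codec
from the Python standard library.  The 59 above is the 63-octet
label limit minus the four-character "xn--" prefix.  The helper
names to_a_label, to_u_label and label_lengths are illustrative
assumptions, and the sketch performs none of the IDNA validity or
normalization checks, so it is not a conforming implementation.]

```python
ACE_PREFIX = "xn--"
MAX_A_LABEL_OCTETS = 63


def to_a_label(u_label):
    """Return the ASCII (A-label) form; all-ASCII labels pass through."""
    if u_label.isascii():
        return u_label
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label):
    """Inverse of to_a_label for labels carrying the ACE prefix."""
    if a_label.startswith(ACE_PREFIX):
        return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return a_label


def label_lengths(u_label):
    """Report the same label's length in each form being compared."""
    a_label = to_a_label(u_label)
    return {
        "a_label_octets": len(a_label),
        "code_points": len(u_label),
        "utf8_octets": len(u_label.encode("utf-8")),
        "utf32_octets": 4 * len(u_label),
        "fits_a_label_limit": len(a_label) <= MAX_A_LABEL_OCTETS,
    }


if __name__ == "__main__":
    # Fifty copies of U+00E4: repeated code points compress well in
    # Punycode, so the A-label fits even though the UTF-8 form does not.
    print(label_lengths("\u00e4" * 50))
```

[For this input the A-label stays within 63 octets while the UTF-8
form is 100 octets, which is the case the proposed "up to NNN
characters" wording would need to cover.]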

All of that said, I'm not persuaded by the "there have been no
issues raised, therefore there is no problem" argument.  The
reality is that, for mnemonic and typing convenience, people
generally prefer shorter labels to longer ones.  Other than in
test demonstrations and as part of efforts to encode other types
of information in DNS labels, I don't believe I've ever seen a
60+ character ASCII label in the wild.  Regardless of script, a
few such labels in the same FQDN would not only be nearly
impossible for most people to enter correctly but also would
guarantee line-wrapping of DNS names in most screen-layout and
documentation arrangements... never an ideal situation.   That
isn't an argument for banning labels of that length or longer;
it does suggest a reason why no problems have been identified
other than "people have been using this for years with no
difficulty".

regards,
    john



--On Saturday, September 12, 2009 12:14 +0900 "Martin J. Dürst"
<duerst@it.aoyama.ac.jp> wrote:

> Hello John,
> 
> [Dave, this is Cc'ed to you because of some discussion
> relating to draft-iab-idn-encoding-00.txt.]
> 
> [I'm also cc'ing public-iri@w3.org because of the IRI-related
> issue at the end.]
> 
> [Everybody, please remove the Cc fields when they are
> unnecessary.]
> 
> 
> Overall, I'm afraid that on this issue, more convoluted
> explanations won't convince me or anybody else, but I'll
> nevertheless try to answer your discussion below
> point-by-point.
> 
> What I (and I guess others on this list) really would like to
> know is whether you have any CONCRETE reports or evidence
> regarding problems with IDN labels that are longer than 63
> octets when expressed in UTF-8.
> 
> Otherwise, Michel has put it much better than me: "given the
> lack of issues with IDNA2003 on that specific topic there are
> no reasons to introduce an incompatible change".
Received on Thursday, 17 September 2009 13:58:05 UTC