RE: 255 character limit in reg-name

From: Martin Duerst <duerst@w3.org>
Date: Sat, 17 Jul 2004 11:08:01 +0900
To: Larry Masinter <LMM@acm.org>, "'Dave McAlpin'" <Dave.McAlpin@epok.net>, "'Roy T. Fielding'" <fielding@gbiv.com>
Cc: uri@w3.org

[I wrote all this before I saw the additional exchange between Dave
and Roy.]

There is one specific reason why we may need to remove this limit:
Internationalized Domain Names (IDNs). With IDNs, the resulting
punycode that is sent to a DNS server of course cannot be longer
than 255 octets/US-ASCII characters. However, because of the
compression properties of punycode, it is easy to construct a
domain name in a script that uses three octets per character in
UTF-8 but has a relatively small repertoire, so that punycode
compresses it to one or two US-ASCII characters per input character.
There are a lot of scripts like this: the Indic scripts
(Devanagari, ...), Sinhala, Thai, Lao, Tibetan, Myanmar, Georgian,
Ethiopic, Cherokee, Khmer, and Mongolian, as well as Japanese
domain names written only in Katakana or Hiragana.

Here is an example (just one label, some silly text saying
all in hiragana (choosing that simply because both me and my mailer can
handle it :-):
This label contains 39 hiragana characters. Converted to UTF-8 and
percent-escaped, this gives
The label contains 39*3 = 117 pct-escaped constructs (note that the
ABNF limits the number of pct-escaped constructs, not the number
of actual characters, which in this case is 39*9 = 351).

Converted to punycode, this reads:
The label is 62 characters long. This means that even including the
xn-- prefix, fewer than 2 US-ASCII characters are used per hiragana
character in the input. Punycode at work!

Using a few such labels, we can easily construct a case where the
reg-name is more than 255 pct-escaped constructs long, but still
refers to a totally legal IDN.
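
For anyone who wants to reproduce the counting, here is a rough sketch
in Python. The label is a made-up, highly repetitive hiragana string
(not the one from my example above), the helper names to_ace and
reg_name_sizes are mine, and nameprep is skipped, so take it as an
illustration of the arithmetic rather than a full IDN implementation.

```python
from urllib.parse import quote


def to_ace(label: str) -> str:
    """ASCII-compatible form of one label (assumption: no nameprep step)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def reg_name_sizes(name: str) -> dict:
    """Compare the percent-escaped and punycode sizes of a dotted name."""
    labels = name.split(".")
    # quote() encodes to UTF-8 and percent-escapes every non-ASCII octet;
    # ASCII letters and dots are left alone.
    escaped = quote(name, safe=".")
    ace_labels = [to_ace(label) for label in labels]
    return {
        # Each %XX triplet counts as one construct, as does each plain character.
        "constructs": len(escaped) - 2 * escaped.count("%"),
        "escaped_length": len(escaped),
        "ace_length": len(".".join(ace_labels)),
        "longest_ace_label": max(len(a) for a in ace_labels),
    }


# Hypothetical name: three 40-character hiragana labels plus "example".
label = "あいうえお" * 8
name = ".".join([label] * 3) + ".example"
for key, value in reg_name_sizes(name).items():
    print(key, value)
```

With a name like this, the number of constructs is well over 255 while
the punycode form stays within the DNS limits, which is the case a
255-construct limit on reg-name would wrongly reject.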

At 09:00 04/07/16 -0700, Larry Masinter wrote:

>Those who want to increase or remove the limit need
>to demonstrate that the widely deployed URI software
>does not assume the limit in order to function

My guess is that browsers that implement IDN and pct-escaped would
check the limit after the conversion to punycode, not before. But
this is currently only a guess. I could try to get something set up
for testing, but it may take time, because we have just started a
long weekend in Japan.

Regards,    Martin.
Received on Friday, 16 July 2004 22:08:29 UTC

