Re: [URN] URI internationalization

Francois Yergeau (
Fri, 15 Nov 1996 16:11:48 -0500

Message-Id: <>
Date: Fri, 15 Nov 1996 16:11:48 -0500
To: Ron Daniel <>
From: Francois Yergeau <>
Subject: Re: [URN] URI internationalization

[Cross-posted to URI list, from URN-IETF list]

At 09:05 15-11-96 -0700, Ron Daniel wrote:
>I think I18N for URLs is a more difficult problem than it has been for
>URNs. We have a large number of existing URLs in a variety of character

Well, no, it appears we don't really have that.  I made a search for
non-ASCII URLs last spring (both 8-bit octets and %XY with X>=8), and found
very few out on the Web (cf.
<>).  Less than
0.25% in fact, and then some were typos (divide signs instead of tilde, for
instance) that didn't work until corrected by hand.
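
For illustration only, the search criterion above (a raw 8-bit octet, or a
%XY escape with X>=8) can be sketched in C as follows.  This is not the
tool used for the survey; the function name url_has_non_ascii is
hypothetical.

```c
#include <ctype.h>

/* Hypothetical helper: return 1 if the URL contains a raw 8-bit octet
 * or a %XY escape whose first hex digit X is 8 or above, else 0. */
int url_has_non_ascii(const char *url)
{
    const unsigned char *p = (const unsigned char *)url;

    for (; *p != '\0'; p++) {
        if (*p >= 0x80)
            return 1;                       /* raw 8-bit octet */
        /* isxdigit('\0') is false, so this never reads past the end */
        if (*p == '%' && isxdigit(p[1]) && isxdigit(p[2])) {
            /* X >= 8 means X is '8', '9' or a hex letter A-F / a-f */
            if (p[1] == '8' || p[1] == '9' || isalpha(p[1]))
                return 1;
        }
    }
    return 0;
}
```

A plain ASCII URL such as "http://example.org/~user/" is not flagged, while
"http://example.org/caf%E9" is.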

Furthermore, compatibility is made easier by the fact that UTF-8 data can be
quite reliably recognized as such.  Given a UR*, a server can test it for
UTF-8 validity; if it fails, it's some 'old' UR* in some encoding other than
UTF-8, so the server can process it as it did before and nothing is broken;
if it passes, just process as UTF-8.  A little experimentation (need more)
shows that false positives are unlikely, provided one takes care of 7-bit
ISO-2022-like encodings that look like ASCII (and thus UTF-8) but are not.
As for complexity, a UTF-8 validator fits in about 20 lines of C.
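
A minimal sketch of such a test in C, not the validator referred to above.
It assumes the later, stricter definition of UTF-8 (sequences of at most
4 octets, no overlong forms, no surrogates), and it assumes that 7-bit
ISO-2022-like data can be spotted by the presence of ESC, SO or SI
octets.  All function names are hypothetical, and the input is taken to be
the octets of the UR* after %XY unescaping.

```c
#include <stddef.h>

/* Return 1 if buf[0..len) is well-formed UTF-8, else 0. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp, min;   /* code point, smallest legal value */
        size_t n, k;             /* n = number of continuation octets */

        if (c < 0x80) { i++; continue; }              /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;           /* stray continuation or bad lead octet */

        if (len - i <= n) return 0;                   /* truncated */
        for (k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < min) return 0;                       /* overlong form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogate */
        if (cp > 0x10FFFF) return 0;                  /* out of range */
        i += n + 1;
    }
    return 1;
}

/* Assumption: ESC (0x1B), SO (0x0E) or SI (0x0F) signals a 7-bit
 * ISO-2022-like encoding that would otherwise pass as ASCII. */
int has_iso2022_controls(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] == 0x1B || buf[i] == 0x0E || buf[i] == 0x0F)
            return 1;
    return 0;
}

/* Server-side decision: 1 = process as UTF-8, 0 = process as before. */
int treat_as_utf8(const unsigned char *buf, size_t len)
{
    return !has_iso2022_controls(buf, len) && is_valid_utf8(buf, len);
}
```

With this sketch, the octets 63 61 66 C3 A9 ("café" in UTF-8) are accepted,
while 63 61 66 E9 (the same word in ISO 8859-1) fail validation and fall
through to the old processing path.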

>While I18N for URLs is a legitimate issue, it is not an issue for the
>URN-WG (IMHO). The URI list is still alive, that might be the proper
>place to begin discussions.

Agreed, I cross-posted there.  Please limit replies to the URI list.


François Yergeau <>
Alis Technologies Inc., Montréal
Tél : +1 (514) 747-2547
Fax : +1 (514) 747-2561