Re: Globalizing URIs

Paul Hoffman (ietf-lists@proper.com)
Fri, 11 Aug 1995 18:33:20 -0700


Message-Id: <v0213051fac51b034bc9e@[165.227.40.34]>
Date: Fri, 11 Aug 1995 18:33:20 -0700
To: uri@bunyip.com
From: ietf-lists@proper.com (Paul Hoffman)
Subject: Re: Globalizing URIs

Comments on many people's responses:

- Karen Sollins brings up many good points about the history of RFC 1737.
Basically, putting meaning in a name greatly increases the chance that the
name will become invalid or have to change in the future. The general desire
is for URL names to be as persistent as possible.

Having said that, almost no one is paying attention. Look at the URLs on a
couple of randomly-selected pages from Yahoo or any of the WWW Virtual
Libraries. At least 80% of them have plenty of meaning. Even the
non-English ones look like they have meaning; even though I don't speak
Italian, most of the ones in the .it domain seem to have a mixture of
consonants and vowels that looks like Italian to me.

- Keith Moore brings up what I consider to be a very, very strong argument
against client software showing meaning in a URL that is different than the
characters in the URL, which is what we are really discussing here.
Basically, if you show one thing that really means another, the user will
very likely try to transcribe the wrong one, particularly if there is no
way for the user to know that a transformation has taken place. Even with
smart copying (select a converted URL, copy it, and the copy is the
unconverted one), a user looking at the screen will write down an incorrect
transcription.

- Martin Duerst did us a big favor by laying out the proposed encodings:

A1)     <[ISO-8859-1]http://xxx.yyy.zz/AA/BB/CC.html>
A2)     <http:[ISO-8859-1]//xxx.yyy.zz/AA/BB/CC.html>
A3)     <http://xxx.yyy.zz/[ISO-8859-1]AA/BB/CC.html>
A4)     <http://xxx.yyy.zz/AA/BB/CC.html[ISO-8859-1]>
A5)     <http://xxx.yyy.zz/AA/BB/CC.html;ISO-8859-1>

I note that A1 and A2 would break every client in use today. Further, I
*hope* no one is expecting their domain names to have meaning taken from
them and shown to a client in a different form than they appear.
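
To make the point about A1 and A2 concrete, here is a rough sketch in
Python. NAIVE_URL and naive_parse are hypothetical stand-ins for the kind
of "scheme, then ://, then host" parsing a client might do; they are not
taken from any real browser, and real clients differ. Under that
assumption, a tag placed before the scheme or before the "//" is simply
not recognized as a URL, while A3 through A5 still parse.

```python
import re

# Hypothetical stand-in for a simple client-side URL parser:
# scheme, "://", host, optional path. Illustration only.
NAIVE_URL = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://([^/]+)(/.*)?$")

# The five proposed placements, copied from the list above without the <>.
PROPOSALS = {
    "A1": "[ISO-8859-1]http://xxx.yyy.zz/AA/BB/CC.html",
    "A2": "http:[ISO-8859-1]//xxx.yyy.zz/AA/BB/CC.html",
    "A3": "http://xxx.yyy.zz/[ISO-8859-1]AA/BB/CC.html",
    "A4": "http://xxx.yyy.zz/AA/BB/CC.html[ISO-8859-1]",
    "A5": "http://xxx.yyy.zz/AA/BB/CC.html;ISO-8859-1",
}


def naive_parse(url):
    """Return (scheme, host, path), or None if the URL is not recognized."""
    match = NAIVE_URL.match(url)
    if match is None:
        return None
    scheme, host, path = match.groups()
    return scheme, host, path or "/"


# Print what the naive parser makes of each proposal.
for label, url in PROPOSALS.items():
    print(label, naive_parse(url))
```

In this sketch A1 and A2 come back as None, and A3 through A5 all yield
the host xxx.yyy.zz with the tag carried along inside the path.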

A3 would be the easiest for Web server administrators to implement with
today's Web server software (I can't speak for FTP or Gopher servers). A4
would break the (admittedly dumb, but common) Web browsers that look at the
end of the URL to see what "kind" of file it is getting.
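
The suffix problem with A4 can be sketched the same way. The table
KNOWN_SUFFIXES and the function guess_kind below are made up for
illustration; they assume a browser that does nothing smarter than
compare the end of the URL against a short list of extensions.

```python
# Hypothetical suffix table; a real browser's list would be longer.
KNOWN_SUFFIXES = {
    ".html": "text/html",
    ".gif": "image/gif",
    ".txt": "text/plain",
}


def guess_kind(url):
    """Guess a content type purely from how the URL ends, or return None."""
    lowered = url.lower()
    for suffix, kind in KNOWN_SUFFIXES.items():
        if lowered.endswith(suffix):
            return kind
    return None


a3 = "http://xxx.yyy.zz/[ISO-8859-1]AA/BB/CC.html"
a4 = "http://xxx.yyy.zz/AA/BB/CC.html[ISO-8859-1]"
a5 = "http://xxx.yyy.zz/AA/BB/CC.html;ISO-8859-1"

for url in (a3, a4, a5):
    print(url, "->", guess_kind(url))
```

With this naive check A3 is still seen as HTML and A4 is not. A5 also
fails here, but only because the sketch does not strip anything after a
";"; whether a given browser does so is an assumption either way.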

In summary, I feel that Keith Moore has brought up the most salient point:
if we show the user an alternate view of a URL, that will lead to endless
confusion when they decide to do anything other than select that URL in the
single browser they are running at the moment.
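
A small sketch of how that confusion can arise, using Python's
urllib.parse. The path /caf%E9/menu.html is invented for the example,
and the second client's character set (UTF-8 here) is likewise an
assumption chosen only to show a mismatch. The first client displays the
escaped octet as an ISO-8859-1 character; a user copies what is on the
screen into a client that encodes the same character differently.

```python
from urllib.parse import quote, unquote

# Hypothetical URL path as it travels on the wire (not from the thread).
wire_form = "/caf%E9/menu.html"

# What a client might display if it decodes the escape as ISO-8859-1.
display_form = unquote(wire_form, encoding="iso-8859-1")

# What a second client sends if the user types the displayed text and that
# client assumes a different character set (UTF-8 is an assumption here).
retyped = quote(display_form, encoding="utf-8")

print("on the wire:", wire_form)
print("displayed:  ", display_form)
print("retyped:    ", retyped)
print("same URL?   ", retyped == wire_form)
```

The retyped form is not the URL the server published, and nothing on
the first user's screen indicated that a transformation had happened.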

Yes, we are all (well, almost all) putting meaning into our URLs today
using the very non-international character set given to us in RFC 1738.
Yes, this is dumb because even if you understand my character set, you may
not understand my language, and thus will not get any meaning from the
language-specific part. Giving you better access to my desired character
set will only help you if you also understand my language, and it will also
introduce a large number of other problems.