Re: Globalizing URIs

Keith Moore (moore@cs.utk.edu)
Tue, 08 Aug 1995 14:23:24 -0400


Message-Id: <199508081823.OAA05722@thud.cs.utk.edu>
From: Keith Moore <moore@cs.utk.edu>
To: Martin J Duerst <mduerst@ifi.unizh.ch>
Cc: FisherM@is3.indy.tce.com (Fisher Mark), uri@bunyip.com, moore@cs.utk.edu
Subject: Re: Globalizing URIs 
In-Reply-To: Your message of "Tue, 08 Aug 1995 19:40:16 +0200."
             <9508081740.AA12230@mocha.bunyip.com> 
Date: Tue, 08 Aug 1995 14:23:24 -0400

> Although an URL is technically spoken just a sequence of octets,
> encoded with %HH if necessary, these octets in more cases than
> not represent characters, and there are many occasions on which
> it would be desirable to show the actual characters to a user,
> which, in an international setting, is only possible if the
> character set and encoding of these characters are known.

I understand why people think it's a good idea, but I think it's 
not possible in general to solve this problem.  There is a fundamental
conflict between the desire to be able to input URIs from a keyboard,
and the desire to be able to make URIs be "meaningful" to humans.
If you try to accommodate more character sets, you compromise the former.

Even if you try to have two spellings of a URI (one in ASCII, the other
human-meaningful to non-English speakers), the human-meaningful spelling
loses, because it's more important that people be able to transcribe URIs
than that they be able to understand them.  URIs are going to become
*less* meaningful as time goes on anyway, because of other concerns
(scalability, long-term stability, etc.).  So trying to make them
human-readable is a wash.

This same argument surfaces from time to time in the email world.
People want to use their real names as email addresses, and I don't
blame them. But the fact is that most people can't properly type in
a Japanese, Chinese, Korean, Hebrew, Russian, etc., name if they
don't themselves read Japanese, Chinese, Korean, Hebrew, Russian, etc.

In either case, what we're going to end up with is a non-obvious
mapping between the (human-meaningful) "local" version of a name, and
the (transcribable) one that is used when talking to the outside
world.  The best we can do is to build tools that help us manage this
mapping.
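
As a rough sketch of what such a tool might do, here is one way to map
between a "local" name and a transcribable pure-ASCII form using %HH
escapes.  The function names are hypothetical, and the choice of UTF-8
as the default charset is an assumption made for illustration; nothing
in this thread settles on a charset.  The last line shows why the
charset has to be known: the same octets read as Latin-1 give
different characters.

```python
from urllib.parse import quote, unquote


def to_transcribable(local_name: str, charset: str = "utf-8") -> str:
    """Encode a local name as pure ASCII, using %HH for every non-safe octet."""
    # safe="" also escapes "/" so the whole name is treated as one component.
    return quote(local_name, safe="", encoding=charset)


def to_local(ascii_name: str, charset: str = "utf-8") -> str:
    """Decode %HH escapes back into characters, assuming the given charset."""
    return unquote(ascii_name, encoding=charset, errors="replace")


local = "Zürich"
wire = to_transcribable(local)          # "Z%C3%BCrich" under the UTF-8 assumption
round_trip = to_local(wire)             # back to "Zürich"
misread = to_local(wire, "latin-1")     # wrong charset: "ZÃ¼rich"
```

The ASCII form can be typed on any keyboard; only the tool needs to
know which charset turns it back into something readable.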

And there is a strong argument that (human-meaningful) names and
(machine-meaningful) addresses should be kept separate anyway.  Make
the document titles human-meaningful, build search services that
understand various character sets, and let the search services resolve
titles into pure-ASCII URIs.
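
A minimal sketch of that separation, assuming a toy in-memory index:
the titles, the example.org URIs, and the function names below are all
made up for illustration, and a real search service would do far more
than an exact-match lookup.  Titles are normalized (Unicode NFC plus
case folding) so that differently typed forms of the same title find
the same entry.

```python
import unicodedata
from typing import Optional


def normalize_title(title: str) -> str:
    """Fold a title to a canonical lookup key (NFC composition, case-insensitive)."""
    return unicodedata.normalize("NFC", title).casefold()


# Hypothetical index: human-meaningful titles on the left,
# pure-ASCII URIs on the right.
_RAW_INDEX = {
    "Wetter in Zürich": "http://example.org/doc/1042",
    "東京の天気": "http://example.org/doc/1043",
}
INDEX = {normalize_title(t): uri for t, uri in _RAW_INDEX.items()}


def resolve(title: str) -> Optional[str]:
    """Return the ASCII URI registered for a title, or None if unknown."""
    return INDEX.get(normalize_title(title))


# "U" followed by a combining diaeresis resolves to the same entry as "Ü".
uri = resolve("WETTER IN ZU\u0308RICH")
```

The title can be in any script; the thing that gets transcribed and
passed around stays ASCII.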

Keith