Re: Globalizing URIs

Keith Moore
Thu, 10 Aug 1995 17:57:15 -0400

From: Keith Moore
To: Martin J Duerst
Cc: Keith Moore
Subject: Re: Globalizing URIs 
In-Reply-To: Your message of "Thu, 10 Aug 1995 20:33:05 +0200."
Date: Thu, 10 Aug 1995 17:57:15 -0400

> >The only thing that you gain from multilingual URLs is that they look
> >nice on a screen or on paper or on a business card.  And this comes at
> >a tremendous cost, because when someone tries to type in what they
> >think they see on a screen or business card, it will frequently
> >translate into some other sequence of octets that gets presented to
> >the ftp or http server.
> Therefore we have to assure that the mapping between "nice" form
> and "plain" form is clear, with the necessary mechanisms.

I wish you luck.  The problem is that there isn't one "nice" form,
there are lots of them, and you don't have control over how these
things get passed around.

Example: The author takes a filename on a file server (in the server's
local charset) and translates it to a multilingual URL.  The reader's
web browser displays that URL so that it looks nice.  The user copies
that URL with a mouse into another window, maybe into a word processor
that uses a different charset than the browser.  The document gets
printed out, or maybe emailed through a gateway that translates from
the local charset into one that's more likely to be usable by a
typical MIME mail reader.  Someone else gets that document and types
in the URL, whereupon it gets transmitted to the file server, and the
file server tries to translate the URL back to a filename.

If all of this works, it will be a miracle.
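
To make the failure in that example concrete, here is a minimal sketch in
Python.  The filename "café" and the three charsets are my own illustrative
assumptions, not anything taken from a real server; the point is only that
one visible name turns into different octets depending on which charset the
typing application happens to use.

```python
from urllib.parse import quote, unquote

def octets_on_the_wire(visible_name, charset):
    """Percent-encode the octets that `charset` produces for a visible name."""
    return quote(visible_name.encode(charset), safe="")

# Hypothetical filename; what the user believes they are typing.
name = "café"

latin1_form = octets_on_the_wire(name, "latin-1")  # 'caf%E9'
cp437_form = octets_on_the_wire(name, "cp437")     # 'caf%82'
utf8_form = octets_on_the_wire(name, "utf-8")      # 'caf%C3%A9'

# A server that assumes latin-1 but receives the utf-8 octets
# reconstructs a different filename from the one the author meant.
misread = unquote(utf8_form, encoding="latin-1")   # 'cafÃ©'
```

All three strings look identical on a business card, but a server doing a
byte-for-byte lookup will match at most one of them.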

The only way I can see that this would work would be to *always* keep
the "backward compatibility" pure-ASCII form attached to the "pretty"
one.  This would mean, for instance, that when you "copy" a URL from
your web browser to another application, it would include the
pure-ASCII form -- even if the user only saw the pretty one in his URL
window.  Presumably, users would learn to include both the pretty URL
and the ASCII one on paper documents and business cards (much like
the Japanese business cards I've seen that include formally written,
phonetic, and romanized versions of names and titles on the same
card).

And you will probably need to make sure that the "charset tag" is
always part of the URL and is always *visible* -- even when displaying
it in that charset.  (Otherwise, when the URL is copied with pencil and
paper and then back to a keyboard, the app will not be able to tell
which charset it was in and may interpret it differently.)
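
One possible shape for this, sketched in Python under my own assumptions:
the class name TaggedURL, its methods, the bracketed clipboard layout, and
the example.org address are all hypothetical, not part of any existing
proposal.  The idea is only that the pure-ASCII form is the thing stored,
and the pretty form plus its charset tag are derived from it and travel
with it.

```python
from dataclasses import dataclass
from urllib.parse import quote, unquote

@dataclass(frozen=True)
class TaggedURL:
    """Hypothetical pairing of a pure-ASCII URL with its charset tag."""
    ascii_form: str   # what is actually sent to the server
    charset: str      # tag needed to interpret the escaped octets

    @classmethod
    def from_pretty(cls, prefix, pretty_path, charset):
        # Escape the path octets produced by the named charset.
        escaped = quote(pretty_path.encode(charset), safe="/")
        return cls(prefix + escaped, charset)

    def pretty(self):
        # Display form; fails loudly if the tag does not fit the octets.
        return unquote(self.ascii_form, encoding=self.charset, errors="strict")

    def clipboard_text(self):
        # What "copy" would place on the clipboard: all three pieces.
        return f"{self.pretty()} [{self.charset}] <{self.ascii_form}>"

url = TaggedURL.from_pretty("http://example.org/", "café", "latin-1")
```

Here url.clipboard_text() carries the pretty form, the tag, and the ASCII
form together, so whichever one survives the trip through a word processor
or mail gateway, the ASCII form is still there to fall back on.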

> >This can happen for a wide variety of
> >reasons: there are dozens of different charsets in use, charset
> >translation tables aren't invertible, there are often several
> >different sequences of octets to "spell" a particular character, and
> >people don't know how to type those weird (to them) characters anyway.
> Well, to those that read and type them, these characters are very natural,
> and the ASCII characters, natural for us, may feel strange. As for different
> representations, in Japan, there are more representations for an 'a' than
> for the average Japanese Kanji!

I don't doubt that.  But my guess is that any Japanese user of
Internet email already knows how to generate the octet value for 'm'
in '' in such a way that your mail reader will
accept it.
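
The "several different sequences of octets" and "aren't invertible" points
quoted above can be illustrated with a short Python sketch.  It uses
Unicode and its normalization forms purely as a convenient modern stand-in;
the helper name and the sample characters are my own assumptions.

```python
import unicodedata

precomposed = "\u00e9"   # 'é' as a single code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent
fullwidth_a = "\uff41"   # fullwidth 'a', as found in Japanese text

def same_after_normalizing(left, right, form="NFKC"):
    """Compare two strings after folding equivalent spellings together."""
    return unicodedata.normalize(form, left) == unicodedata.normalize(form, right)

# Two spellings that render alike but differ octet for octet.
differ_on_the_wire = precomposed.encode("utf-8") != decomposed.encode("utf-8")

# A lossy translation: once the accent is replaced, it cannot be recovered.
lossy = precomposed.encode("ascii", errors="replace")  # b'?'
```

A comparison that folds the spellings together (same_after_normalizing)
only helps if every party along the way applies the same folding, which is
exactly the control nobody has.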

> >> I understand what you are saying, but what the world at large currently
> >> sees in terms of URLs is different. Every company is trying to get
> >> a nice domain name; there are even companies who do their
> >> business by organising such names for others. And every Webmaster
> >> is trying to make the URLs, esp. for entry points, easily recognizable
> >> and memorizable. Anything else is very bad marketing indeed.
> >
> >Yep, it's indeed a problem.  Until there are better tools, people are
> >going to try to make URLs that are meaningful.  
> It's not only a tool problem. Newspapers will exist for quite some
> more time.

My point was that we really need better standardized ways to find
documents than by typing in URLs, and better ways to learn about the
characteristics of a document than by examining its URL.  Until we
have them, we're going to keep trying to put features into URLs that
don't belong there, like the title of the document, and content-type
information, and whether it's suitable for children.

> >But domains aren't going to become non-ASCII, and neither will URLs --
> >for the same reason.  People who try to do this with their own URLs
> >will only succeed in making it harder for other folks to access their
> >sites.  People who build multilingual URL support into their net
> >browsers will only end up making them harder to use.
> With the present state of affairs, yes. But not if we find good
> solutions.

Again, I wish you luck.  I pray that you find a good solution.
But please remember that a poor solution to this problem could
well be worse than not solving it at all.

> >It's really no different than people insisting on meaningful telex
> >addresses or meaningful phone numbers.  Any worldwide address needs to
> >be in a universal, widely available, character set.
> It IS different. Japanese are at least as good as Americans to
> create puns and remembering aids for numbers. But there is a
> clear imbalance if English-language people and companies
> can use their names straight, whereas others have to use them
> in a mutilated form. For domain names and email addresses,
> there has to be a number only (or ASCII only) form, but
> for document names and such, there is no such need.

I see your point, but I think that the solution (as in the case of
telephone numbers) is not to change the address, but to build a
directory.  Perhaps those who don't ordinarily use Latin characters
have more incentive to build it, but the rest of us will find it
useful nonetheless.

> >Right.  My point is that things are just going to go more in this
> >direction.  Even though it's ugly, it's the best solution (and also
> >the path of least resistance).
> The tools you mention need something to start with.

The trick is to get things going in the right direction, so that we
don't paint ourselves into another corner.