Re: Globalizing URIs

Keith Moore (moore@cs.utk.edu)
Fri, 11 Aug 1995 16:08:56 -0400


Message-Id: <199508112009.QAA13205@wilma.cs.utk.edu>
From: Keith Moore <moore@cs.utk.edu>
To: Martin J Duerst <mduerst@ifi.unizh.ch>
Cc: sollins@lcs.mit.edu (Karen R. Sollins), moore@cs.utk.edu,
Subject: Re: Globalizing URIs 
In-Reply-To: Your message of "Fri, 11 Aug 1995 21:17:34 +0200."
             <199508111917.PAA26455@CS.UTK.EDU> 
Date: Fri, 11 Aug 1995 16:08:56 -0400

> My impression is that some of the members of this group still have
> that discussion too deep in their bones so that they are unable to
> recognize that the way the implementation and the mapping
> of specific schemes was done (allowing full English text) has
> greatly jeopardized their original intentions.

No, that's quite easy to recognize.  But URLs weren't specifically
designed to be English-centric; their current English-centric use is a
direct consequence of the implementations of FTP, Gopher, HTTP, and the
other protocols on which URLs were based.

> URLs have a user-friendly character set; the problem is only that
> this user-friendliness is limited to English-speaking people.
> People use this facility to encode as much semantics as possible.

Anybody, regardless of language, is going to name their files using
their own language.  The problem isn't that people use filenames to
encode semantic information; the problem is that these filenames get
exported to the rest of the world (and that this favors some people
more than others).

I'm not going to stop using English words in my filenames.  But I am
currently building tools to do publishing, cataloging, location,
replication, etc., that don't use the original filename in the
published URL.  One of the reasons for building such tools is to fix
one of the causes of the "stale URL" problem -- the use of filenames
as external document identifiers is a big part of what causes URL
lookup failure.  If we want to solve this problem (and I think we do),
then we're going to have to stop using filenames anyway.

But rather than moving away from the filenames that cause us these
problems, you're trying to figure out how to add more baggage so we
can keep using them.  Not only does this not solve the "stale URL"
problem, it drastically increases the probability of transcription
errors.  I have a hard time seeing this as a step in the right
direction.

(Actually, I'm afraid that Karen is right -- we may well have to punt
transcribability in the long run.)

> For those who were not really aware
> of the issues of extended character sets for multilingual purposes,
> it was fully user-friendly from the beginning.

Some of us who oppose user-friendly URLs DO understand the issues,
because we've seriously looked at ways to solve this problem, and the
potential disaster that poor solutions might cause.  That's why we
oppose them, or are at least very skeptical.

> If you had stayed with that, okay (with the footnote that in Hebrew and
> Arabic, consonants only are written in general text :-). As I have not
> taken part in the discussion, I can only guess, but my guess is that
> most of the people at that time indeed felt that this would be too
> clumsy, that they wouldn't like to transcribe their usual file names
> into something such as "l4c5r7g7mtn8thd". And now they are arguing
> against trying to address the same bad feeling and disliking that
> they cleverly managed out of the way for themselves, but that
> the greater part of the world is still faced with.

Look, this isn't a cultural or language bias issue.  It's just another
Internet scaling issue.  The more different things you try to hook
together, the more interoperability problems you have to solve.  We do
need to solve this problem, but the trick is to do so in a way that
doesn't make the overall situation worse.

> It may be unfair to many of you on this group to make such direct
> accusations, but for me there is too big a conflict between the
> official
> 	"we agreed that we were not going for user friendly names"
> and the actual, implicit:
> 	Let's care for us; we don't give a damn about the rest of 
> the world. 
> that people dealing with multilingual matters find in the present
> URL scheme.

That accusation could go in either direction.  As in:
"we don't give a damn about how well the web works in general,
so long as it lets us use our filenames".

It's probably true that we each try to solve the problems that hinder
us most.  Using ASCII filenames as document identifiers doesn't cause
me as many (immediate) problems as it causes you, so you have a
greater interest in fixing them (soon) than I do.  I want to fix them
too.  But maybe because the filename problem affects you more than it
does me, I'm more aware of other problems that also need fixing.

I still think we'd be better off if we tried to find a single solution
to both of the problems with URLs based on filenames, rather than
choosing a solution for your favorite (and worthwhile) problem that
makes the overall situation worse.

Keith