Re: Globalizing URIs

Martin J Duerst (mduerst@ifi.unizh.ch)
Fri, 11 Aug 1995 21:17:34 +0200 (MET DST)


Message-Id: <9508111918.AA16539@mocha.bunyip.com>
Subject: Re: Globalizing URIs
To: sollins@lcs.mit.edu (Karen R. Sollins)
Date: Fri, 11 Aug 1995 21:17:34 +0200 (MET DST)
Cc: moore@cs.utk.edu, mduerst@ifi.unizh.ch, FisherM@is3.indy.tce.com,
In-Reply-To: <199508111417.KAA05540@lysithea.lcs.mit.edu> from "Karen R. Sollins" at Aug 11, 95 10:17:04 am
From: Martin J Duerst <mduerst@ifi.unizh.ch>

Karen,

>  What's more, "this group" (not in the formal sense
>any more) discussed these issues at length over many months several
>years ago.  At that time we agreed that we were not going for user
>friendly names.  Use by humans was to be discouraged.  If they are to
>be globally unique and long-lived and free of semantics, so that the
>semantics will not be invalidated with time then they are not going
>to be things that people would/should use, and such issues as user
>friendly character sets should not be an issue.

My impression is that some of the members of this group still have
that discussion so deep in their bones that they are unable to
recognize that the way the implementation and the mapping
of specific schemes were done (allowing full English text) has
greatly jeopardized their original intentions.

URLs have a user-friendly character set; the problem is only that
this user-friendliness is limited to English-speaking people.
People use this facility to encode as much semantics as possible.
You might have agreed that you didn't want to go for user-friendly
names, but in the end, you did! For those who were not really aware
of the issues of extended character sets for multilingual purposes,
it was fully user-friendly from the beginning.


>Personally, I would like to discourage human transcription as well,
>but the group did agree that that was an important feature, so we
>should pick a character set that is limited enough that it is
>transcribable on any keyboard we know of or can imagine.  As was said
>earlier, if that means digits only, that's fine.  For a long while
>we've been using the digits plus about 20 consonants.  No vowels, to
>discourage any use of "words" in any language that we knew of.

If you had stayed with that, fine (with the footnote that in Hebrew and
Arabic, only consonants are written in ordinary text :-). As I have not
taken part in the discussion, I can only guess, but my guess is that
most of the people at that time indeed felt that this would be too
clumsy, that they wouldn't like to transcribe their usual file names
into something such as "l4c5r7g7mtn8thd". And now they are arguing
against attempts to address the very same discomfort that they
cleverly got out of the way for themselves, but that
the greater part of the world still faces.
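
To make the restricted alphabet concrete, here is a minimal Python sketch
of a check for such names. The exact consonant list is an assumption: the
thread only says "the digits plus about 20 consonants", so the sketch takes
the ASCII lowercase letters without a, e, i, o, u and y. The function name
is_opaque_name is hypothetical.

```python
import string

# Assumption: the actual list of "about 20 consonants" is not given in the
# thread. Dropping a, e, i, o, u and y from ASCII lowercase leaves 20 letters.
VOWEL_LIKE = set("aeiouy")
ALLOWED = set(string.digits) | (set(string.ascii_lowercase) - VOWEL_LIKE)


def is_opaque_name(name: str) -> bool:
    """Return True if name is non-empty and uses only the restricted alphabet."""
    return bool(name) and all(ch in ALLOWED for ch in name)


# The sample name from the text passes; an ordinary English word does not.
print(is_opaque_name("l4c5r7g7mtn8thd"))
print(is_opaque_name("globalizing"))
```

Such a check only shows how small the alphabet is; as noted above, it does
not stop "words" in scripts that are normally written without vowels.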

It may be unfair to many of you on this group to make such direct
accusations, but for me there is too big a conflict between the
official
	"we agreed that we were not going for user friendly names"
and the actual, implicit:
	Let's care for us; we don't give a damn about the rest of the world.
that people dealing with multilingual matters find in the present
URL scheme.


>I believe that it was our intention, as described in RFC 1737, that
>URNs were not required and probably not expected to have exposed
>meaning in either sense.  That would certainly not prevent the creator
>of a URN from embedding semantics.  In fact, URN creators might choose
>to expose the semantics they embed, but they should know that the are
>exposing their users and perhaps themselves to the sorts of problems
>that Keith has been describing.

Most users freely and quite naturally choose to expose semantics,
to the greatest extent possible, even when they are
well aware of the problems that might result. It happens quite
automatically, even with telephone numbers. This is human nature,
and something that can hardly be fought with some nice declarations
of intention in some RFC, especially if the implementation does not
really live up to the intentions.


>I don't recommend that we repeat all the discussions about semantics
>and therefore character set again.  We should get through at least
>complete one round of engineering of the full complement of components
>needed to do identification of and access to objects in the net.

We don't need to repeat these discussions. We know our intentions, and we
think we know how to do it better the next time. But we also have to do
something to remedy the problems of the first generation.


Rather than discussing more or less obsolete intentions,
I would prefer to talk about attempts at solutions. I have made a number
of proposals, some of which touch the URL group only marginally
(so that they can say "our intentions stay clean, it's none
of our business"), and I would like to hear opinions from the group
that start:
	"Well, if you think that you have to do anything about it,
		better do it like this than like that."
If such opinions are accompanied by the disclaimer
	"This is against the intentions of URLs as discussed long ago
	in this group, and I therefore don't recommend anything similar."
that's fine with me. By now, I know these disclaimers anyway, but I
would like to see a posting that consists of something more than
disclaimers :-).

Regards,	Martin.