Message-Id: <9508111918.AA16539@mocha.bunyip.com>
Subject: Re: Globalizing URIs
To: email@example.com (Karen R. Sollins)
Date: Fri, 11 Aug 1995 21:17:34 +0200 (MET DST)
Cc: firstname.lastname@example.org, email@example.com, FisherM@is3.indy.tce.com,
In-Reply-To: <199508111417.KAA05540@lysithea.lcs.mit.edu> from "Karen R. Sollins" at Aug 11, 95 10:17:04 am
From: Martin J Duerst <firstname.lastname@example.org>

Karen,

> What's more, "this group" (not in the formal sense
>any more) discussed these issues at length over many months several
>years ago. At that time we agreed that we were not going for user
>friendly names. Use by humans was to be discouraged. If they are to
>be globally unique and long-lived and free of semantics, so that the
>semantics will not be invalidated with time then they are not going
>to be things that people would/should use, and such issues as user
>friendly character sets should not be an issue.

My impression is that some members of this group still have that
discussion so deep in their bones that they are unable to recognize
that the way the implementation and the mapping of specific schemes
were done (allowing full English text) has greatly jeopardized their
original intentions.

URLs have a user-friendly character set; the problem is only that
this user-friendliness is limited to English-speaking people. People
use this facility to encode as much semantics as possible. You might
have agreed that you didn't want to go for user-friendly names, but
in the end, you did! For those who were not really aware of the
issues of extended character sets for multilingual purposes, it was
fully user-friendly from the beginning.

>Personally, I would like to discourage human transcription as well,
>but the group did agree that that was an important feature, so we
>should pick a character set that is limited enough that it is
>transcribable on any keyboard we know of or can imagine. As was said
>earlier, if that means digits only, that's fine.
>For a long while we've been using the digits plus about 20
>consonants. No vowels, to discourage any use of "words" in any
>language that we knew of.

If you had stayed with that, okay (with the footnote that in Hebrew
and Arabic, general text is written with consonants only :-).
As I have not taken part in the discussion, I can only guess, but my
guess is that most of the people at that time indeed felt that this
would be too clumsy, and that they wouldn't like to transcribe their
usual file names into something such as "l4c5r7g7mtn8thd".
And now they are arguing against attempts to address the same
discomfort that they cleverly got out of the way for themselves, but
that the greater part of the world still faces.

It may be unfair to many of you on this group to make such direct
accusations, but for me there is too big a conflict between the
official "we agreed that we were not going for user friendly names"
and the actual, implicit "Let's care for ourselves; we don't give a
damn about the rest of the world" that people dealing with
multilingual matters find in the present URL scheme.

>I believe that it was our intention, as described in RFC 1737, that
>URNs were not required and probably not expected to have exposed
>meaning in either sense. That would certainly not prevent the creator
>of a URN from embedding semantics. In fact, URN creators might choose
>to expose the semantics they embed, but they should know that they are
>exposing their users and perhaps themselves to the sorts of problems
>that Keith has been describing.

The greater part of the users freely and quite naturally choose to
expose semantics, to the greatest extent possible, even when they are
well aware of the problems that might result. It happens quite
automatically, even with telephone numbers. This is human nature, and
something that can hardly be fought with some nice declarations of
intention in some RFC, especially,
of course, if the implementation does not really live up to the
intentions.

>I don't recommend that we repeat all the discussions about semantics
>and therefore character set again. We should get through at least
>one complete round of engineering of the full complement of components
>needed to do identification of and access to objects in the net.

We don't need to repeat these discussions. We know our intentions,
and we think we know how to do it better the next time. But we also
have to do something to remedy the problems of the first generation.

Rather than discussing more or less obsolete intentions, I would much
prefer to talk about attempts at solutions. I have made a number of
proposals, some of which touch the URL group only marginally (so that
they can say "our intentions stay clean, it's none of our business"),
and I would like to hear opinions from the group that start: "Well,
if you think that you have to do anything about it, better do it like
this than like that." If such opinions are accompanied by the
disclaimer "This is against the intentions of URLs as discussed long
ago in this group, and I therefore don't recommend anything similar,"
that's fine with me. By now, I know these disclaimers anyway, but I
would like to see a posting that consists of something more than
disclaimers :-).

Regards, Martin.