- From: Keith Moore <moore@cs.utk.edu>
- Date: Tue, 08 Aug 1995 14:23:24 -0400
- To: Martin J Duerst <mduerst@ifi.unizh.ch>
- Cc: FisherM@is3.indy.tce.com (Fisher Mark), uri@bunyip.com, moore@cs.utk.edu
> Although an URL is technically spoken just a sequence of octets,
> encoded with %HH if necessary, these octets in more cases than
> not represent characters, and there are many occasions on which
> it would be desirable to show the actual characters to a user,
> which, in an international setting, is only possible if the
> character set and encoding of these characters are known.

I understand why people think it's a good idea, but I think it's not
possible in general to solve this problem. There is a fundamental
conflict between the desire to be able to input URIs from a keyboard,
and the desire to be able to make URIs be "meaningful" to humans. If
you try to accommodate more character sets, you compromise the former.
Even if you try to have two spellings of a URI (one in ASCII, the other
human-meaningful to non-English speakers), the latter approach loses,
because it's more important that people be able to transcribe URIs than
that they be able to understand them.

URIs are going to become *less* meaningful as time goes on anyway,
because of other concerns (scalability, long-term stability, etc.).
So trying to make them human-readable is a wash.

This same argument surfaces from time to time in the email world.
People want to use their real names as email addresses, and I don't
blame them. But the fact is that most people can't properly type in a
Japanese, Chinese, Korean, Hebrew, Russian, etc., name if they don't
themselves read Japanese, Chinese, Korean, Hebrew, Russian, etc.

In either case, what we're going to end up with is a non-obvious
mapping between the (human-meaningful) "local" version of a name, and
the (transcribable) one that is used when talking to the outside world.
The best we can do is to build tools that help us manage this mapping.

And there is a strong argument that (human-meaningful) names and
(machine-meaningful) addresses should be kept separate anyway.

Make the document titles human meaningful, let's build search services
that understand various character sets, and let the search services
resolve into pure-ASCII URIs.

Keith
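
As an illustration of the quoted point that %HH-encoded octets can only be shown as characters when the character set is known, here is a minimal sketch in Python. The sample name and the helper `show_to_user` are assumptions made up for the example, not anything from the thread; the sketch only demonstrates that the same name yields different octets under two encodings, and that decoding with the wrong one gives a wrong display.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical sample: a name containing one non-ASCII character (u-umlaut).
name = "Z\u00fcrich"

# The same characters produce different octets, and so different %HH
# sequences, depending on the encoding chosen by whoever built the URL.
latin1_url = quote(name.encode("latin-1"))
utf8_url = quote(name.encode("utf-8"))


def show_to_user(url_part: str, charset: str) -> str:
    """Hypothetical helper: undo the %HH escapes, then interpret the
    resulting octets in the charset the caller *believes* was used."""
    return unquote_to_bytes(url_part).decode(charset)


print(latin1_url)                          # Z%FCrich
print(utf8_url)                            # Z%C3%BCrich
print(show_to_user(utf8_url, "utf-8"))     # the original name
print(show_to_user(utf8_url, "latin-1"))   # two wrong characters replace the umlaut
```

Nothing in the URL itself says which of the two decodings is the right one, which is why a display tool would need that information from somewhere else.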
Received on Tuesday, 8 August 1995 14:24:02 UTC