Re: revised "generic syntax" internet draft

John C Klensin (klensin@mci.net)
Tue, 15 Apr 1997 11:55:43 -0400 (EDT)


Date: Tue, 15 Apr 1997 11:55:43 -0400 (EDT)
From: John C Klensin <klensin@mci.net>
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <199704151350.PAA20358@valinor.malmo.trab.se>
To: Dan Oscarsson <Dan.Oscarsson@trab.se>
Cc: Harald.T.Alvestrand@uninett.no, uri@bunyip.com, fielding@kiwi.ICS.UCI.EDU
Message-Id: <SIMEON.9704151143.E@tp7.Jck.com>


On Tue, 15 Apr 1997 15:50:11 +0200 (MET DST) Dan Oscarsson 
<Dan.Oscarsson@trab.se> wrote:
>...
> Well, Swedish letters like åäö are normally called Latin, but I assume you
> mean ascii.

I can't speak for Roy, but, in my earlier note on the 
subject, I meant *Latin*.  The reality is that UTF-8 is 
"user friendly" --and will get through a lot of systems 
without either advance planning or difficulties-- if the 
character set that is actually in use is ISO 8859-1, not 
just ASCII.  It isn't too bad for the other Latin 
alphabets.  But for the character collections that are 
distinctly not Latin-based, the display resulting from the 
use of UTF-8, in the absence of the sort of aggressive, 
front-end, "everyone needs to apply it" translations that 
Roy suggested, is not only not user-friendly, but closely 
approximates a secret code (worse than %-notation or the 
notorious Q-P).
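
To make the display problem concrete, here is a minimal sketch in 
Python 3 of what a viewer shows when it does not decode UTF-8 and 
instead treats every byte as an ISO 8859-1 character, next to the 
%-notation for the same text.  The function names and the sample 
strings are illustrative assumptions, not anything taken from the 
draft under discussion.

```python
from urllib.parse import quote


def seen_as_latin1(text: str) -> str:
    """Show the UTF-8 bytes of `text` as an ISO 8859-1 display would.

    Hypothetical helper: it models a front end that applies no UTF-8
    translation and maps each byte to one Latin-1 character.
    """
    return text.encode("utf-8").decode("latin-1")


def percent_notation(text: str) -> str:
    """Percent-encode the UTF-8 bytes of `text`, escaping everything
    except unreserved ASCII characters."""
    return quote(text, safe="")


# Escapes are used so the sample survives transport through
# systems that are not 8-bit clean.
samples = {
    "ascii": "abc",
    "swedish": "\u00e5\u00e4\u00f6",      # a-ring, a-umlaut, o-umlaut
    "japanese": "\u65e5\u672c\u8a9e",     # three ideographic characters
}

for label, text in samples.items():
    raw = text.encode("utf-8")
    print(f"{label}: {len(text)} chars, {len(raw)} UTF-8 bytes")
    print(f"  untranslated display: {seen_as_latin1(text)!r}")
    print(f"  %-notation:           {percent_notation(text)}")
```

In this sketch the ASCII sample passes through unchanged, each 
Swedish letter becomes two bytes, and each ideographic character 
becomes three, some of which fall in the 8859-1 control range and 
have no printable form at all.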

If one looks ahead more than a year or so and assumes 
worldwide use of the Internet, there are more of "them" 
than there are of "us" and the marginal fraction of the 
population that considers 8859-1 (and hence UTF-8) to be 
user-friendly as compared to ASCII is, unfortunately, 
barely worth the trouble.

It would have been better had URLs been carefully and 
thoughtfully internationalized from the very beginning.  
For whatever reasons, they weren't.  A conversion now is 
going to be painful.  But, if the pain is worth it, and I 
suspect it might be, then let's look to a balanced, 
equitable, *international* solution, rather than adopting 
UTF-8 encoding in the hope that no one who uses ideographic 
characters will be bothered about what happens to them.

> If we cannot find a way to send URLs containing any character in a way so
> that the characters can be understood and displayed in a user friendly
> manner, the web and URLs are not the future.

I completely agree with this.  However, I think we need to 
adopt a very broad understanding of "user friendly" and to 
keep in mind that, for intersystem protocol purposes, 
ASCII -- or even the stable subset of ISO 646 / T.50 -- 
has a much more successful track record (in both the 
IETF and ISO/ITU arenas) than any of the many attempts at 
"national", "localized", "international", or "universal" 
character sets.

   john