Re: revised "generic syntax" internet draft

Edward Cherlin
Mon, 14 Apr 1997 10:40:43 -0700

Message-Id: <v0300781baf78109a22c7@[]>

"Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU> wrote:

[>François Yergeau wrote:]
>>I also happen to disagree with this particular opinion.  ASCII characters
>>are not the only ones worth displaying.  User-friendliness should not be
>>the exclusive apanage of ASCII users.
>As it states quite clearly in the draft,
>   These design concerns are not always in alignment.  For example, it
>   is often the case that the most meaningful name for a URL component
>   would require characters which cannot be typed on most keyboards.

This is incorrect. Any script can be typed on any keyboard, and displayed
on any graphics screen. What is lacking may be the software, including
keyboard mapping tables, input methods, fonts, and rendering software.
However, all of these elements exist, even if they are not yet widely
deployed. Consider the rather extensive Unicode support hidden (sic) in
Microsoft Office 97. Since this is now available for more than 90% of the
computers in the world, we can no longer plead lack of ability.

If necessary (and it often has been) it is possible to type Greek on a pure
ASCII keyboard into a pure ASCII OS and application, with pure ASCII
display, so that the resulting file will display correctly on a computer
that does support Greek display. People in the business have learned to
read Greek fluently in this ASCII form.
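To illustrate the kind of ASCII form meant here, the following is a minimal sketch of a transliteration table. The message does not name a particular scheme, so the mapping below is an assumption modelled on Beta Code (lowercase letters only, no accents or breathings), and the names ASCII_TO_GREEK and ascii_to_greek are hypothetical.

```python
# Hypothetical ASCII-to-Greek table in the style of Beta Code.
# Assumption: lowercase letters only, no accents or breathing marks.
ASCII_TO_GREEK = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}


def ascii_to_greek(text: str) -> str:
    """Convert ASCII-typed Greek to Greek letters for display."""
    out = []
    for i, ch in enumerate(text):
        greek = ASCII_TO_GREEK.get(ch.lower(), ch)
        # Use the final-sigma form when "s" is the last letter of a word.
        next_is_letter = i + 1 < len(text) and text[i + 1].isalpha()
        if greek == "σ" and not next_is_letter:
            greek = "ς"
        out.append(greek)
    return "".join(out)


print(ascii_to_greek("logos"))  # λογος
```

The file itself stays pure ASCII; only the machine that supports Greek display needs the table.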

This point is important, and should be stated correctly, or rather removed
from the draft entirely, unless there is some other valid example of
design concerns that are not in alignment.
>   The ability to transcribe the resource location from one medium to
>   another was considered more important than having its URL consist
>   of the most meaningful of components.  In local and regional
>   contexts and with improving technology, users might benefit from
>   being able to use a wider range of characters.

The phrase "local and regional" here is inappropriate. (Actually it is
infuriating, but we won't go into that.)

More to the point, now we can achieve both aims. We can provide completely
meaningless (to users of ASCII) URLs which will display correctly in local,
regional, scholarly, international business, political, etc. contexts,
instead of ASCII URLs which are meaningless to their intended users.

>   However, such use
>   is not guaranteed to work, and should therefore be avoided.

"Guaranteed"?!? ROTFLOL. This is the Internet. Large portions of numerous
standards fail routinely, most especially those not yet widely or correctly
implemented. (nailing jelly to a tree? herding cats? bottling fog?) Where
was this writer when frames came out in HTML? Pages using frames still have
to check the user's browser version on every access.

We cannot always have total backward compatibility. However, the current
proposal for %HH-encoded UTF-8 actually offers far more backward
compatibility than is usual, while reducing cases that fail in worse ways.

The proposal to use %HH-encoded UTF-8 will work correctly with all browsers
tested so far (3 of them). In fact, I am at a loss to understand how it
could fail to work. The possibilities I can think of are:

- The browser will display the URL correctly in ASCII
- The browser will display the URL correctly in the intended script(s)
- The browser will display the URL not quite correctly because of missing fonts
- The browser will have a display bug that will be promptly fixed

but in all of the above cases, the URL will still work to fetch the page,
and can be cut, copied, and pasted, and can in fact be printed and typed
back in if it has to be. As usual, the server can process the URL to locate
the page in any way the site designer pleases, so that both the %HH-encoded
URL and a UTF-8 or UTF-16 URL would fetch the same page.
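A small sketch of the behaviour described above, using Python's standard urllib.parse. The path is a made-up example and the function names encode_path and canonical_path are hypothetical; this is one way a server might normalise both forms, not something the draft specifies.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a path as %HH-escaped UTF-8, leaving '/' unescaped."""
    return quote(path, safe="/", encoding="utf-8")


def canonical_path(received: str) -> str:
    """Map either form of the path (raw or %HH-encoded) to one lookup key.

    errors="strict" makes malformed UTF-8 raise instead of being replaced.
    """
    return unquote(received, encoding="utf-8", errors="strict")


raw = "/docs/Ελλάδα.html"  # hypothetical path, not taken from the draft
encoded = encode_path(raw)

print(encoded)            # /docs/%CE%95%CE%BB%CE%BB%CE%AC%CE%B4%CE%B1.html
print(encoded.isascii())  # True: safe to print, retype, or paste anywhere
print(canonical_path(encoded) == canonical_path(raw))  # True: same page
```

The encoded form contains only ASCII, which is why an old browser can carry it unchanged even if it cannot display the intended script.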

Or am I missing something?

The technology for correct creation, interpretation, and display of UTF-8
URLs exists and has been demonstrated in a small way. We may need a
somewhat larger demonstration, but we cannot pretend that it is too soon to
do that.

>Your comments have done nothing to change the conclusions already
>represented within the draft.
>>>IF you can persuade the creators of URLs to always use UTF-8, which
>>>is definitely not the case today (Apache, NCSA, and CERN servers all
>>>use whatever charset is used by the underlying filesystem, which on
>>>most Unix-based systems is iso-8859-1 or iso-2022-*), ...

You're going much too far here. The proposed language recommends, but does
not require, UTF-8, and aims at an eventual transition to UTF-8 only,
presumably when the software supports it. In any case, don't servers
running on NT support Unicode encodings including UTF-8, since the native
file system does so? And what will happen when they run on a UNIX that does
have a Unicode file name system? Eventually they all will, you know.
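To make the transition problem concrete, here is a rough sketch of how the same name yields different %HH sequences under iso-8859-1 and UTF-8, and how a server could try UTF-8 first and fall back to a legacy charset. The fallback heuristic and the name decode_url_bytes are my assumptions for illustration; neither the draft nor the servers mentioned are known to work this way.

```python
from urllib.parse import quote, unquote_to_bytes

name = "François"

# The same name, percent-encoded from two different underlying charsets.
latin1_url = quote(name, encoding="iso-8859-1")
utf8_url = quote(name, encoding="utf-8")
print(latin1_url)  # Fran%E7ois
print(utf8_url)    # Fran%C3%A7ois


def decode_url_bytes(encoded: str, fallback: str = "iso-8859-1") -> str:
    """Try UTF-8 first; fall back to a legacy charset if the bytes are invalid.

    Assumption: the fallback charset is known for the site. Byte sequences
    that happen to be valid UTF-8 are always read as UTF-8.
    """
    raw = unquote_to_bytes(encoded)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


print(decode_url_bytes(latin1_url))  # François
print(decode_url_bytes(utf8_url))    # François
```

Most iso-8859-1 text containing accented letters is not valid UTF-8, so a check of this kind can often tell the two apart during a transition period, though not in every case.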


Edward Cherlin
Vice President
NewbieNet, Inc.
Ask. Someone knows.

Everything should be made as simple as possible, __but no simpler__.
Attributed to Albert Einstein