- From: Edward Cherlin <cherlin@newbie.net>
- Date: Mon, 14 Apr 1997 10:40:43 -0700
- To: uri@bunyip.com

"Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU> wrote:
[snip]
[>François Yergeau wrote:]
>>I also happen to disagree with this particular opinion. ASCII characters
>>are not the only ones worth displaying. User-friendliness should not be
>>the exclusive apanage of ASCII users.
>
>As it states quite clearly in the draft,
>
>   These design concerns are not always in alignment. For example, it
>   is often the case that the most meaningful name for a URL component
>   would require characters which cannot be typed on most keyboards.

This is incorrect. Any script can be typed on any keyboard, and displayed
on any graphics screen. What is lacking may be the software, including
keyboard mapping tables, input methods, fonts, and rendering software.
However, all of these elements exist, even if they are not yet widely
deployed. Consider the rather extensive Unicode support hidden (sic) in
Microsoft Office 97. Since this is now available for more than 90% of the
computers in the world, we can no longer plead lack of ability.

If necessary (and it often has been) it is possible to type Greek on a
pure ASCII keyboard into a pure ASCII OS and application, with pure ASCII
display, so that the resulting file will display correctly on a computer
that does support Greek display. People in the business have learned to
read Greek fluently in this ASCII form.

This point is important, and should be stated correctly, or rather removed
from the draft entirely, unless there is some other valid example of
misalignment.

>   The ability to transcribe the resource location from one medium to
>   another was considered more important than having its URL consist
>   of the most meaningful of components. In local and regional
>   contexts and with improving technology, users might benefit from
>   being able to use a wider range of characters.

The phrase "local and regional" here is inappropriate. (Actually it is
infuriating, but we won't go into that.) More to the point, now we can
achieve both aims.
We can provide completely meaningless (to users of ASCII) URLs which will
display correctly in local, regional, scholarly, international business,
political, etc. contexts, instead of ASCII URLs which are meaningless to
their intended users.

>   However, such use
>   is not guaranteed to work, and should therefore be avoided.

"Guaranteed"?!? ROTFLOL. This is the Internet. Large portions of numerous
standards fail routinely, most especially those not yet widely or
correctly implemented. (nailing jelly to a tree? herding cats? bottling
fog?)

Where was this writer when frames came out in HTML? Pages using frames
still have to check the user's browser version on every access. We cannot
always have total backward compatibility. However, the current proposal
for %HH-encoded UTF-8 actually offers far more backward compatibility than
is usual, while reducing cases that fail in worse ways.

The proposal to use %HH-encoded UTF-8 will work correctly with all
browsers tested so far (3 of them). In fact, I am at a loss to understand
how it could fail to work. The possibilities I can think of are:

- The browser will display the URL correctly in ASCII
- The browser will display the URL correctly in the intended script(s)
- The browser will display the URL not quite correctly because of
  missing fonts
- The browser will have a display bug that will be promptly fixed

but in all of the above cases, the URL will still work to fetch the page,
and can be cut, copied, and pasted, and can in fact be printed and typed
back in if it has to be. As usual, the server can process the URL to
locate the page in any way the site designer pleases, so that both the
%HH-encoded URL and a UTF-8 or UTF-16 URL would fetch the same page. Or am
I missing something?

The technology for correct creation, interpretation, and display of UTF-8
URLs exists and has been demonstrated in a small way. We may need a
somewhat larger demonstration, but we cannot pretend that it is too soon
to do that.
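
As an illustration of the round trip described above, here is a minimal
sketch in Python. It is not part of the draft or of any tested browser;
the helper names encode_path and decode_path and the sample path are
hypothetical, chosen only for this example. It relies on the standard
library's urllib.parse.quote and unquote to show that a %HH-encoded UTF-8
path is plain ASCII (so it can be printed, typed, cut, and pasted) and
decodes back to the original characters.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Encode a path as UTF-8 octets, escaping each non-ASCII octet as %HH.

    '/' is left alone so the path structure survives.
    """
    return quote(path, safe="/", encoding="utf-8")


def decode_path(escaped: str) -> str:
    """Reverse encode_path: turn %HH escapes back into UTF-8 text."""
    return unquote(escaped, encoding="utf-8", errors="strict")


# Hypothetical path mixing Latin-1 and Greek characters.
original = "/docs/François/Ελληνικά.html"
escaped = encode_path(original)

print(escaped)                 # pure-ASCII form, safe to transcribe
print(decode_path(escaped))    # readable form for software that supports it
```

A server that maps both forms to the same resource would only need to
apply something like decode_path before looking up the file; whether a
given server does so is an assumption of this sketch, not something shown
here.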
>Your comments have done nothing to change the conclusions already
>represented within the draft.
>
>>>IF you can persuade the creators of URLs to always use UTF-8, which
>>>is definitely not the case today (Apache, NCSA, and CERN servers all
>>>use whatever charset is used by the underlying filesystem, which on
>>>most Unix-based systems is iso-8859-1 or iso-2022-*), ...

You're going much too far here. The proposed language recommends, but does
not require, UTF-8, and aims at an eventual transition to UTF-8 only,
presumably when the software supports it. In any case, don't servers
running on NT support Unicode encodings including UTF-8, since the native
file system does so? And what will happen when they run on a UNIX that
does have a Unicode file name system? Eventually they all will, you know.

[snip]
>.....Roy

--
Edward Cherlin     cherlin@newbie.net     Everything should be made
Vice President     Ask. Someone knows.       as simple as possible,
NewbieNet, Inc.                                __but no simpler__.
http://www.newbie.net/                 Attributed to Albert Einstein
Received on Monday, 14 April 1997 14:24:21 UTC