- From: Edward Cherlin <cherlin@newbie.net>
- Date: Thu, 3 Apr 1997 21:35:52 -0800
- To: uri@bunyip.com
Martin Duerst wrote:
>On Wed, 2 Apr 1997, Larry Masinter wrote:
>
>> In my personal judgement, there was significant controversy
>> about adding to a Draft Standard document additional constraints
>> that were not part of the Proposed Standard and are not
>> implemented in at least two interoperable implementations.

What constraints? The proposal is to use %HH encoding of the UTF-8 encoding of Unicode in ASCII URLs as the standard way to handle those non-ASCII character sets that have bidirectional mappings to Unicode, so that the characters can be positively identified. But as written it would have no effect on anyone using, say, ISO-2022. It does not forbid the use of national or other partial character set standards, but encourages the use of an unambiguous notation.

>In the current discussion, started by my original proposals in
>mid February, there was definitely no "significant controversy"
>about procedural matters such as those you mention above.
>If you think otherwise, please give the references to the
>mailing archive. As you can see below, there is no need for
>such a controversy. If you had brought this subject up earlier,
>I could have answered as below earlier.
>
>Also, there are in no way any additional constraints.
>There are only recommendations. I have clearly shown
>that these don't affect existing (or even future)
>implementations in any major way. If you want to challenge
>this, please give the details.
>
[snip]
>> > URL creation mechanisms that generate the URL from a source which
>> > is not restricted to a single character->octet encoding are
>> > encouraged, but not required, to transition resource names toward
>> > using UTF-8 exclusively.
>> > URL creation mechanisms that generate the URL from a source which
>> > is restricted to a single character->octet encoding should use UTF-8
>> > exclusively. If the source encoding is not UTF-8, then a mapping
>> > between the source encoding and UTF-8 should be used.
>> >
>> This is an additional requirement that does not correspond,
>> as far as I can tell, to any kind of "implementation experience".
>> I know of no URL creation mechanisms that actually do this.
>
>See above. "implementation experience" is obviously trivial.

What sort of implementation? Does it have to be part of some Internet application, or can I just write a little utility? Anyway, it's just code page mapping and table lookup, and the mapping tables are all provided with The Unicode Standard, Version 2.0 (Addison-Wesley) and are also available at the Unicode site. What's the big deal?

>> Further, I think that the complaints that there is a certain
>> amount of ambiguity in practice over exactly how one goes
>> about doing this are legitimate, and that not only is there
>> no "running code", there is not "rough consensus".
>
>The code that we have is obviously very much sufficient.
>Rough consensus is there, the word "rough", as I have seen
>it interpreted in IETF working groups, takes care of the
>case of a single individual raising the same far-fetched
>and unrelated complaints over and over, in a rather short
>and cryptic manner, even after they have been addressed
>in detail.

Surely we don't accept the old Polish veto (yelling "NO" from the sidelines without actually participating in the discussion)? Has anyone raised a real technical objection, not just a "You're wrong" or "I don't like it"?

>I don't know exactly what you intend to refer to with
>"certain ambiguity". If you mean ambiguities arising from
>URLs such as http://0oO0Il1.com/IlIl10oO.html, this is
>obviously a problem that is ignored for ASCII, because
>of the correct assumption that URL generators learn to
>avoid such cases by trial and error if not otherwise.
>I do not think that at the present time, things beyond
>ASCII need to be specified more explicitly than ASCII
>itself, in this respect.

Can we try that again? We know what UTF-8 is, without any ambiguity.
We know how to put Unicode text into canonical form, without ambiguity. We know how to do %HH encoding, without ambiguity. Where is the ambiguity? Are we talking about text such as

    Latin Capital Letter A        U+0041
    Greek Capital Letter Alpha    U+0391
    Cyrillic Capital Letter A     U+0410

which is visually indistinguishable from 'AAA'? This may be a bit of a problem for systems that try to display the characters properly, but only if they don't provide access to the %HH-encoded original UR* or to the numeric values of the characters, and don't provide any indication of the script of the characters when selected. This problem is outside the scope of this group, which is concerned with the correct rendering, transmission, and interpretation of UR*s in software and in print, not with their presentation to the user by software.

>I very well acknowledge that for some cases, some more
>detailed specifications are highly desirable. I have
>talked with many people about the issues involved, and
>I have repeatedly volunteered to work on the necessary
>documents. However, I do not see any sense in writing
>such documents in the void, without a clear commitment
>for a good solution in the central document. Actually,
>I would like nothing more than finishing the current
>controversy on the base issue and having some time to
>work on more documentation. I therefore sincerely hope
>that we can stop useless "procedural concerns" as above
>as quickly as possible. [Also, as long as we are only
>concerned with %HH (this is the only thing that should go
>into the current draft, I agree that the transition to
>using "native" URLs is something more experimental, and
>that the necessary documents for it will have to be written),
>the potential ambiguities actually don't arise :-].

Yes, that all comes later. However, it will turn out to be much easier when we finally get there. This all reminds me of the flailing about in Europe trying to establish a European currency.
If they just did it without trying at the same time to keep all of their old currencies, it would all turn out to be much easier, just as in the U.S. 200 years ago when the Federal currency replaced the state currencies.

I realize that we cannot do this in the case of Unicode URLs, and I am not suggesting that we try. We will have to provide backward compatibility as far as possible. I merely claim that pure Unicode URLs will turn out to be far simpler than the current set of kluges.

As far as I am concerned, Martin's proposal is a model of simplicity and clarity, meets a real need, and need not bother anyone who isn't interested.

>
>> > I'm surprised, too. I thought we had this worked out, and that
>> > there was no significant objection or controversy.
>>
>> I hope that the domain name from which you post ("newbie.net")
>> isn't some kind of joke. If you insist, I will forward you
>> the three hundred or so email messages discussing the controversy
>> around the proposed additions.

Ad hominem, is it? Quite unworthy of you. After all, you could have looked. Very well, I will tell you--undoubtedly more than you want to know, but you brought it up.

NewbieNet <http://www.newbie.net> is a three-year-old information service for new Internet users offering the NewbieNewz mailing list and several Web-based courses and information resources (New Newbie Pages, CyberCourse, Netiquette course, Unofficial Smiley FAQ, Frames Tutorial, and more).

I have been using and writing about computers for 20 years, starting with timesharing (Amdahl 470) and the Commodore 64, Radio Shack TRS-80, CP/M, and Apple II, in APL, FORTH, BASIC, and several assemblers.

My involvement with multilingual software issues goes back to my experience with mixed Korean/English typesetting in the Peace Corps in 1967, and includes typesetting an APL magazine in every technology from manual pasteup of daisywheel printouts to Aldus PageMaker and PostScript APL fonts.
I now review multilingual software, with emphasis on Unicode support, for Multi-Lingual Computing magazine.

In 1986 I organized and led a project to create a fully portable ISO/ANSI standard APL interpreter called I-APL, which could run on any 8-bit or better computer and did run on the Apple II, Commodore 64, BBC Micro, and others, plus PC, Mac, and UNIX. It could be made to display in most writing systems. We put out versions of I-APL in English, French, German, Finnish, Russian, and Japanese (no Kanji support, of course). I participated in the ISO and ANSI APL standards development process, and in the associated effort to get APL characters included in ISO 10646 and Unicode. I also have experience in typesetting math and music. I have written market research reports on non-Latin fonts (1991) and the impact of Unicode (1994).

Unicode is becoming a standard feature of many present and most promised operating systems, programming languages, Web browsers, e-mail software, industrial-strength databases, and office suites from Microsoft, Lotus, and Corel. In particular, the character type in Java is defined to be a 16-bit Unicode character.

Many people, like myself, depend heavily on Unicode for Web browsing and for publication. I read French, German, Hebrew, Yiddish, Russian, Chinese, Korean, and Japanese in varying degrees, and would like to be able to view all the other languages correctly. I would use Unicode mail for preference if my correspondents could all receive it correctly, but that will take a few years. At present we are preparing to use Unicode mail on the Unicode mailing list as an experiment.

I am particularly keen on getting Unicode accepted as the standard character set for everything to do with the Internet. It won't be a true World-Wide Web until everyone can publish in their own languages and writing systems so that everyone else can see it properly.
Until Unicode is the standard character set, there will be no standard for creating and viewing multilingual documents, or even single-language, single-script documents in anything other than ASCII.

>I guess there is no need to do that. Edward is very well aware
>of the discussion that went on. Some of the best contributions
>to it are from him. He probably followed the discussion more
>closely than many others. Threatening him with mail flooding
>is beyond what I want to comment about.
>
>
>Regards, Martin.

Yeah, me too.

--
Edward Cherlin, Vice President, NewbieNet, Inc.
cherlin@newbie.net    http://www.newbie.net/
Ask. Someone knows.

Everything should be made as simple as possible, __but no simpler__.
  (Attributed to Albert Einstein)
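The convention argued for in this message (map a legacy-encoded resource name to Unicode through a code page table, encode it as UTF-8, then %HH-encode the octets) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not code from the proposal: the function names are hypothetical, Python's built-in codecs stand in for the Unicode mapping tables, and NFC normalization (formalized by Unicode after this message was written) is assumed as the meaning of "canonical form".

```python
import unicodedata
from urllib.parse import quote, unquote


def to_utf8_url_path(raw: bytes, source_encoding: str) -> str:
    """Hypothetical helper: legacy-encoded name -> %HH-encoded UTF-8 path."""
    # Code page -> Unicode: Python's codec tables do the table lookup.
    text = raw.decode(source_encoding)
    # Assumption: NFC stands in for "canonical form" (NFC postdates 1997).
    text = unicodedata.normalize("NFC", text)
    # Unicode -> UTF-8 octets -> %HH; unreserved ASCII and '/' pass through.
    return quote(text, safe="/", encoding="utf-8")


def from_utf8_url_path(path: str) -> str:
    """Hypothetical helper: %HH-decode the octets and read them as UTF-8."""
    return unquote(path, encoding="utf-8", errors="strict")


# Latin, Greek, and Cyrillic capital A look alike but encode differently.
print(quote("\u0041\u0391\u0410", encoding="utf-8"))  # A%CE%91%D0%90
# Cyrillic capital A is the single octet 0xB0 in ISO-8859-5.
print(to_utf8_url_path(b"/\xb0", "iso-8859-5"))       # /%D0%90
```

The first printed line shows why the 'AAA' lookalikes are not ambiguous at the %HH level: each of the three capitals yields a distinct octet sequence, so software can always recover which character was meant.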
Received on Friday, 4 April 1997 11:45:43 UTC