- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Mon, 7 Apr 1997 16:45:58 +0200 (MET DST)
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: Edward Cherlin <cherlin@newbie.net>, uri@bunyip.com
On Fri, 4 Apr 1997, Larry Masinter wrote:

> I'm surprised I have to spell this out.

In an environment with many participants with different backgrounds
and different opinions, spelling things out is usually much better
than assuming everybody has the same understanding as you.

> Martin has proposed that we add to the "Draft Standard" the following
> wording:
>
> > URL creation mechanisms that generate the URL from a source which
> > is not restricted to a single character->octet encoding are
> > encouraged, but not required, to transition resource names toward
> > using UTF-8 exclusively.

The wording is by Roy Fielding. I just had to reiterate the proposal
to put it into the draft because it had been ignored. The reasons why
it was ignored, as far as Larry's explanations go, seem to be mainly
procedural. Unfortunately, these reasons have not been discussed
before, which is why we have to discuss them now.

> I will point out that of all of the implementors of all of the software
> that I'm aware of that contain "URL creation mechanisms", including
> the software products from Alis, Accent, Netscape and Microsoft --
> even in the latest versions which purport to support UTF-8 in the
> representation of text-- I have yet to see any product that is a
> "URL creation mechanism" that actually does what Martin is proposing
> we encourage them to do ("transition resource names toward using
> UTF-8 exclusively").

You seem to have a rather restricted view of "URL creation mechanism".
And you seem to forget that the standard is not about URL creation
mechanisms; it is about URLs as such. You asked for two interoperable
implementations. Obviously, interoperability for URLs means that if
I get a URL, for example via mail, and transfer it, for example to
paper and then into a web browser, I get the resource that the URL
denotes. You had ample opportunity to check that this is indeed
possible with the two URLs I provided.
They use two different schemes/protocols, and work with a wide set of
browsers, so as far as URLs with UTF-8 are concerned, the
formal/procedural requirements are met.

As far as "URL creation mechanisms" go, I also mentioned two of them,
namely a) do it by hand, and b) use a system that already uses UTF-8
to denote file names (such as an RS 6000 with the right Unix system and
the right locale). So we have two URL creation mechanisms that already
support UTF-8, two interoperable implementations. The fact that these
interoperable implementations are trivial seems to bother you, but
their triviality is definitely an advantage and not a disadvantage.
The fact that for other configurations things may be a little more
complicated than for these two cases is an issue that was discussed
in quite some detail, with rather satisfactory results. It is of no
concern for the procedural requirement of "two interoperable
implementations".

> Not only aren't they transitioning toward
> using UTF-8 exclusively,

If you worry that the exact wording, as proposed by Roy, is too
restrictive, in particular that "towards using UTF-8 exclusively" may
give the impression that hitherto working non-UTF-8 URLs should be
discontinued prematurely, then please say so. I have absolutely no
problem with changing the wording, as long as the basic intention is
maintained.

> I've yet to see one that actually uses
> UTF-8 in resource names at all.

Well, there are not that many URLs out there currently that use
anything beyond ASCII. If you could name ten sites without doing
research, it would rather surprise me.

> When I asked for instances of some
> actual practice,

You didn't ask for instances of actual practice. You asked for
"interoperable implementations".

> I got sent two examples of URLs on Martin's own site
> in which the "URL creation mechanism" was careful hand crafting
> of the URLs themselves.

Should I have told you that I had some magic translation software
translating English filenames into Japanese? Indeed, I did the
translation from English to Japanese manually, by editing a file.
How else should I have translated "ruby" and "FontComposition" into
their Japanese equivalents? All the rest was just mechanical. It would
be the same if I wanted to produce these Japanese resource names in
EUC or SJIS, only that I would have to change one setting in a
"Save As" command.

> Furthermore, none of the implementors
> of the "URL creation mechanisms" have stepped forward to endorse
> this proposal.

First, please don't ignore Francois Yergeau, from Alis! Obviously,
none of them has stepped forward to say anything against it! And as
far as my recollections go from the past two Unicode conferences,
from the symposium in Sevilla, from a recent trip to Japan, and from
private mail, many people working in the field of software
internationalization, including employees of the companies you
mention above, have expressed positive opinions towards URLs and
UTF-8, often clearly expressing their satisfaction that the issue of
representation of characters in URLs is finally being addressed.

Also, it might help you to know that for host naming, Microsoft is
using UTF-8 as far as they can (i.e. as long as they are not limited
by the current state of DNS when interfacing to the outside world).
I have no doubt they wouldn't mind if DNS went to UTF-8 directly
instead of using my proposal in draft-duerst-dns-i18n-00.txt or
something similar.

> The only voices for it are those who are not actually
> producing "URL creation mechanisms". Even the most ferverent
> believer in UTF-8 would not be so foolish as to create a product
> that 'transitioned' toward 'using UTF-8 exclusively'.

I can repeat it again here: If you have problems with the word
"exclusively", then let's discuss that.

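To make the "Save As" point above concrete, here is a minimal Python sketch of a hand-operated "URL creation mechanism": it turns a resource name into octets under a chosen character encoding and then %HH-escapes those octets. The function name `escape_name` and the Japanese sample name are assumptions made for illustration; the sample is not one of the names actually used in the two URLs discussed in this thread.

```python
from urllib.parse import quote, unquote

# The three encodings mentioned in the message: UTF-8, EUC and SJIS.
CHARSETS = ("utf-8", "euc_jp", "shift_jis")


def escape_name(name: str, charset: str) -> str:
    """Encode `name` to octets in `charset`, then %HH-escape those octets."""
    # safe="" so that "/" is escaped as well; octets that happen to be
    # unreserved ASCII characters (letters, digits) are left as they are.
    return quote(name.encode(charset), safe="")


japanese = "ルビ"               # illustrative name only (hypothetical)
ascii_only = "FontComposition"  # ASCII-only name mentioned in the message

for charset in CHARSETS:
    print(charset, escape_name(japanese, charset), escape_name(ascii_only, charset))

# A receiver can only recover the characters if it knows which encoding
# produced the octets.
utf8_form = escape_name(japanese, "utf-8")
print(unquote(utf8_form, encoding="utf-8"))
```

Under these assumptions the Japanese name yields a different escaped string for each encoding, while the ASCII-only name comes out identical in all three. The escaped octets alone do not say which encoding was used, which is the gap a recommendation of a single encoding is meant to close.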
> Certainly, 99.99%
> of the installed "URL creation mechanisms that generate a URL from a
> source that is restricted to a single character->octet encoding"
> do *NOT* "use UTF-8 exclusively".

99% or more of the existing URLs are ASCII only. So they are UTF-8
by definition :-). As for the "creation mechanisms", how do you want
to count them?

> It would be irresponsible and ridiculous to insert a recommendation
> into a Draft Standard of a practice that not only did not occur
> in the Proposed Standard but also is not the result of implementation
> experience of the community.

Assume a protocol was being upgraded from proposed to draft standard.
Assume a security hole was found in the protocol, and that it was
rather clear how to fix it. Would it be irresponsible and ridiculous
to insert a recommendation, into a Draft Standard, that this security
hole should be fixed (and a recommendation of how it should be
fixed)? Or would it be irresponsible and ridiculous to leave the
security hole open for some perceived procedural reasons?

At the risk of being accused of repetition, and of being too clear,
I spell out the state of the discussion, as far as I see it:

- The discussion on the uri and the url lists has led to a
  "rough consensus" that it is a good thing to recommend UTF-8
  for the encoding of characters into octets to be encoded
  as URLs.
- The above consensus addresses a clear deficiency in the
  current spec with a *recommendation* that is long awaited
  by those who care, while not at all affecting those who
  don't care.
- The above consensus is in accordance with the IAB workshop
  recommendations, the URN syntax, the proposals of the
  ftpext wg, the leading individuals in IMAP
  internationalization, and so on, and with the widely
  acknowledged development of the industry as a whole.
- This consensus has been accepted by the document editor
  of the process draft (url list).
- The document editor of the syntax draft (uri list), while
  being identical to the document editor of the process
  draft, has ignored the above-mentioned consensus without
  clearly informing the list. After checking and investigation,
  he claimed procedural reasons for his decision, which he
  had never before given the list a chance to address. After
  closer investigation, the procedural concerns turn out to be
  nonexistent. The requirement of "two interworking
  implementations" is clearly met, even if these
  implementations turn out to be trivial.
- After "procedural concerns", the document editor currently
  pursues an additional line of reasoning, based on "vendor
  support". Vendor support is in no way procedurally relevant.
  Also, the vendors are ready to take up a reasonable proposal
  as soon as it is nailed down.
- The two paragraphs of text proposed for addition are worded
  by a longstanding expert on URLs and Internet matters in
  general. Nevertheless, they may still contain some
  inaccuracies or possibilities for misinterpretation. If this
  turns out to be the case, the wording should be improved as
  quickly as possible. I'm definitely open to discussion.

I hope the above is clear enough.

Regards, Martin.
Received on Monday, 7 April 1997 10:47:10 UTC