Date: Fri, 25 Apr 1997 15:27:02 +0200 (MET DST)
From: "Martin J. Duerst" <firstname.lastname@example.org>
To: Larry Masinter <email@example.com>
Cc: John C Klensin <firstname.lastname@example.org>, email@example.com
Subject: Re: UTF-8 and URLs
In-Reply-To: <335F90D8.6EDB@parc.xerox.com>
Message-Id: <Pine.SUN.3.96.970425100102.245p-100000@enoshima>

Hello Larry,

Many thanks for your recent message, which is extremely encouraging
and is focussing the discussion in the right direction. I will mainly
address the general issues in this mail, and send a separate mail
with a few technical comments. Please read to the end of the mail,
because I have to make an important precondition.

On Thu, 24 Apr 1997, you wrote:

> I think to actually solve the problem of Internationalization
> of URLs we need two recommendations:

[comprehensive summary deleted]

> These three recommendations affect software from a large number
> of different producers. To make progress in the community,
> those software implementors will need to agree that this is
> the best solution to interoperability of URLs internationally.
>
> I think given its likely controversial nature, we should clearly
> make these recommendations in a separate RFC, and perhaps with
> a new working group.

I definitely think that a working group is a good thing to have,
so that we have an agenda, a chair who can cool down things when
we get heated up, and so on.

I also think that a separate RFC, or indeed several separate RFCs,
are needed, mainly because some of them may come to contain much
text/data, and they will address various issues that touch various
areas outside URLs proper. Currently, I see the following possible
RFCs:

- Rationale for Internationalized Identifiers
  Explaining where and why internationalized identifiers are
  useful/necessary/possible, answering some of the most frequently
  raised objections, and stating where and when internationalized
  identifiers should not be used.
  This should not be seen as a rationale document that needs to
  precede everything else, but rather as an explanatory (informal)
  document that we can refer to when people ask basic questions.

- Internationalized URL Architecture
  Central document explaining the basic workings, what to
  interpret/convert to what in what case, including some details
  about upgrading strategy.

- Normalization for Internationalized Identifiers
  Listing character ranges and characters that should be mapped
  to others to avoid problems, such as compatibility characters,
  combining sequences, ...; this would be written to be useful for
  other things than just URLs (things that might end up in URLs
  eventually anyway); we would expect input from the character
  set standards groups here.

- Bidirectionality for Internationalized Identifiers
  Giving exact specifications for handling BIDI in
  internationalized identifiers. This might be merged with the
  previous one, as it should be quite short.

- Handling Internationalized Forms and Query Parts
  This would define the conventions and additions to HTML/HTTP
  along the lines we have been discussing.

> I'm willing to put this all down in a separate internet draft,
> if it will help focus the process on actually making progress.
> Some of the examples that have been sent out to the mailing list
> will be useful to guide the recommendations in the RFC.

I have repeatedly volunteered to author such drafts, and I would
be very happy to work together with Larry and others on these.
Given the amount of mail I have written on these issues in the
past few weeks/months and the discussions and presentations I
have enjoyed with many other people concerned about these topics,
and given my current workload, I think it should be possible to
produce first drafts of the above by mid-May or the end of May.

Because the general theme of the above is internationalization
of identifiers, I propose to name the group iii, standing for
Internationalization of Internet Identifiers.
I know that the IETF likes very focussed groups with clear goals,
and I think we can very well focus on one goal and a few documents
at a time. On the other hand, it seems sensible to have a group
with a general name so that we can retarget it if necessary after
having completed our first goals.

After all these proposals, I have to add an important RESERVATION.
I think it makes sense to have separate drafts, because of the
things discussed above and because the current draft is rather
advanced, and should proceed quickly now. However, having the
two things *completely* unrelated, and leaving readers of the
new RFC-to-be ignorant about what is going on, is in my opinion
a very dangerous and bad idea.

Therefore, I think we need SOME language in the current draft
that makes its readers aware of what is going on, so that
incompatibilities and bad surprises can be avoided, or at least
so that we are not blamed for them if people don't read the draft.

For formal reasons, these comments should probably take the form
of a note. They should say that:

- This document does not define how to handle or to map characters
  outside the US-ASCII repertoire. [this point doesn't need to be
  a note]

- An extension of this standard to make it possible for URLs to
  contain characters beyond US-ASCII where this is feasible is
  under discussion.

- This extension will be based on using UTF-8 for character<->octet
  mapping.

- The use of characters outside US-ASCII to write down URLs might
  currently seem to work in some cases and configurations, but
  is not guaranteed widely enough and is strongly discouraged
  until the extension is available.

- URLs where non-US-ASCII octets are correctly escaped with %HH
  will not be affected by the extension and will continue to
  work correctly.

- When making available new URLs which represent characters
  outside US-ASCII, where feasible these should be made available
  by using UTF-8 as a character->octet mapping.
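
As an illustration of the last two points in the list, here is a
minimal Python sketch of using UTF-8 as the character->octet mapping
and then escaping the resulting octets with %HH. The function names
escape_utf8 and unescape_utf8 and the sample path are hypothetical
and chosen only for this example; quote and unquote come from the
Python standard library.

```python
from urllib.parse import quote, unquote


def escape_utf8(text: str) -> str:
    """Map characters to octets with UTF-8, then escape as %HH.

    Hypothetical helper: "/" is left unescaped so that path
    separators survive; every non-ASCII octet becomes %HH.
    """
    return quote(text, safe="/", encoding="utf-8")


def unescape_utf8(escaped: str) -> str:
    """Reverse the mapping; fail if the octets are not valid UTF-8."""
    return unquote(escaped, encoding="utf-8", errors="strict")


# Sample path (an assumption): "u with diaeresis" plus three kanji.
path = "/D\u00fcrst/\u65e5\u672c\u8a9e"

escaped = escape_utf8(path)
print(escaped)            # /D%C3%BCrst/%E6%97%A5%E6%9C%AC%E8%AA%9E
print(escaped.isascii())  # True

# The round trip recovers the original characters.
assert unescape_utf8(escaped) == path

# The same character under a different mapping gives other octets.
print(quote("\u00fc", encoding="latin-1"))  # %FC
```

The escaped form contains only US-ASCII characters, which is the
kind of URL the point about %HH escaping refers to. The final line
shows the same character yielding a different octet under Latin-1,
which is why the sketch fixes a single mapping.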

The above points cover, I hope, what we can say currently, in a
way that does not restrict us too much or have us promise too
much, while avoiding bad surprises for the readers and for us.
Also, it should be feasible as a note even in a draft standard.
Of course, I am open to any kind of suggestions for better wording.

I hope that we can proceed along these lines, which I think form
a compromise acceptable to the widest part of our group.

With kind regards,

Martin.