- From: Larry Masinter <masinter@parc.xerox.com>
- Date: Thu, 24 Apr 1997 09:56:56 PDT
- To: John C Klensin <klensin@mci.net>
- Cc: uri@bunyip.com
John,

Your clarification didn't help me. And the sticking point for me is that "as a sequence of glyphs" is an important part of the transport of URLs, whether those glyphs are on paper or on the screen, and that the octet->glyph and glyph->octet route is really error-prone.

I think to actually solve the problem of Internationalization of URLs we need three recommendations:

a) If you're writing software that displays URLs to users, then

   1) any 'forbidden' octets should be displayed as if they were UTF-8 encoded characters. That is, those octets are currently disallowed in URLs, but if you see them, display them in a standard way.

   2) Any sequences of %HH-encoded octets should be displayed EITHER as <%><H><H>, e.g., just show the encoding in ASCII, OR by assuming that they're hex-encoded UTF-8. The latter assumption is likely to be wrong for now, but might change later.

b) If you're writing software that lets users type in URLs, then if the user types in any character that isn't legal in a URL, encode the character as hex-encoded UTF-8. For Japanese, avoid using double-wide characters. For RTL scripts such as Hebrew or Arabic, leave out any direction changes and encode the characters in logical, not presentation order. Since there haven't been any standards for non-ASCII character representations, this is as good a choice as any.

c) If you're writing software that generates URLs to be interpreted later, then use hex-encoded UTF-8 for the encoding to generate, and accept either the raw UTF-8 or the hex-encoded version as identifying the same resource. This is a recommendation for HTTP servers and FTP servers and a variety of other implementations.

These three recommendations affect software from a large number of different producers. To make progress in the community, those software implementors will need to agree that this is the best solution to interoperability of URLs internationally.
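As a rough illustration of recommendations a.2, b and c, here is a minimal Python sketch. It is not part of the proposal itself: the function names, the SAFE character set and the example.com URLs are all hypothetical, and it leans on Python's standard urllib.parse module purely as a convenient stand-in for whatever a browser or server would actually do.

```python
import re
from urllib.parse import quote, unquote_to_bytes

# Assumption for illustration: characters passed through unchanged when
# encoding typed input. "%" is included so existing %HH escapes survive.
SAFE = "/:?#[]@!$&'()*+,;=%"

# One or more consecutive %HH escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def encode_typed_url(text: str) -> str:
    """Recommendation (b): hex-encode non-URL characters as UTF-8 octets."""
    return quote(text, safe=SAFE, encoding="utf-8")


def display_url(url: str) -> str:
    """Recommendation (a.2): show a run of %HH escapes as text when the
    octets are valid UTF-8, otherwise leave the run as plain ASCII."""
    def show(match: re.Match) -> str:
        octets = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return octets.decode("utf-8")
        except UnicodeDecodeError:
            # The UTF-8 guess was wrong: fall back to <%><H><H>.
            return match.group(0)
    return _ESCAPE_RUN.sub(show, url)


def same_resource(a: str, b: str) -> bool:
    """Recommendation (c): treat raw UTF-8 and its hex-encoded form as
    naming the same resource by comparing the underlying octets."""
    return unquote_to_bytes(a) == unquote_to_bytes(b)


typed = "http://example.com/caf\u00e9"      # user types a non-ASCII character
encoded = encode_typed_url(typed)           # hex-encoded UTF-8 form
print(encoded)
print(display_url(encoded))
print(display_url("http://example.com/caf%E9"))  # Latin-1 octet, not UTF-8
print(same_resource(typed, encoded))
```

Note that display_url also turns escaped ASCII such as %2F back into "/", which is acceptable for showing a URL to a person but would change its meaning if the displayed text were fed back in unescaped. The sketch leaves out the double-wide and bidirectional-text concerns from (b) entirely.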
I think given its likely controversial nature, we should clearly make these recommendations in a separate RFC, and perhaps with a new working group. I'm willing to put this all down in a separate internet draft, if it will help focus the process on actually making progress. Some of the examples that have been sent out to the mailing list will be useful to guide the recommendations in the RFC.

Regards,

Larry
--
http://www.parc.xerox.com/masinter
Received on Thursday, 24 April 1997 12:58:49 UTC