- From: Roy T. Fielding <fielding@kiwi.ICS.UCI.EDU>
- Date: Tue, 15 Apr 1997 17:10:48 -0700
- To: Gary Adams - Sun Microsystems Labs BOS <Gary.Adams@east.sun.com>
- Cc: uri@bunyip.com
>> d) what does the browser do with what the user typed
>>    in order to turn it into the URL that was generated in (a).
>
>Today the only alternative is to say the platform specific encoding
>of the server system must be %HH encoded as raw octets and published
>in the magazine, which the user enters as raw ascii strings, which is
>transmitted to the server where it is %HH decoded and handed to the local
>data store. i.e., it is only meaningful to the local server and is
>opaque to the magazine, the end user and the browser.

Right -- that is also the case if the author decides that the most
interoperable URL is represented in ASCII, even though the underlying
characters are non-ASCII. [Note that it is always possible to include
both URLs in the magazine.]

>If the encoding is labeled (or known to be UTF8), then the magazine
>could publish either native character representation or a %HH escaped
>URL. Similarly the browser could support input of native characters
>or a %HH escaped URL. Finally, the %HH escaped UTF8 URL is transmitted
>to the server and converted for use in accessing the local resource.

The magazine could also just publish the native character representation
and assume that the reader's browser is set up to use the same charset
encoding as the server. OTOH, the standard could say that when a URL is
entered from a source that has no charset, use UTF-8. The question is
really about what is the most likely charset used by the server.

This is the crux of the problem. If a browser assumes that the server is
using UTF-8 and transcodes the non-ASCII octets before submission to the
server, then bad things happen if the server is not using UTF-8. The
nature of the "bad things" ranges from disallowed access to invalid form
data entry.
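
To make the mismatch concrete, here is a minimal Python sketch. It
assumes a hypothetical server that interprets %HH-decoded octets in a
single fixed charset; the names `escape` and `server_lookup` and the
path "/café" are illustrative only and do not describe any real server
or browser.

```python
from urllib.parse import quote, unquote_to_bytes


def escape(text: str, charset: str) -> str:
    """%HH-escape the octets of `text` as encoded in `charset`."""
    return quote(text.encode(charset), safe="/")


def server_lookup(escaped: str, server_charset: str) -> str:
    """Hypothetical server: undo %HH, then read the octets in its own charset."""
    return unquote_to_bytes(escaped).decode(server_charset)


name = "/café"

# The same characters yield different URLs depending on the charset.
latin1_url = escape(name, "iso-8859-1")  # '/caf%E9'
utf8_url = escape(name, "utf-8")         # '/caf%C3%A9'

# Charsets agree: the server recovers the intended name.
same_charset = server_lookup(latin1_url, "iso-8859-1")

# Browser transcoded to UTF-8, server expects ISO-8859-1: a different
# name comes out, so the resource is not found.
mismatched = server_lookup(utf8_url, "iso-8859-1")

# Server expects UTF-8, URL carries ISO-8859-1 octets: decoding fails.
try:
    server_lookup(latin1_url, "utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

Only the case where the URL generator and the server agree on the
charset recovers the original name; the other two are small instances
of the "bad things" described above.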

Since it is not possible for us to require all servers to be upgraded,
it is not safe for browsers to perform transcoding of URLs, and
therefore it is impossible to deploy a solution that requires UTF-8
transcoding UNLESS that decision is based on the URL scheme.

Likewise, a server often acts as a gateway for some parts of its
namespace, as is the case for CGI scripts and API modules like mod_php,
and other parts of its namespace are derived from filesystem names. On a
server like Apache, the filesystem-based URLs are generated by
url-encoding all non-urlc bytes without concern for the filesystem
charset. While it is theoretically possible for the server to edit all
served content such that URLs are identified and transcoded to UTF-8,
that would assume that the server knows what charset is used to generate
those URLs in the first place. It can't use a single configuration table
for all transcoding, since the URLs may be generated from sources with
varying charsets.

The bottom line is that a server cannot enforce UTF-8 encoding unless it
knows that all of its URLs and gateways use a common charset, and if
that were the case we wouldn't need a UTF-8 solution.

I listed out the solution space in the hope that people would see the
trade-offs. We know that all-ASCII URLs *interoperate* well on the
Internet, but we also know that they can be ugly. We know that existing
systems will accept non-ASCII URLs if the charset matches that used by
the URL generator/interpreter on the server. We also know that most
existing, deployed servers are not restricted to generating UTF-8
encoded URLs.

In a perfect world, requiring UTF-8 would be a valid solution. But this
is not a perfect world! The purpose of an Internet standard is to define
the requirements for interoperability between implementations of the
applicable protocol. A solution that requires UTF-8 will fail to
interoperate with systems that do not require UTF-8, and the latter is
the case for most URL-based systems on the Internet today.

 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92697-3425    fax:+1(714)824-1715
    http://www.ics.uci.edu/~fielding/
Received on Tuesday, 15 April 1997 20:12:47 UTC