- From: Roy T. Fielding <fielding@kiwi.ICS.UCI.EDU>
- Date: Fri, 07 Mar 1997 01:37:25 -0800
- To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
- Cc: URI List <uri@bunyip.com>
>+ It is recommended that UTF-8 [RFC 2044] be used to represent characters
>+ with octets in URLs, wherever possible.
>
>+ For schemes where no single character->octet encoding is specified,
>+ a gradual transition to UTF-8 can be made by servers make resources
>+ available with UTF-8 names on their own, on a per-server or a
>+ per-resource basis. Schemes and mechanisms that use a well-
>+ defined character->octet encoding which is however not UTF-8 should
>+ define the mapping between this encoding and UTF-8, because generic
>+ URL software is unlikely to be aware of and to be able to handle
>+ such specific conventions.

Here is where you lose me. I have no desire to add a UTF-8 character mapping table to our server. An HTTP server doesn't need one -- its URLs are either composed by computation (in which case knowing the charset is not possible) or by derivation from the filesystem (in which case it will use whatever charset the filesystem uses, and in any case has no way of determining whether or not that charset is UTF-8). The server doesn't care and should not care. It is therefore inappropriate to suggest that it should add such a table when doing so would only bloat the server and slow down the URL<->resource mapping process.

>> Data corresponding to excluded characters must be escaped in order
>> to be properly represented within a URL. However, there do exist
>> some systems that allow characters from the "unwise" and "national"
>> sets to be used in URL references (section 3); a robust
>> implementation should be prepared to handle those characters when
>> it is possible to do so.
>
>Change to:
>
>There exist some systems that allow characters/octets from the
>"unwise" and "others" sets to be used in URL references (section 3).
>Until a uniform representation for characters within URLs is firmly
>established, such practice is not stable with respect to transcoding
>and therefore should be avoided.
>However, robust implementations should be prepared to handle those
>octet values when it is possible to do so.

No thanks -- the existing paragraph is far better. Transcoding is not an issue unless they are already violating the specification, in which case they are prepared to suffer the consequences. The purpose of the paragraph is to prevent an implementer from interpreting the spec too literally and crashing on a non-urlc character.

 .....Roy
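The two points above (the server maps URL paths to resources without knowing their charset, and a robust implementation should not crash on characters outside the allowed set) can be illustrated with a small sketch. It assumes the request path arrives as raw octets; `decode_path_octets` is a hypothetical helper, not code from any particular server. It turns well-formed `%XX` escapes into the octets they name and passes every other octet through unchanged, including "unwise" characters such as `|` or `{` and malformed escapes, without ever trying to interpret the result as UTF-8 or any other charset.

```python
# Hypothetical helper: map a raw URL path to octets without assuming a charset.
HEXDIGITS = b"0123456789abcdefABCDEF"


def decode_path_octets(raw: bytes) -> bytes:
    """Decode %XX escapes to octets; pass everything else through untouched.

    No charset is assumed: an escape like %E9 becomes the single octet 0xE9,
    whatever character (if any) that octet means on the local filesystem.
    Characters outside the allowed URL set (e.g. '|', '{') and malformed
    escapes (e.g. '%zz' or a trailing '%4') are kept as-is instead of
    raising an error, following the "robust implementation" advice.
    """
    out = bytearray()
    i = 0
    while i < len(raw):
        octet = raw[i]
        escape = raw[i + 1:i + 3]
        if octet == 0x25 and len(escape) == 2 and all(c in HEXDIGITS for c in escape):
            # Well-formed escape: '%' followed by two hex digits.
            out.append(int(escape, 16))
            i += 3
        else:
            # Any other octet, including a stray '%', is copied verbatim.
            out.append(octet)
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(decode_path_octets(b"/%7Efielding/caf%E9|notes{1}%zz"))
```

Running it prints `b'/~fielding/caf\xe9|notes{1}%zz'`. Whether `0xE9` names "é" in Latin-1 or is part of some other encoding is left to whatever maps these octets onto the filesystem, which is the point of not adding a mapping table.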
Received on Friday, 7 March 1997 04:41:59 UTC