- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Mon, 11 Mar 96 15:08:41 PST
- To: David Robinson <drtr1@cus.cam.ac.uk>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> A proxy that did not return the cached document http://foo/%7Euser in
> response to a request for http://foo/~user is not behaving efficiently.
> So proxies need to canonicalise the URL for the cache key. Hence it would
> be confusing if the proxy did not use this canonicalised URL in requests
> it issued.

I think it would be more "confusing" if by canonicalizing a URL the proxy turned it into something that identified a different resource. I.e., we should be quite cautious about pursuing "efficiency" here. Efficiency is great as long as it gets the right answers.

> This is why the proxy in Apache 1.1 beta does a lot of URL rewriting;
> not only %xx <-> char as appropriate, but also

This might be appropriate in some cases, but it's clearly not appropriate for every instance of %xx. RFC1738 specifically states, for example,

    The character "#" is unsafe and should always be encoded because it
    is used in World Wide Web and in other systems to delimit a URL from
    a fragment/anchor identifier that might follow it. The character "%"
    is unsafe because it is used for encodings of other characters.

so a proxy that converted http://foo%23bar to http://foo#bar would be non-compliant with RFC1738.

> http://foo -> http://foo/

I'd like to see a specific citation to a standard or even an I-D that makes this a compliant transformation. In draft-ietf-http-v11-spec-01.txt, for example, I find this BNF:

    URI            = ( absoluteURI | relativeURI ) [ "#" fragment ]
    absoluteURI    = scheme ":" *( uchar | reserved )

which suggests that the URI need not end in "/".

> and (perhaps dubiously)
> http://foo/bar? -> http://foo/bar

Very dubiously, especially since we also have a well-understood heuristic that caches do not store responses to GETs with "?" in the URL. If you have a series of caching proxies in the path, and the first one does this transformation (dropping the "?"), the second one will not realize that the GET is a potentially non-cachable query.

-Jeff
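
To make the distinction above concrete, here is a minimal Python sketch of a conservative cache-key canonicaliser. It is not Apache's code or anything specified in RFC1738 or the HTTP draft; the function name cache_key and the SAFE_TO_DECODE set are assumptions chosen for illustration. It decodes a %xx escape only when the result is a letter, a digit, or one of a few marks (including "~", so that the %7Euser example matches), and it leaves every other escape, any trailing "?", and a missing trailing "/" alone.

```python
import re

# Characters assumed safe to decode for comparison purposes. This set is an
# illustrative assumption, not a list taken from RFC1738; "~" is included so
# that %7E and ~ produce the same cache key, as in the example in the thread.
SAFE_TO_DECODE = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-_.~"
)

# Matches one percent-escape: "%" followed by exactly two hex digits.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def cache_key(url: str) -> str:
    """Return a cache key that decodes only escapes assumed to be harmless.

    Escapes for characters such as "#", "%", "/" and "?" are kept encoded
    (with the hex digits upper-cased) so the key never names a different
    resource than the original URL did.
    """

    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in SAFE_TO_DECODE:
            return char
        # Keep the escape, normalising only the case of the hex digits.
        return "%" + match.group(1).upper()

    # A single left-to-right pass: the output is never re-scanned, so
    # "%257E" stays "%257E" instead of being decoded twice into "~".
    return _ESCAPE.sub(replace, url)


if __name__ == "__main__":
    for example in (
        "http://foo/%7Euser",
        "http://foo%23bar",
        "http://foo/bar?",
    ):
        print(example, "->", cache_key(example))
```

Under these assumptions http://foo/%7Euser and http://foo/~user share one key, while http://foo%23bar keeps its escape and http://foo/bar? keeps its "?", so a downstream cache can still recognise the request as a possible query.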
Received on Monday, 11 March 1996 15:16:36 UTC