- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Mon, 11 Mar 96 15:08:41 PST
- To: David Robinson <drtr1@cus.cam.ac.uk>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> A proxy that did not return the cached document http://foo/%7Euser
> in response to a request for http://foo/~user is not behaving
> efficiently. So proxies need to canonicalise the URL for the cache
> key. Hence it would be confusing if the proxy did not use this
> canonicalised URL in requests it issued.
I think it would be more "confusing" if by canonicalizing a URL
the proxy turned it into something that identified a different
resource. I.e., we should be quite cautious about pursuing "efficiency"
here. Efficiency is great as long as it gets the right answers.
> This is why the proxy in Apache 1.1 beta does a lot of URL rewriting;
> not only %xx <-> char as appropriate, but also
This might be appropriate in some cases, but it's clearly not
appropriate for every instance of %xx. RFC1738 specifically
states, for example,
    The character "#" is unsafe and should
    always be encoded because it is used in World Wide Web and in other
    systems to delimit a URL from a fragment/anchor identifier that might
    follow it. The character "%" is unsafe because it is used for
    encodings of other characters.
so a proxy that converted
http://foo%23bar
to
http://foo#bar
would be non-compliant with RFC1738.
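
To make the distinction concrete, here is a minimal sketch of a cache-key function that decodes a %xx escape only when the resulting character is on an explicit allow-list, and leaves every other escape (including %23 and %25) alone. The function name cache_key and the contents of SAFE_TO_DECODE are assumptions made for illustration; the allow-list is chosen to cover the ~user example in this thread and is not taken from RFC1738 or any other specification.

```python
import re

# Hypothetical allow-list: characters this sketch is willing to decode.
# It is an assumption for illustration, not a list from any standard.
SAFE_TO_DECODE = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-._~"
)

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def cache_key(url: str) -> str:
    """Decode only the %xx escapes whose character is in SAFE_TO_DECODE."""

    def replace(match: "re.Match[str]") -> str:
        char = chr(int(match.group(1), 16))
        if char in SAFE_TO_DECODE:
            return char
        # Keep the escape, but uppercase the hex digits so that
        # %2f and %2F produce the same key.
        return "%" + match.group(1).upper()

    return _ESCAPE.sub(replace, url)
```

With this sketch, http://foo/%7Euser and http://foo/~user share a key, while http://foo%23bar is left unchanged rather than being turned into a URL with a fragment.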
> http://foo -> http://foo/
I'd like to see a specific citation to a standard or even an
I-D that makes this a compliant transformation. In
draft-ietf-http-v11-spec-01.txt, for example, I find this
BNF:
    URI         = ( absoluteURI | relativeURI ) [ "#" fragment ]
    absoluteURI = scheme ":" *( uchar | reserved )
which suggests that the URI need not end in "/".
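
As a rough check of that reading, the absoluteURI rule can be approximated with a regular expression. This is only a sketch: SCHEME and URI_CHAR below are simplified stand-ins for the draft's scheme, uchar, and reserved productions (assumptions, not the draft's actual definitions), and is_absolute_uri is a hypothetical helper name.

```python
import re

# Simplified stand-ins for the draft's productions (assumptions):
# SCHEME approximates "scheme"; URI_CHAR approximates "uchar | reserved".
SCHEME = r"[A-Za-z0-9+.\-]+"
URI_CHAR = r"(?:[A-Za-z0-9;/?:@&=+$\-_.!~*'(),]|%[0-9A-Fa-f]{2})"

# absoluteURI = scheme ":" *( uchar | reserved )
ABSOLUTE_URI = re.compile(rf"{SCHEME}:{URI_CHAR}*")


def is_absolute_uri(text: str) -> bool:
    """Return True if the whole string matches the approximated rule."""
    return ABSOLUTE_URI.fullmatch(text) is not None
```

Under this approximation both http://foo and http://foo/ are accepted, since nothing in the rule requires a trailing "/".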
> and (perhaps dubiously) http://foo/bar? -> http://foo/bar
Very dubiously, especially since we also have a well-understood
heuristic that caches do not store responses to GETs with "?"
in the URL. If you have a series of caching proxies in the
path, and the first one does this transformation (dropping the "?"),
the second one will not realize that the GET is a potentially
non-cachable query.
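
The problem can be shown with a short sketch of two proxies in series. Both function names are hypothetical, and looks_like_query is just the "?" heuristic described above, not any particular proxy's implementation.

```python
def looks_like_query(url: str) -> bool:
    """Heuristic from the thread: a "?" in the URL marks a potential query,
    so the response to a GET for it should not be stored."""
    return "?" in url


def drop_empty_query(url: str) -> str:
    """The disputed rewrite: http://foo/bar? -> http://foo/bar."""
    return url[:-1] if url.endswith("?") else url


original = "http://foo/bar?"

# The first proxy sees the "?" and then rewrites the URL before forwarding.
first_proxy_sees_query = looks_like_query(original)
forwarded = drop_empty_query(original)

# The second proxy only sees the rewritten URL and has lost the hint.
second_proxy_sees_query = looks_like_query(forwarded)
```

Here first_proxy_sees_query is True and second_proxy_sees_query is False, so the second cache would treat the response as storable.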
-Jeff
Received on Monday, 11 March 1996 15:16:36 UTC