Re: proxies rewriting URLs from David Robinson on 1996-03-11 (ietf-http-wg@w3.org from January to March 1996)

From: David Robinson <drtr1@cus.cam.ac.uk>
Date: Mon, 11 Mar 96 18:01 GMT
To: fielding@avron.ICS.UCI.EDU, hallam@w3.org
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <m0twBuF-000DJRC@ursa.cus.cam.ac.uk>

>I think that it is a question of what people can encode into URIs. 
>URIs do not specify a canonical form so that there can be several
>URIs which logically mean the same thing.
>
>If we are clear on the "meaning" part then two syntactic variants
>of the same URI should be interchangeable. If something breaks
>because of this problem then it is something which relied upon a
>syntactic variation and was therefore broken.
>...
>I can think of many reasons why
>a proxy might choose to canonicalize URIs internally, cache matches
>for one. If one considers the action of a caching proxy I think that
>canonicalization of URIs in passed on requests is likely to be highly
>desirable.
>
>Since Larry reports that there are already proxies doing this sort of 
>transformation I think it best to leave things as they are but include
>a warning to state that problems might occur.

Absolutely!

A proxy that does not return the cached document http://foo/%7Euser
in response to a request for http://foo/~user is not behaving efficiently.
So proxies need to canonicalise the URL for the cache key.
Hence it would be confusing if the proxy did not use this canonicalised URL
in requests it issued.

This is why the proxy in Apache 1.1 beta does a lot of URL rewriting;
not only %xx <-> char as appropriate, but also
http://foo  -> http://foo/
and (perhaps dubiously) http://foo/bar? -> http://foo/bar
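
The rewrites listed above can be sketched roughly as follows. This is only
an illustration, not the Apache proxy code: the function name canonicalise,
the choice to decode only the "unreserved" characters (taken from the later
RFC 3986), and the handling of just the %xx -> char direction are all
assumptions made for the example. It also assumes an absolute http URL with
no fragment.

```python
import re
from urllib.parse import urlsplit

# Assumption: characters that are safe to store unescaped in a cache key
# (the "unreserved" set of RFC 3986).
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def _decode_unreserved(match):
    """Decode one %xx escape if it stands for an unreserved character."""
    ch = chr(int(match.group(1), 16))
    # Other escapes are kept, with the hex digits upper-cased so that
    # %2f and %2F produce the same key.
    return ch if ch in UNRESERVED else match.group(0).upper()


def canonicalise(url):
    """Return a canonical form of url, usable as a cache key (hypothetical helper)."""
    parts = urlsplit(url)
    # %xx -> char where safe, e.g. %7E -> ~
    path = re.sub(r"%([0-9A-Fa-f]{2})", _decode_unreserved, parts.path)
    # http://foo -> http://foo/
    if not path:
        path = "/"
    result = f"{parts.scheme}://{parts.netloc}{path}"
    # http://foo/bar? -> http://foo/bar (an empty query is dropped)
    if parts.query:
        result += "?" + parts.query
    return result


# Both spellings map to the same cache key.
print(canonicalise("http://foo/%7Euser"))  # http://foo/~user
print(canonicalise("http://foo"))          # http://foo/
print(canonicalise("http://foo/bar?"))     # http://foo/bar
```

A proxy using the same function for the cache key and for the request it
forwards would avoid the confusion described above.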

 David Robinson.

Received on Monday, 11 March 1996 10:05:00 UTC