
Re: proxies rewriting URLs

From: Jeffrey Mogul <mogul@pa.dec.com>
Date: Mon, 11 Mar 96 15:08:41 PST
Message-Id: <9603112308.AA07059@acetes.pa.dec.com>
To: David Robinson <drtr1@cus.cam.ac.uk>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com

    A proxy that did not return the cached document http://foo/%7Euser
    in response to a request for http://foo/~user is not behaving
    efficiently.  So proxies need to canonicalise the URL for the cache
    key.  Hence it would be confusing if the proxy did not use this
    canonicalised URL in requests it issued.
    
I think it would be more "confusing" if by canonicalizing a URL
the proxy turned it into something that identified a different
resource.  I.e., we should be quite cautious about pursuing "efficiency"
here.  Efficiency is great as long as it gets the right answers.

    This is why the proxy in Apache 1.1 beta does a lot of URL rewriting;
    not only %xx <-> char as appropriate, but also

This might be appropriate in some cases, but it's clearly not
appropriate for every instance of %xx.  RFC1738 specifically
states, for example,
	The character "#" is unsafe and should
	always be encoded because it is used in World Wide Web and in other
	systems to delimit a URL from a fragment/anchor identifier that might
	follow it.  The character "%" is unsafe because it is used for
	encodings of other characters.
so a proxy that converted
	http://foo%23bar
to
	http://foo#bar
would be non-compliant with RFC1738.
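A minimal sketch of a more cautious canonicalization, in Python: decode a %xx escape only when it stands for a character assumed safe to decode, and leave every other escape (including %23 and %25) untouched. The safe set used here (letters, digits and "-._~") is an assumption borrowed from the later RFC 3986 "unreserved" set, not something RFC1738 or the HTTP drafts specify; RFC1738 itself lists "~" among its unsafe characters. The function name decode_unreserved is hypothetical.

```python
import re
import string

# Assumption: only these characters are decoded for the cache key.
# Every other escape, e.g. %23 ("#") or %25 ("%"), is left as-is.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")

_PCT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(url: str) -> str:
    """Hypothetical helper: decode %xx only when it encodes a safe character."""
    def replace(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        # Decode safe characters; keep the original escape for everything else.
        return ch if ch in UNRESERVED else match.group(0)

    # Single pass, so "%2541" is not double-decoded into "%41" or "A".
    return _PCT_ESCAPE.sub(replace, url)


print(decode_unreserved("http://foo/%7Euser"))  # http://foo/~user
print(decode_unreserved("http://foo%23bar"))    # http://foo%23bar
```

Under this rule http://foo/%7Euser and http://foo/~user share one cache key, while http://foo%23bar is left alone instead of becoming http://foo#bar.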

    http://foo  -> http://foo/

I'd like to see a specific citation to a standard or even an
I-D that makes this a compliant transformation.  In
draft-ietf-http-v11-spec-01.txt, for example, I find this
BNF:
       URI            = ( absoluteURI | relativeURI ) [ "#" fragment ]
       absoluteURI    = scheme ":" *( uchar | reserved )
which suggests that the URI need not end in "/".
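As a quick illustration with a modern parser (Python's urllib.parse, which obviously postdates this discussion and is used here only as an example), the two forms parse to different path components: empty in one case and "/" in the other. Adding the slash therefore rewrites the URL string in a way the generic syntax does not itself require. The helper name path_component is hypothetical.

```python
from urllib.parse import urlsplit


def path_component(url: str) -> str:
    """Hypothetical helper: return the path part of an absolute URL."""
    return urlsplit(url).path


# The syntax permits an absolute URL with an empty path...
print(repr(path_component("http://foo")))   # ''
# ...which differs from the form with an explicit trailing "/".
print(repr(path_component("http://foo/")))  # '/'
```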

    and (perhaps dubiously) http://foo/bar? -> http://foo/bar
    
Very dubiously, especially since we also have a well-understood
heuristic that caches do not store responses to GETs with "?"
in the URL.  If you have a series of caching proxies in the
path, and the first one does this transformation (dropping the "?"),
the second one will not realize that the GET is a potentially
non-cacheable query.
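A small Python sketch of the two-proxy scenario, assuming the "?" heuristic is implemented as a plain substring test; both function names are hypothetical. The first proxy sees the original URL and flags it as a query, but after it strips the trailing "?" the second proxy's check no longer fires.

```python
def looks_like_query(url: str) -> bool:
    """Assumed cache heuristic: treat any GET URL containing "?" as a query."""
    return "?" in url


def strip_empty_query(url: str) -> str:
    """The dubious rewrite: drop a trailing "?" with nothing after it."""
    return url[:-1] if url.endswith("?") else url


original = "http://foo/bar?"
rewritten = strip_empty_query(original)  # done by the first proxy

print(looks_like_query(original))   # True: first proxy treats it as a query
print(looks_like_query(rewritten))  # False: second proxy no longer does
```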

-Jeff
Received on Monday, 11 March 1996 15:16:36 EST
