- From: Daniel W. Connolly <connolly@hal.com>
- Date: Mon, 12 Dec 1994 11:59:14 -0600
- To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Consider this scenario: On a server S, a document D is available as plain text, HTML, or PostScript. Client C1 is configured to accept only HTML. This client requests document http://S/D via proxy P as:

    GET http://S/D HTTP/1.0
    Accept: text/html

Proxy P connects to S, requests the document, caches it, and returns it to C1.

Client C2 is configured to accept PostScript and HTML, and to prefer PostScript over HTML. It requests the same document through the same proxy:

    GET http://S/D HTTP/1.0
    Accept: text/html; q=0.5
    Accept: application/postscript; q=1.0

Proxy P receives the request, notices that it has http://S/D in its cache, and returns the cached copy.

Note that had C2 requested the document straight from S, it would have got PostScript. But it got HTML from the proxy. To me, this looks like the caching performed by P is not transparent, and hence violates the protocol.

OK, ok, so currently nobody uses format negotiation, and certainly nobody implements the q and c parameters on Accept headers (except probably the CERN linemode browser and server). But some information providers are using, of all things, the User-Agent field to customize their documents: they serve up different stuff for MacMosaic, WinMosaic, Netscape, etc. Broken proxy caching is certainly observable in these circumstances. (But in this case, I'd say the fault lies with the information provider for abusing User-Agent this way, not with the caching proxy.)

One way to correct the behaviour of proxy P above is to key the cache not just on the URL in question but on all the request headers as well. But clearly this is way too conservative, since requests that differ in any header at all would never share a cache entry.

It seems to me that the HTTP protocol spec should specify which request headers can affect the returned data, and which are just "advisory." A correct cache would key on the URL plus all the request headers which are allowed to affect the returned data. For example, authentication headers shouldn't affect the returned data.
User-Agent shouldn't affect the returned data either. (The fact that it does is a wart that we'll have to deal with somehow.)

A fixed list of such headers does mean that new headers which can affect the returned data (like the recently proposed Accept-Charset: header) can't be introduced with correct backwards compatibility, because existing caches wouldn't know to key on them. It might be wise to say that all headers matching Accept-*: are allowed to affect the returned data.

Also... I haven't carefully reviewed the latest HTTP/1.0 spec: does it specify what happens when a client requests ftp://host/path or gopher://host/path via an HTTP proxy? Does it discuss correct vs. heuristic caching in these cases?

Food for thought...

Dan
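The cache-keying rule proposed in this message can be illustrated with a minimal Python sketch. It assumes a hypothetical policy in which Accept and every Accept-* header may affect the returned data, while all other request headers (Authorization, User-Agent, and so on) are treated as advisory. The names `affects_response`, `cache_key`, `ProxyCache`, and the toy `origin_server` are illustrative only and come from no spec; the toy server's q-value handling is deliberately simplified and ignores other Accept parameters.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Request headers as an ordered list of (name, value) pairs, because a
# request may repeat a header (e.g. two Accept: lines, as C2 sends).
Headers = List[Tuple[str, str]]


def affects_response(name: str) -> bool:
    """Assumed policy: Accept and any Accept-* header may change the data."""
    n = name.lower()
    return n == "accept" or n.startswith("accept-")


def cache_key(url: str, headers: Headers) -> Tuple[str, Tuple[Tuple[str, str], ...]]:
    """Key on the URL plus only the headers allowed to affect the response."""
    relevant = sorted(
        (name.lower(), value.strip())
        for name, value in headers
        if affects_response(name)
    )
    return (url, tuple(relevant))


def origin_server(url: str, headers: Headers) -> str:
    """Toy server S: D exists as text/plain, text/html and PostScript.
    Returns the acceptable variant with the highest q value."""
    available = ["text/plain", "text/html", "application/postscript"]
    best: Optional[str] = None
    best_q = -1.0
    for name, value in headers:
        if name.lower() != "accept":
            continue
        media_type, *params = [p.strip() for p in value.split(";")]
        q = 1.0
        for p in params:
            if p.startswith("q="):
                q = float(p[2:])
        if media_type in available and q > best_q:
            best, best_q = media_type, q
    return best or "text/plain"


class ProxyCache:
    """A caching proxy parameterized by how it builds its cache key."""

    def __init__(self, fetch: Callable[[str, Headers], str],
                 key_fn: Callable[[str, Headers], object]) -> None:
        self._fetch = fetch    # forwards the request to the origin server
        self._key_fn = key_fn  # decides which requests share an entry
        self._store: Dict[object, str] = {}

    def get(self, url: str, headers: Headers) -> str:
        key = self._key_fn(url, headers)
        if key not in self._store:
            self._store[key] = self._fetch(url, headers)
        return self._store[key]


c1 = [("Accept", "text/html")]
c2 = [("Accept", "text/html; q=0.5"),
      ("Accept", "application/postscript; q=1.0")]

# Proxy P as described: keyed on the URL alone, so C2 gets C1's variant.
naive = ProxyCache(origin_server, lambda url, headers: url)
naive.get("http://S/D", c1)
print(naive.get("http://S/D", c2))  # prints text/html

# Keyed on URL plus Accept/Accept-* headers: C2 gets what S would send.
keyed = ProxyCache(origin_server, cache_key)
keyed.get("http://S/D", c1)
print(keyed.get("http://S/D", c2))  # prints application/postscript
```

Sorting the relevant headers lets two requests that list the same Accept lines in a different order share an entry. Treating User-Agent as advisory follows the position taken above, so a server that varies its output on User-Agent would still defeat this cache.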
Received on Monday, 12 December 1994 10:05:07 UTC