- From: Andrew Daviel <andrew@andrew.triumf.ca>
- Date: Tue, 10 Jun 1997 00:56:14 -0700 (PDT)
- To: http-wg@cuckoo.hpl.hp.com
- Cc: ircache@nlanr.net
As much of the cache-busting document is based on some of my Web pages,
I'll try to address the concerns raised so far in an "omnibus" reply.
While this material has been around for some time, it hasn't attracted
much comment from knowledgeable folk, who presumably go direct to the
specs (or wrote them) ...

Larry Masinter wrote:
> I'd like to see more analysis (or references to it) associated
> with each individual piece of advice.

Probably a good idea; much of it is guesswork, though one can glean a
certain amount from proxy cache statistics.

> I've not seen any studies ... my guess that, after such documentation,
> you'd have more specific advice than 'use HTTP/1.1'.

It is clear from the RFC that the designers have given some thought to
hierarchical cache requirements, introducing Cache-Control elements such
as "private", "max-age", etc. I have seen reports at W3C indicating a
significant performance boost from using HTTP/1.1 over 1.0.

> When is it feasible (to use Expires headers) Do sites with planned
> expiration set expires dates?
> Is it feasible to, for example, declare that '/images' at a site
> never changes

I do. I happen to use a script to modify a .meta side file (which worked
under Apache 1.1), and also to generate Expires in CGI, but Apache 1.2
has support for generating Expires from the .htaccess file, per-directory
or per-file, allocating a maximum age either since the page was modified
or since it was accessed. It is quite feasible to award images an expiry
date a year in the future while text lasts a week, a day or an hour.

There are many questions asked about "how do I prevent pages being
cached". Much of the time, what the author really wants is not to make
the page uncacheable but to ensure that a user gets "today's page", not
yesterday's.

> > Use an HTTP server which supports the GET ... with If-Modified-Since
> Don't they all? At least for files?

I would hope so.
However, if someone writes a custom database interface they might forget
to handle If-Modified-Since (IMS), even though the database entries may
have file-like properties (a meaningful last-modified date). There are
servers that don't understand HEAD, that answer HEAD as if it were GET,
and I once found one that served illegal Date fields. Don't count on
anything.

Ben Laurie wrote:
> Changing all the references
> would be onerous, though - unless it was done by server-side parsing
> (yech).

I've occasionally done this with something like

    find /usr/htdocs -name '*.html' -exec fix-it.pl {} \;

using in-place editing in Perl.

Martin Hamilton wrote:
> >Don't use redirects, since their results are uncacheable.
> 301 is cacheable, 302 is not. Use what you need.

When I wrote this, no-one used 301. A redirect isn't a big hit, anyway,
but if the net's truly bad it might make a difference where a hierarchy
cache might otherwise serve a cached page without checking it.

>> Don't use content-negotiation until HTTP 1.1 is more widely
>> deployed, since in HTTP/1.0 it interacts badly with proxy caches.
>What am I supposed to use until then?

Not many people use it, anyway. Doing it via a redirect is a compromise;
it works, the pages themselves are cacheable, but it requires two
requests, not one. Not a big deal, perhaps, as http://some.org/xxx
requires two requests to http://some.org/xxx/index.html's one. I've used
the Apache action module to assign a certain file suffix to negotiated
pages - xxx.lang launches CGI to redirect to xxx.en.html, xxx.fr.html,
etc. depending on Accept-Language.

> > Don't use server modules .. convert document's character
> > set on the server side.
> What if the client can't do it?

OK OK, do what you have to. Given the choice, though, it's preferable not
to manipulate the content based on user-agent unless some thought is
given to caching - perhaps redirecting MSIE users down one leg, Netscape
Winxx down another, Netscape X11 down a third.

David W. Morris wrote:
> Sorry, it is the client's responsibility to declare what it is capable
> of. ...

Roll on feature negotiation! Meanwhile ... someone negotiates a page
based on MSIE with 24-bit colour. The next guy to hit the cache has
Netscape on X11 with 8-bit .... How to fix this - redirects, I guess.
The easy way out is just to set Cache-Control: private or
Pragma: no-cache and bypass the hierarchical cache.

Wojtek Sylwestrzak wrote:
> Unfortunately most of the servers practicing this today
> try to perform a 'naive' content negotiation, which effectively
> uses redirects to other urls. This is of course wrong,
> because it unnecessarily expands the url addressing space,
> thus making caching less effective.

I don't think so ... If I have A.var, which redirects to A.en.html,
A.jp-jis.html, A.jp-eu.html and A.fr.html, I have one small uncacheable
redirect and 4 cacheable documents. The 4 documents are all different
and have distinct URLs, so they are cached independently.

There is the question of what a spider sees ... an agent without
Accept-Language may get an HTML list of the separate pages, so they get
indexed separately, which is fine, except that the search engine result
points to the final page, not the original negotiation script. (My
proposal draft-daviel-metadata-link-00.txt addresses that.)

> From the caching point of view it would be a very good practice
> for the clients to request/expect a single, standard charset
> for a given language (considered being a 'transport' charset).

Nice idea; pity everyone's platform uses different coding :-(
(shift-jis, jis, euc-jp; koi-8, 8859-5, Windows-xxx etc. etc.) I think in
some cases DOS, Windows, X11 and Mac are all different. Unicode may help,
but I hear it's not perfect either (missing some charsets, 2 bytes
required instead of one in many cases ..)
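
To make the redirect compromise discussed above a little more concrete
(A.var or xxx.lang sending each client to A.en.html, A.fr.html and so
on), here is a rough sketch in Python. It is not the CGI actually in
use; the function names, the plain language suffixes and the choice of
302 for the redirect are all assumptions made for illustration. An agent
that sends no usable Accept-Language gets no redirect, so the caller can
serve the HTML list of variants instead.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, best first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0  # a tag without an explicit q-value counts as q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if q > 0:
            # sort by descending q, then by order of appearance
            prefs.append((-q, position, tag))
    return [tag for _, _, tag in sorted(prefs)]


def pick_variant(base, accept_language, available):
    """Return the variant URL to redirect to, or None to serve a list.

    `base` and the suffixes in `available` are hypothetical names,
    e.g. base "A" with ["en", "fr"] maps to A.en.html and A.fr.html.
    """
    for tag in parse_accept_language(accept_language or ""):
        if tag == "*" and available:
            return f"{base}.{available[0]}.html"
        # try the full tag first, then its primary language (en-gb -> en)
        for candidate in (tag, tag.split("-")[0]):
            if candidate in available:
                return f"{base}.{candidate}.html"
    return None


def redirect_response(base, accept_language, available):
    """Return (status, headers): one small redirect, or 200 for the list."""
    target = pick_variant(base, accept_language, available)
    if target is None:
        return 200, {}
    return 302, {"Location": target}
```

Each variant keeps its own URL, so a proxy can cache A.en.html and
A.fr.html independently; only the small redirect is repeated.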

Shel Kaphan wrote:
> > Don't use secure servers to serve images and other non-sensitive
> > objects, since these will be uncacheable and may not be passed
> > through a cache hierarchy.
> >
> Not a good recommendation: some browsers will put up a dialog box
> whenever there's a reference from a secure page to a non-secure page,

I don't have a tame https server to play with and hadn't realized. I've
modified the original document.

In common, I suspect, with many of you, when I access my banking services
on the Web I want to get on and do the job at least as fast as on a
touch-tone phone, not wait for a lot of background images, adverts,
icons etc. to download over my phone line. It seemed daft to serve these
images from the uncacheable https channel. As it is, I turned on cache
for https in Netscape. If someone gets root access to read my cache
files, they can just as easily snarf my passwords and credit card
numbers right out of /dev/kmem.

... of course, they could just check the trash ...

Andrew Daviel
TRIUMF & Vancouver Webpages
Received on Tuesday, 10 June 1997 00:55:56 UTC