- From: Shel Kaphan <sjk@amazon.com>
- Date: Tue, 3 Dec 1996 22:34:43 -0800 (PST)
- To: Jeffrey Mogul <mogul@pa.dec.com>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Jeffrey Mogul writes:
> There's another category of cache-busting that you did not mention in
> the statistics you reported. This is the use of unique URL
> components, which may be "once-only" URLs, or are at least unique for
> a single user.
>
> Right you are. I should have been more explicit in the title of
> my message, and I didn't explain it clearly enough in the body
> of the message, but this analysis was only aimed at finding instances
> of cache-busting that might easily be avoided through use of our
> hit-metering proposal. I thought it would be more realistic to
> look for cache-busting that is done without using the unique-URL
> technique.
>

Yes, sure. You'd have to resort to unreliable heuristic techniques to
pick out such URLs. In fact, you're likely to have already considered
them in one of your other categories, since they are more likely to
show up as invocations of CGI programs and the like, rather than
static ".html" URLs -- *something* on the server end has to interpret
or strip off the unique part of the URL. Unless the HTTP server itself
has been hacked, it will be a CGI program or the moral equivalent.

> It's not clear to me whether the users of once-only URLs would
> switch to a more cache-friendly approach if our hit-metering
> proposal were available. (Clearly, anyone that requires
> cache-busting to provide usable results in the face of broken
> history mechanisms is not going to switch, at least not until
> virtually all browsers have fixed their history support.) So
> I therefore assumed that none of the once-only URLs would be
> amenable to hit-metering, and so I did not try to include these
> URLs in my category of "possibly cache-busted responses."
>

They're mainly not amenable to hit-metering because it's impossible to
algorithmically determine the "equivalence class" of once-only URLs --
all the superficially distinct URLs that fetch "the same" resource
look like different URLs.
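To make the equivalence-class problem concrete, here is a minimal sketch
of what a per-URL hit counter sees. The trace and the URL layout (an
opaque token as the last path component) are made up for illustration;
real sites embed their unique parts in many different ways, which is
exactly why no general rule can group them.

```python
from collections import Counter

# Hypothetical trace: the same catalog page fetched three times, each time
# through a URL carrying a different opaque token (made-up format), plus an
# ordinary static page fetched twice.
trace = [
    "/exec/catalog/item42/8731-0001",
    "/exec/catalog/item42/8731-0002",
    "/exec/catalog/item42/5520-0001",
    "/index.html",
    "/index.html",
]


def hits_per_url(urls):
    """Count requests per literal URL, the only key a shared cache has."""
    return Counter(urls)


counts = hits_per_url(trace)

# Highest hit count seen for any of the token-bearing URLs.
max_hits_unique = max(
    n for url, n in counts.items() if url.startswith("/exec/")
)

for url, n in sorted(counts.items()):
    print(n, url)
```

Every token-bearing URL is counted as a separate resource with a single
hit, even though a person reading the list can guess they all name the
same page. Stripping the last path component would merge them in this
toy trace, but that rule is only a guess about one made-up layout.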
Anyway, I'd have to guess that the overwhelming majority of servers
that work using unique URLs do it more for semantics than explicitly
for cache-busting. One question that must be asked about this: is this
technique prevalent enough to be worth worrying much about? I see it a
lot, but then, I pay attention to sites that do stuff like this.

> On the other hand, it's not clear that I could have identified them
> from their names. If they were pre-expired or had no last-modified
> date, and they did not match my CGI filter, I would have included
> them in my category of "possibly cache-busted responses" by mistake.
>

But that "mistake" is actually OK, right?

> When I am ready to re-do the analysis, I'll try a version that is
> limited to URLs for which the trace contains at least two status-200
> responses. Presumably this will avoid any once-only URLs, right?

It will avoid true "once-only" URLs, but you still might see some
matches on "per-session" URLs -- ones that track a user through a
session. These per-session URLs are also fairly pointless to cache in
a shared cache, since they're only relevant to one user, but that user
might ask for the same thing more than once. Based purely on anecdotal
evidence, I think per-session URLs are a lot more common than true
once-only URLs.

> However, it will decrease the sample size by a large factor, which
> means that the significance of the results may be weakened.
>
> -Jeff
>

--Shel
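As an illustration of the filter discussed above (keep only URLs with
at least two status-200 responses), here is a minimal sketch showing
that it drops once-only URLs while a per-session URL survives. The
record layout (client, url, status), the URLs, and the session token
are all hypothetical. The extra single-client check is an assumption of
this sketch, not part of the proposed analysis, and it only works if
the trace identifies clients.

```python
from collections import Counter

# Hypothetical trace records: (client, url, status). The URL layout and
# the session token "s=abc" are made up for illustration.
trace = [
    ("client1", "/cgi/page?once=1001", 200),  # once-only URL
    ("client2", "/cgi/page?once=1002", 200),  # once-only URL
    ("client1", "/shop/s=abc/cart", 200),     # per-session URL, fetched twice
    ("client1", "/shop/s=abc/cart", 200),
    ("client1", "/logo.gif", 200),            # ordinary shared resource
    ("client2", "/logo.gif", 200),
    ("client2", "/logo.gif", 304),
]


def urls_with_repeat_200(records, minimum=2):
    """Return URLs that have at least `minimum` status-200 responses."""
    ok = Counter(url for _client, url, status in records if status == 200)
    return {url for url, n in ok.items() if n >= minimum}


def clients_per_url(records):
    """Map each URL to the set of clients that requested it."""
    seen = {}
    for client, url, _status in records:
        seen.setdefault(url, set()).add(client)
    return seen


kept = urls_with_repeat_200(trace)
clients = clients_per_url(trace)

# URLs that passed the filter but were only ever requested by one client:
# candidates for per-session URLs (a heuristic, not a proof).
single_user = {url for url in kept if len(clients[url]) == 1}

print(sorted(kept))
print(sorted(single_user))
```

In this toy trace both once-only URLs are filtered out, while the
per-session URL passes the two-response test and is only flagged by the
single-client check.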
Received on Tuesday, 3 December 1996 22:40:44 UTC