- From: Drazen Kacar <Drazen.Kacar@public.srce.hr>
- Date: Tue, 3 Dec 1996 09:40:36 +0100 (MET)
- To: Jeffrey Mogul <mogul@pa.dec.com>
- Cc: snowhare@netimages.com, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Jeffrey Mogul wrote:

> If someone would like to propose a *feasible* filter on URLs
> and/or response headers (i.e., something that I could implement
> in a few dozen lines of C) that would exclude other CGI
> output (i.e., besides URLs containing "?" or "cgi-bin", which
> I already exclude), then I am happy to re-run my analysis.

You can check for everything that ends in ".cgi" or ".nph", as well as
everything that starts with "nph-". Don't forget that CGIs can have
trailing path info. (A rough sketch in C follows further down in this
message.)

> Drazen Kacar pointed out that I should probably have
> excluded .shtml URLs from this category, as well, because
> they are essentially the same thing as CGI output. I checked
> and found that 354 of the references in the trace were to .shtml
> URLs, and hence 10075, instead of 10429, of the references
> should have been categorized as possibly cache-busted. (This
> is a net change of less than 4%.)

There is a short (3-character) extension as well. I don't know which
one; I think it's ".shm", but I'm not sure. You'll get an additional
percent or two if you include all of these.

> I would say the only *confirmable* deliberate cache busting done
> are the 28 pre-expired responses. And they are an insignificant
> (almost unmeasurable) percentage of the responses.

Some of them are probably due to the HTTP 1.0 protocol and could have
been cacheable if the server could count on the Vary header being
recognized by the client.

> In short, if we are looking for a rigorous, scientific *proof*
> that cache-busting is either prevalent or negligible, I don't
> think we are going to find it in traces, and I can't think of
> where else one might look.

I can: on-line advertising mailing lists. I'm subscribed to one of
those, not because it's my job, but to stay in touch with web things.
I'm just a lurker there. (OK, I'm a lurker here as well, but not
because I want to be. I can't find time to read the drafts, and I'm at
least two versions behind with those I did read.) People on the list
are professionals and experts in their field, but not in HTML or HTTP.

A month ago somebody posted "a neat trick" which had these constructs
in the HTML source:

  <FONT FACE="New Times Roman" "Times Roman" "Times" SIZE=-1>...</FONT>
  <A HREF=...><TABLE>...</TABLE></A>

Then somebody else pointed out that Netscape won't make the whole table
clickable if it's contained in an anchor. The answer from the original
author started with "For some reason (and I don't know why) it seems
that Netscape can't...". I let that one pass to see if anyone would
mention DTDs, syntax, validators or anything at all. No one did. This
is viewed as a lack of functionality in NSN, and not as truly horrible
HTML.

To be fair, I must mention that most of them know a thing or two about
the ALT attribute and are actively fighting for its usage. They
probably don't know it's required in AREA, but IMG is a start. My
eternal gratitude to the people who are fighting on
comp.infosystems.www.html. I stopped years ago.

Another example is HTTP related. There was talk about search engines,
and one person posted that cheating them is called "hard working".
Then there was a rush of posts saying that this is not ethical, and
that a page whose text repeats key words might come up at the top of
the list but would look horrible when the customer actually requests
it. No one mentioned that you can deliver one thing to the search
engine and another to the browser.

To conclude, marketing people are clueless about HTML and (even more
so) HTTP, and they can't participate on this list. It's not that they
wouldn't want to. They have some needs, and if those needs are not met
by HTTP, responses will be made uncacheable as soon as they find out
how to do it.
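Coming back to the filter question above: purely as an illustration (a
sketch of my own, not something from the existing analysis code; the
function name and the segment-by-segment walk are my own choices), a
check like this fits in a few dozen lines of C:

#include <string.h>

/*
 * Rough sketch: guess whether a URL path looks like CGI output.
 * Every path segment is examined, not just the last one, because
 * CGIs can have trailing path info, e.g. /scripts/foo.cgi/extra/path.
 * The "?" and "cgi-bin" cases are assumed to be excluded elsewhere,
 * as in the current analysis.  Returns 1 for "probably CGI".
 */
static int looks_like_cgi(const char *path)
{
    const char *seg = path;

    while (*seg) {
        const char *end = strchr(seg, '/');
        size_t len = end ? (size_t)(end - seg) : strlen(seg);

        /* script names starting with "nph-" */
        if (len >= 4 && strncmp(seg, "nph-", 4) == 0)
            return 1;

        /* segments ending in ".cgi" or ".nph" */
        if (len >= 4 && (strncmp(seg + len - 4, ".cgi", 4) == 0 ||
                         strncmp(seg + len - 4, ".nph", 4) == 0))
            return 1;

        if (end == NULL)
            break;
        seg = end + 1;
    }
    return 0;
}

Walking segment by segment is what catches the trailing path info; a
simple suffix test on the whole URL would miss /foo.cgi/extra/path.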
I'm making responses uncacheable myself, because of charset problems.
It's much more important for the information provider that users get
the right code page than to let a proxy cache the wrong one. OK, I'm
checking for HTTP 1.1 things which indicate that I can let the entity
body be cacheable, but those are not coming in right now and (reading
the wording in the HTTP 1.1 spec) I doubt they will.

A few examples of what's needed...

Suppose I need high quality graphics for a page, but it's not
mandatory. I'll make two versions of the pictures: one will have small
files and the other will (can't do anything about it) have big files.
I can conclude via feature negotiation whether the user's hardware and
software can display high quality pictures, but not whether the user
wants them, i.e. whether the bandwidth is big enough or the user is
prepared to wait. So I'll display low-res pictures by default and put
in a link to the same page with high-res graphics. The user's
preference will be sent back to him in a cookie.

It's really, really hard and painful to maintain two versions of the
pages just for this, and I'd want my server to select the appropriate
picture based on the URL and the particular cookie. What happens with
the proxy? I can send "Vary: set-cookie", but this is not enough.
There'll be other cookies. On a really commercial site there'll be one
cookie for each user. People are trying to gather information about
their visitors. I can't blame them, although I have some ideas about
preventing this. (I will have to read the state management draft, it
seems.) Anyway, this must be made non-cacheable. Counting on LOWSRC is
not good enough.

Another thing is ad banners. Some people are trying not to display the
same banner more than 5 or 6 times to a particular user. The
information about visits is stored in (surprise, surprise) a cookie.
The same thing again.

I think that technical experts should ask the masses what's needed.
Don't expect the response in the form of an Internet draft, though.

--
Life is a sexually transmitted disease.

dave@fly.cc.fer.hr
dave@zemris.fer.hr
Received on Tuesday, 3 December 1996 00:55:20 UTC