- From: Shel Kaphan <sjk@netcom.com>
- Date: Sat, 27 May 1995 11:08:20 -0700
- To: www-talk@w3.org
>However, I also think it is worth considering for browser writers that
>history stacks (that can be re-viewed with browser navigation
>controls) are in a class of their own when it comes to caching.

I agree, I recently said the same thing in another thread (Client handling of Expires:) on www-talk. There, I said that

|In my opinion, for this problem to be really solved, a client should
|maintain two stores:
|
| a) a resource content cache that handles Expires, for reducing network
|    traffic when a link is clicked for a second time
| b) a `contents that were previously displayed' store, for use by the
|    history function and `back/forward' buttons.
|
|Of course, stores a) and b) can share memory for most information.
|Typically, store b) will only be able to hold information for the
|recent history.

It would be wrong to call store b) a (special kind of) cache. Calling it a `history log' would be more appropriate.

Good decomposition of the problem. I'll go along with this.

Shel Kaphan writes:
>Kee Hinckley writes:
> > Automatic reloading of a page in my history stack seems rather
> > user-unfriendly.

Yes. The main requirement for `history browsing' is that it is fast, not that it provides up to date results.

One HTTP-spec related issue here is that the current draft HTTP spec encourages writers of forms whose response messages can change through time, e.g. a search form on a dynamic database, to set the expires: field to a date in the past. From section 7.1.8 of the draft:

# If a resource is dynamic by nature,
# as is the case with many data-producing processes, copies of that
# resource should be given an appropriate Expires value which
# reflects that dynamism.

Thus, if a properly programmed (expires header generating) dynamic search form is accessed with a browser that *does* automatically reload expired responses in the history, browsing a 20-link search result will be both slow and resource-intensive.
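To make the two-store split concrete, here is a minimal sketch of a client that keeps store a) and store b) apart, assuming an in-memory browser core. Store a) honors Expires and is consulted when a link is followed; store b), the history log, ignores Expires and holds only a bounded number of recent entries, dropping the oldest first. The class and method names (`TwoStoreClient`, `follow_link`, `go_back`), the `fetch` callback, and the use of plain numbers instead of parsed HTTP dates are all hypothetical choices for illustration, not anything proposed in the thread.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Response:
    url: str
    body: str
    expires: Optional[float]  # absolute time taken from Expires; None if absent


class TwoStoreClient:
    """Hypothetical browser core with a resource cache (a) and a history log (b)."""

    def __init__(self, fetch: Callable[[str, float], Response], history_size: int = 20):
        self.fetch = fetch                             # network access, supplied by caller
        self.resource_cache: Dict[str, Response] = {}  # store a), keyed by URL
        self.history: List[Response] = []             # store b), in display order
        self.history_size = history_size               # store b) is finite
        self.position = -1                             # index of the page on screen

    def follow_link(self, url: str, now: float) -> Response:
        # Store a): reuse a copy only while its Expires time lies in the future.
        cached = self.resource_cache.get(url)
        if cached is not None and cached.expires is not None and now < cached.expires:
            resp = cached
        else:
            resp = self.fetch(url, now)
            self.resource_cache[url] = resp
        # Following a link discards the "forward" part of the history, then records the page.
        del self.history[self.position + 1:]
        self.history.append(resp)
        # Keep only the recent history: drop the oldest entries once full.
        while len(self.history) > self.history_size:
            self.history.pop(0)
        self.position = len(self.history) - 1
        return resp

    def go_back(self) -> Optional[Response]:
        # Store b): redisplay what was shown before, ignoring Expires and the network.
        if self.position <= 0:
            return None
        self.position -= 1
        return self.history[self.position]
```

With a search result that is already expired when it arrives, `go_back` returns the stored copy without calling `fetch`, while clicking the same link again does refetch it. That is the split argued for above: Expires governs link traversal, not history browsing.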
And with the "other kind" of browser, the current behavior is to display error messages as the user revisits expired pages in the history stack.

I think we now agree (you and I anyway) that both behaviors are wrong.

The browser author has almost no choice but to make the history function ignore the expires: field. And the CGI script writer has almost no choice but NOT TO USE expires: if such "Data Missing" error messages would confuse users. One could argue that the HTTP spec is broken because of this; a history function that would ignore expires only for search scripts and the like, not for normal dynamic information, would be preferable. But currently, there is no safe way of telling the difference between `search' and `non-search'.

> > I expect history loading to be fast and not go off over
> > the net. I guess I could see it as a user-specified option, but...
> >
>I definitely see your point -- as I see it we're talking about a
>"lesser of evils" situation. When you "back up" to an expired page,
>there are only three things I can think of that could happen:
>1. you see the expired document.
>2. you see an error message and (if you interpret the message correctly)
>   you can reload the page manually
>3. the browser reloads the page behind your back.

Well, 3. usually involves animating icons and flashing http transaction progress messages, so 3. will never be completely `behind your back' if you pay attention to the screen.

>Well, as Laurie Anderson would put it, "¿Que es mas macho?"
>I guess I'd pick door number 1 -- but only for the case where you view
>the page with browser navigation commands, not explicit links.

I agree 1. is best, but of course only for `history browsing'. There is a subtle point here, however: as the `history log' store b) I talked about earlier cannot be infinite, the browser is sometimes forced to do 3. to satisfy a history browsing request (2. is not really an option IMO). The point is that the user never knows beforehand if 1. or 3.
will be done for an older item in the history list, and this is bad if the item was the result of a non-idempotent POST operation (i.e. a form submission that `did' something, like order a pizza). If 3. is done on such an item, this means reposting the form; and this means (unless the form author is paranoid, and luckily many are) inadvertently ordering a second pizza. Thus, not having enough RAM in your computer will be bad for your health :)

There's a solution to this: if the browser needs to flush something from its cache (which contains the union of pages from the "resource cache" and "history log"), it should first try flushing pages from the resource cache. If there are none left to flush (i.e. if the history log has become dominant), then the history log should be flushed "oldest first" (or possibly LRU), and the user should either not be allowed to revisit a flushed page, or only be allowed to visit it after a warning and an explicit reload. I.e. this is the same as the behavior I'm complaining about above, but with the big difference that it is under proper cache control logic, and so would apply only to the least-likely-to-be-visited pages in the history log. I would not even object to having all record of the oldest pages simply removed from the history.

This is an important problem that, in my opinion, can only be solved by putting extra stuff in the HTTP-spec. A paranoid form author can provide a 70% solution to this problem within the current HTTP-spec, but nothing beyond that. For a further discussion of this problem, see my article `HTTP and statefull services' in the www-talk archive.

Shel Kaphan writes:
>I realize these considerations may have no role in the HTTP spec,

The more I think about it, the more I am convinced that these considerations _do_ have a role in the HTTP spec:

1) parts of the solution to these problems involve HTTP extensions.
2) Also, as long as we only have the HTTP spec and the HTML spec to specify the behavior of browsers, the HTTP spec is the most likely place to solve this problem, even though the issue goes beyond data transfer.

I've been meaning to submit some report/proposal to the http-wg mailing list about this, but I have not yet had the time to write one. If anyone wants to help put together such a report, please mail me.

Count me in.

>however I feel there are serious problems in this area, which can only
>be resolved by coordinating the behavior of browsers and servers.

I agree there are serious problems. Besides browser and server authors, CGI script authors are also involved. I feel www-talk would be a good place to discuss these problems and possible solutions. I have tried to get a discussion going a number of times, but so far, little has happened :(

Well, I think we have the basis for a pretty concrete proposal above, and would be happy to work with you to put it in the right form and before a forum that can act on it.

[...]

>Another thing that might help: perhaps there should be a way for
>servers to "force" the URL (the *name*) handled by clients to something other
>than the requested URL.

I believe the redirection (3xx) codes in the HTTP spec could be used (abused?) for this purpose.

[...]

>To explain this a little more, if there were two GET requests, one for
>/cgi-bin/food/hamburgers and one for /cgi-bin/food/french-fries, which
>would result in a single page that ought to be cached as one page,
>then the server ought to be able to say, "you asked for
>/food/french-fries, but the page is called /food/generic-junk-food",
>and to have the browser use that info to uniquely identify a cache
>entry and update it with the newly fetched data. This might not help
>to avoid fetching documents extra times, but it would help on cache
>coherence if the intent was to display a dynamically generated document.
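The naming idea in the quoted paragraph can be sketched as a client-side cache-key rule, assuming the server announces the page's name in a response header. The thread does not name any such header; `X-Page-Name`, `cache_key` and `store` below are hypothetical. Keeping the header optional means a response without it is still keyed by its request URL, so existing behavior stays the default.

```python
from typing import Dict, Mapping

# Hypothetical header a server would use to name the page it returned;
# the discussion proposes the idea but does not name the header.
PAGE_NAME_HEADER = "X-Page-Name"


def cache_key(request_url: str, headers: Mapping[str, str]) -> str:
    # Opt-in: without the header, the request URL stays the cache key.
    # Assumes header names were already normalized to this spelling;
    # real HTTP header names are case-insensitive.
    return headers.get(PAGE_NAME_HEADER, request_url)


def store(cache: Dict[str, str], request_url: str,
          headers: Mapping[str, str], body: str) -> str:
    # Requests under different URLs that name the same page share one entry,
    # so the newest fetch replaces the older copy.
    key = cache_key(request_url, headers)
    cache[key] = body
    return key


cache: Dict[str, str] = {}
named = {PAGE_NAME_HEADER: "/food/generic-junk-food"}
store(cache, "/cgi-bin/food/hamburgers", named, "first version")
store(cache, "/cgi-bin/food/french-fries", named, "second version")
# cache now holds one entry: "/food/generic-junk-food" -> "second version"
```

This only addresses which entry gets updated; as the reply below points out, whether the history log should ever be overwritten this way is a separate question.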
I don't think this would help with cache coherence at all, for proper definitions of `coherence'. There is no reason for the `cache for real browsing commands' ever to become incoherent (contain expired page contents). It seems to me you are proposing to automatically update old versions of a page in the `history log'. If new contents for that URL are received, `history browsing' back would then display the new, changed price of hamburgers, I assume.

Hmmm -- I think perhaps I didn't explain this very clearly. Now, I think all this is unneeded if you have `expires' working properly; nonetheless, I should explain what I meant. Let's consider another example, this time a more realistic one. Actually this occurred to me because of another browser, uh, mis-feature: the way Lynx (and possibly other browsers) improperly displays hidden fields on forms. My workaround for that was to put what I would have put in hidden fields into the URL -- i.e. encode the same information that would have been in a hidden field into PATH_INFO instead.

So suppose you have two ways of viewing a certain page. One way involves a form submission which changes the state somewhere in CGI-land, and would display a new "shopping basket" for the user. The purpose of the form was to add something to the shopping basket and then display the contents. So the URL might contain something like /cgi-bin/shopping-basket/product-code=WHOPPER+FRIES+SHAKE. The product code is in the URL because of the hidden field bug. Later on, the user wants simply to view the shopping basket, so they click on a link to /cgi-bin/shopping-basket. In the current world, the two pages would be cached separately, so if the buggy browser ignores the expires field (and you should probably ignore this discussion if that can be made to work...)
then even after you change the state of the shopping basket by submitting the form, following the link to /cgi-bin/shopping-basket might well show obsolete data, *even if it's really the same page*. The problem is that the caching is not based on the identity of the displayed page, but on the identity of the requesting URL. So I am suggesting some additional header field which, if present (and it's of course optional!), should be used as the document identifier in the client's cache, instead of the requestor's URL.

While such a scheme would be great for some applications, it should not be the default, or it should at least be possible for _the service author_ to switch off. I can imagine plenty of cases where the user wants to see _the old_ version of the page (e.g. the chess board 3 moves ago, the gold price 10 minutes ago), if at all possible.

Koen.

It would not only be switch-off-able, but you wouldn't get this behavior at all unless you put in the extra, optional, header field.

--Shel
sjk@amazon.com, or sjk@netcom.com.
Received on Saturday, 27 May 1995 14:09:17 UTC