Re: caching dilemma from Koen Holtman on 1995-05-27 (www-talk@w3.org from May to June 1995)

From: Koen Holtman <koen@win.tue.nl>
Date: Sat, 27 May 1995 13:05:23 +0200 (MET DST)
To: sjk@amazon.com
Cc: www-talk@www10.w3.org
Message-Id: <199505271105.NAA12831@wswiop05.win.tue.nl>

Shel Kaphan writes:
>However, I also think it is worth considering for browser writers that
>history stacks (that can be re-viewed with browser navigation
>controls) are in a class of their own when it comes to caching.

I agree; I recently said the same thing in another thread (Client
handling of Expires:) on www-talk.  There, I said that:

|In my opinion, for this problem to be really solved, a client should
|maintain two stores:
|
| a) a resource content cache that handles Expires, for reducing network
|    traffic when a link is clicked for a second time
| b) a `contents that were previously displayed' store, for use by the
|    history function and `back/forward' buttons.
|
|Of course, stores a) and b) can share memory for most information.
|Typically, store b) will only be able to hold information for the
|recent history.

It would be wrong to call store b) a (special kind of) cache.  Calling
it a `history log' would be more appropriate.
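
A minimal sketch of the two-store idea, in Python.  This is only an
illustration under stated assumptions, not a description of any existing
browser: the names Response, Browser, follow_link and view_history are
made up here, and `fetch' stands in for whatever does the real HTTP
request.  Store a) honors Expires; store b) just records what was shown
and is never checked against Expires.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Response:
    body: str
    expires: float  # absolute time (seconds since epoch) after which the copy is stale


class Browser:
    """Hypothetical client with a content cache (a) and a history log (b)."""

    def __init__(self, fetch: Callable[[str], Response],
                 now: Callable[[], float] = time.time,
                 history_limit: int = 50) -> None:
        self.fetch = fetch                  # assumed network layer
        self.now = now
        self.history_limit = history_limit  # assumed to be at least 1
        self.cache: Dict[str, Response] = {}            # store a)
        self.history: List[Tuple[str, Response]] = []   # store b)

    def follow_link(self, url: str) -> str:
        """Real browsing: use the cache only while the copy has not expired."""
        cached = self.cache.get(url)
        if cached is None or cached.expires <= self.now():
            cached = self.fetch(url)
            self.cache[url] = cached
        # The history log shares the Response object with the cache, so
        # most of the memory is shared between the two stores.
        self.history.append((url, cached))
        # Store b) only holds the recent history.
        del self.history[:-self.history_limit]
        return cached.body

    def view_history(self, steps_back: int) -> str:
        """History browsing: show what was displayed, ignoring Expires."""
        _url, shown = self.history[-1 - steps_back]
        return shown.body
```

With a response that is already expired when it arrives, follow_link
goes to the network every time, while view_history never does.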

Shel Kaphan writes:
>Kee Hinckley writes:
> > Automatic reloading of a page in my history stack seems rather
> > user-unfriendly.

Yes.  The main requirement for `history browsing' is that it is fast,
not that it provides up to date results.

One HTTP-spec related issue here is that the current draft HTTP spec
encourages writers of forms whose response messages can change through
time, e.g. a search form on a dynamic database, to set the expires:
field to a date in the past.  From section 7.1.8 of the draft:

#   If a resource is dynamic by nature, 
#   as is the case with many data-producing processes, copies of that 
#   resource should be given an appropriate Expires value which 
#   reflects that dynamism.

Thus, if a properly programmed (expires header generating) dynamic
search form is accessed with a browser that *does* automatically
reload expired responses in the history, browsing a 20-link search
result will be both slow and resource-intensive.

The browser author has almost no choice but to make the history
function ignore the expires: field.
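
For concreteness, here is a sketch of what a search script following
that advice might emit.  The function name and the choice of `one second
ago' are assumptions for illustration; the draft only asks for an
appropriate Expires value.

```python
from email.utils import formatdate


def dynamic_response_headers(now: float) -> str:
    """Build a header block for a response that is stale on arrival.

    `now` is the current time in seconds since the epoch.
    """
    # A date in the past tells caches the response has already expired.
    expires = formatdate(now - 1, usegmt=True)
    return f"Content-Type: text/html\r\nExpires: {expires}\r\n\r\n"
```

A history function that honored this header would reload every such
page on each `back', which is exactly the slow case described above.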

One could argue that the HTTP spec is broken because of this; a
history function that would ignore expires only for search scripts and
the like, not for normal dynamic information, would be preferable.
But currently, there is no safe way of telling the difference between
`search' and `non-search'.

> > I expect history loading to be fast and not go off over
> > the net. I guess I could see it as a user-specified option, but...
> > 
>I definitely see your point -- as I see it we're talking about a
>"lesser of evils" situation.  When you "back up" to an expired page,
>there are only three things I can think of that could happen:
>1. you see the expired document.
>2. you see an error message and (if you interpret the message correctly)
>        you can reload the page manually
>3. the browser reloads the page behind your back.

Well, 3. usually involves animating icons and flashing http transaction
progress messages, so 3. will never be completely `behind your back'
if you pay attention to the screen.

>Well, as Lori Anderson would put it, "?Que es mas macho?"
>I guess I'd pick door number 1 -- but only for the case where you view
>the page with browser navigation commands, not explicit links.

I agree 1. is best, but of course only for `history browsing'.  There
is a subtle point here, however: as the `history log' store b) I
talked about earlier cannot be infinite, the browser is sometimes
forced to do 3. to satisfy a history browsing request (2. is not really
an option IMO).

The point is that the user never knows beforehand if 1. or 3. will be
done for an older item in the history list, and this is bad if the
item was the result of a non-idempotent POST operation (i.e. a form
submission that `did' something, like order a pizza).  If 3. is done
on such an item, this means reposting the form; and this means (unless
the form author is paranoid, and luckily many are) inadvertently
ordering a second pizza.  Thus, not having enough RAM in your computer
will be bad for your health :)
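
The hazard can be sketched in a few lines of Python.  The guard shown
here (refusing to repeat a non-GET request without explicit user
confirmation) is only one possible client-side mitigation and is an
assumption of this sketch, as are the names HistoryEntry, revisit and
RepostNeedsConfirmation; nothing in the current HTTP spec requires it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HistoryEntry:
    method: str           # "GET" or "POST"
    url: str
    body: Optional[str]   # None once the entry has been evicted from store b)


class RepostNeedsConfirmation(Exception):
    """Raised instead of silently repeating a request that may `do' something."""


def revisit(entry: HistoryEntry,
            refetch: Callable[[str, str], str],
            confirmed: bool = False) -> str:
    """Return the page for a history item, preferring the stored copy."""
    if entry.body is not None:
        return entry.body                  # option 1: show what was displayed
    if entry.method.upper() != "GET" and not confirmed:
        # Option 3 would mean reposting the form (the second pizza).
        raise RepostNeedsConfirmation(f"{entry.method} {entry.url}")
    entry.body = refetch(entry.method, entry.url)   # option 3: reload
    return entry.body
```

Note that this only helps the user notice the problem; it does not let
the browser tell a harmless search POST from a pizza order.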

This is an important problem that, in my opinion, can only be solved
by putting extra stuff in the HTTP-spec.  A paranoid form author can
provide a 70% solution to this problem within the current HTTP-spec,
but nothing beyond that.

For a further discussion of this problem, see my article `HTTP and
statefull services' in the www-talk archive.

Shel Kaphan writes:
>I realize these considerations may have no role in the HTTP spec,

The more I think about it, the more I am convinced that these
considerations _do_ have a role in the HTTP spec:

1) parts of the solution to these problems involve HTTP extensions.

2) Also, as long as we only have the HTTP spec and the HTML spec to
specify the behavior of browsers, the HTTP spec is the most likely
place to solve this problem, even though the issue goes beyond data
transfer.

I've been meaning to submit some report/proposal to the http-wg
mailing list about this, but I have not yet had the time to write one.
If anyone wants to help put together such a report, please mail
me.

>however I feel there are serious problems in this area, which can only
>be resolved by coordinating the behavior of browsers and servers.

I agree there are serious problems.  Besides browser and server
authors, CGI script authors are also involved.

I feel www-talk would be a good place to discuss these problems and
possible solutions.  I have tried to get a discussion going a number
of times, but so far, little has happened :(

[...]
>Another thing that might help: perhaps there should be a way for
>servers to "force" the URL (the *name*) handled by clients to something other
>than the requested URL.

I believe the redirection (3xx) codes in the HTTP spec could be used
(abused?) for this purpose.  

[...]
>To explain this a little more, if there were two GET requests, one for
>/cgi-bin/food/hamburgers and one for /cgi-bin/food/french-fries, which
>would result in a single page that ought to be cached as one page,
>then the server ought to be able to say, "you asked for
>/food/french-fries, but the page is called /food/generic-junk-food",
>and to have the browser use that info to uniquely identify a cache
>entry and update it with the newly fetched data.  This might not help
>to avoid fetching documents extra times, but it would help on cache
>coherence if the intent was to display a dynamically generated document.

I don't think this would help with cache coherence at all, for proper
definitions of `coherence'.  There is no reason for the `cache for
real browsing commands' ever to become incoherent (contain expired
page contents).  It seems to me you are proposing to automatically
update old versions of a page in the `history log'.  If new contents
for that URL are received, `history browsing' back would then display
the new, changed price of hamburgers I assume.

While such a scheme would be great for some applications, it should
not be the default, or it should at least be possible for _the service
author_ to switch it off.  I can imagine plenty of cases where the user
wants to see _the old_ version of the page (e.g. the chess board 3
moves ago, the gold price 10 minutes ago), if at all possible.
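
A rough Python sketch of the scheme as I understand it, with the switch
I am asking for.  Everything here is hypothetical: the canonical_url
field stands for whatever mechanism (a 3xx code or otherwise) the server
would use to name the page, and update_history is an invented flag under
the service author's control.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reply:
    body: str
    canonical_url: str     # hypothetical: the name the server gives the page
    update_history: bool   # hypothetical switch set by the service author


class Client:
    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}      # keyed by the server-given name
        self.history: List[List[str]] = []   # [canonical_url, body] per page shown

    def receive(self, reply: Reply) -> None:
        """Store a newly fetched page under its server-given name."""
        self.cache[reply.canonical_url] = reply.body
        if reply.update_history:
            # Overwrite older snapshots of the same page in the history log.
            for item in self.history:
                if item[0] == reply.canonical_url:
                    item[1] = reply.body
        self.history.append([reply.canonical_url, reply.body])

    def view_history(self, steps_back: int) -> str:
        """Return what the history log holds for an earlier page."""
        return self.history[-1 - steps_back][1]
```

With update_history off, going back shows the old hamburger price; with
it on, every earlier snapshot of that page shows the new one.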


Koen.
Received on Saturday, 27 May 1995 07:05:39 UTC