Re: caching dilemma from James Gwertzman on 1995-05-29 (www-talk@w3.org from May to June 1995)

From: James Gwertzman <gwertzma@eecs.harvard.edu>
Date: Mon, 29 May 1995 17:38:55 -0400
To: sjk@amazon.com
Cc: www-talk@www10.w3.org
Message-Id: <199505292138.RAA02912@maddog>

Hi. Let me respond to your points one by one.

>>>>> "Shel" == Shel Kaphan <sjk@amazon.com> writes:

    Shel> Hi,


    Shel> The "expires" feature should cover the issue of when pages
    Shel> should be flushed, but the world is apparently not ready for
    Shel> it, because:

    Shel> - If you set documents to expire immediately, some major
    Shel> browsers display "Data Missing" or equivalently scary
    Shel> messages when you use browser commands to "back up" to that
    Shel> page.  Since many users are not going to understand what is
    Shel> going on and will be confused by such messages, and may not
    Shel> know to "reload" the page at that point, it would be better
    Shel> for them never to see messages like that.  (I've already had
    Shel> problems with some naive beta testers tripping over that.
    Shel> They tend to think something must have broken.  You can't
    Shel> argue that we need more sophisticated users, because we
    Shel> don't have a choice!)

    Shel> - Some browsers (such as Prodigy's) appear to ignore the
    Shel> "expires" header and cache pages anyway.  (and that's just
    Shel> their *browser*...)


In my mind the "expires" field should ONLY be used for documents with a
fixed lifetime: cool-site-of-the-day, for example, or dynamic pages
that expire immediately. I agree that browsers should do a better job
with pages that expire immediately, namely by showing them but not
caching them. For all other items (those with undetermined lifetimes)
I believe browsers should use the technique that I describe in the
chapter of my thesis labeled "Cache consistency", which is based on the
Alex FTP cache. The idea is that the older a page is, the less likely
it is to change. When the browser suspects that the page might have
changed, it sends a "get-if-changed-since" message to the server to
find out whether its cached replica needs to be updated. If the answer
is "yes", it updates the page before showing it to the user.
Otherwise it simply uses the page currently cached.

The browser decides when to check by using the ratio of the time since
the file was last checked to the age of the file (the time since the
file was created). Whenever this ratio exceeds some threshold, e.g.
10%, the file is checked. In other words, if the file is a month old
and it was last checked an hour ago, don't bother checking again before
using the cached copy. If the file was created a month ago and last
checked a week ago, then contact the server before showing the user the
cached file. I describe simulations in my thesis that show this to be a
promising approach.
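
To make the rule concrete, here is a minimal sketch in Python. It is
only an illustration of the ratio test described above, not code from
the thesis; the function name should_check, its parameters, and the
default 10% threshold are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def should_check(now, created, last_checked, threshold=0.10):
    """Return True if the cached copy should be revalidated with the server.

    Hypothetical helper: compares (time since last check) / (age of file)
    against a threshold, as in the ratio rule described in the message.
    """
    age = (now - created).total_seconds()
    since_check = (now - last_checked).total_seconds()
    if age <= 0:
        # No usable age (e.g. clock skew): be conservative and check.
        return True
    return since_check / age > threshold


now = datetime(1995, 5, 29, 17, 0)
month_old = now - timedelta(days=30)

# A month-old file checked an hour ago: ratio is well under 10%.
print(should_check(now, month_old, now - timedelta(hours=1)))  # False
# A month-old file checked a week ago: ratio is about 23%.
print(should_check(now, month_old, now - timedelta(days=7)))   # True
```

When should_check returns True, the browser would send the
"get-if-changed-since" request and refresh the cached copy only if the
server reports a change.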


    Shel> So, I have a question and I have suggestions.

    Shel> First, the question:

    Shel> Is there any good workaround for the current problem, that
    Shel> would have the properties of: - forcing browsers to reload
    Shel> expired pages when someone explicitly requests one, and -
    Shel> either: - allowing pages on the browser's history stack (for
    Shel> instance) to remain in the local cache even if they are
    Shel> expired, or, - *somehow* causing the browsers to gracefully
    Shel> and silently reload expired pages when re-visited through
    Shel> history mechanisms.

    Shel> No?  I suspected as much...


You're right, my stuff does not address the "here and now" very
well. I'm describing a solution to caching on local-area networks, not
necessarily to clients and their history stacks.

    Shel> The suggestions:

    Shel> To make the web work more smoothly, it would be nice if
    Shel> browsers would handle this situation more gracefully, by,
    Shel> for instance, not displaying errors like "Data Missing", but
    Shel> just automatically reloading the page.

    Shel> However, I also think it is worth considering for browser
    Shel> writers that history stacks (that can be re-viewed with
    Shel> browser navigation controls) are in a class of their own
    Shel> when it comes to caching.  However, while it might make
    Shel> sense to back up and see an expired document, since history
    Shel> mechanisms are for "history", it does not make sense to go
    Shel> through a link and see a cached copy of an expired document.
    Shel> It is REALLY BAD for browsers to display cached copies of
    Shel> expired documents when they are meant to be freshly
    Shel> displayed in response to a direct user command, because a
    Shel> URL may be a request to a program that is displaying dynamic
    Shel> information related to the user's extended "session" with
    Shel> the server.  (This is the core of the issue).

    Shel> I realize these considerations may have no role in the HTTP
    Shel> spec, however I feel there are serious problems in this
    Shel> area, which can only be resolved by coordinating the
    Shel> behavior of browsers and servers.

    Shel> Another thing that might help: perhaps there should be a way
    Shel> for servers to "force" the URL (the *name*) handled by
    Shel> clients to something other than the requested URL.  This
    Shel> would allow, for example, the requestor's URL to be used to
    Shel> encode information relating to a query, but would then
    Shel> result in a single cache entry in the client.

    Shel> To explain this a little more, if there were two GET
    Shel> requests, one for /cgi-bin/food/hamburgers and one for
    Shel> /cgi-bin/food/french-fries, which would result in a single
    Shel> page that ought to be cached as one page, then the server
    Shel> ought to be able to say, "you asked for /food/french-fries,
    Shel> but the page is called /food/generic-junk-food", and to have
    Shel> the browser use that info to uniquely identify a cache entry
    Shel> and update it with the newly fetched data.  This might not
    Shel> help to avoid fetching documents extra times, but it would
    Shel> help on cache coherence if the intent was to display a
    Shel> dynamically generated document.

I agree here. There is already a redirection mechanism in place, but
I don't think the results of a redirection are cached across
sessions. I would love it if a user could ask for page a on machine
b, be told that page a now lives on machine c, and have the client
remember that fact until told otherwise. After all, a redirection like
this only takes 30 or 40 bytes, and the typical client could store
thousands of them very neatly.
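
As a rough sketch of what such a client-side table might look like,
here is a small Python example. Everything in it is hypothetical: the
RedirectCache class, its method names, the JSON file format, and the
example URLs are assumptions made for illustration, not an existing
browser feature.

```python
import json
import os
import tempfile


class RedirectCache:
    """Hypothetical table of remembered redirections, saved across sessions."""

    def __init__(self, path):
        self.path = path
        self.table = {}
        if os.path.exists(path):
            with open(path) as f:
                self.table = json.load(f)

    def _save(self):
        with open(self.path, "w") as f:
            json.dump(self.table, f)

    def remember(self, old_url, new_url):
        """Record that old_url now lives at new_url."""
        self.table[old_url] = new_url
        self._save()

    def forget(self, old_url):
        """Drop an entry once the client is told otherwise."""
        self.table.pop(old_url, None)
        self._save()

    def resolve(self, url):
        """Follow remembered redirections, stopping if a loop is found."""
        seen = set()
        while url in self.table and url not in seen:
            seen.add(url)
            url = self.table[url]
        return url


# Usage: one "session" learns the redirection, a later one reuses it.
path = os.path.join(tempfile.mkdtemp(), "redirects.json")
RedirectCache(path).remember("http://b.example/a", "http://c.example/a")
later = RedirectCache(path)
print(later.resolve("http://b.example/a"))
```

Each entry here is just a pair of URLs, which is in line with the few
dozen bytes per redirection mentioned above.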

    Shel> Anyway, just some thoughts.  If you have any ideas, pointers
    Shel> or references for me, I would really appreciate it.

    Shel> --Shel Kaphan sjk@amazon.com
Received on Monday, 29 May 1995 17:57:20 UTC