- From: <huw@dcs.gla.ac.uk>
- Date: Sat, 16 Sep 95 18:10:01 BST
- To: www-proxy@www0.cern.ch
- Cc: www-lib@www0.cern.ch, huw@dcs.gla.ac.uk
At the moment, clients cannot be sure whether a cached page they are looking at is up to date, i.e. whether the original has been updated since the copy was cached. This e-mail proposes a protocol for cache coherency of Web pages. I have cc'd this to www-lib as I'm not sure whether www-proxy is still a valid list.

Huw Evans
Research Assistant
Department of Computing Science
Glasgow University
Glasgow
Scotland

My proposal is for the client to query the server to see whether the page has changed since it was last retrieved. When the page is first retrieved, it is tagged with the time at the originating server (called St). This time is stored along with the page at the local cache server. When anybody attempts to get the file from a site that holds a cached copy, a query is sent to the originating server asking whether the page has been updated since St. The advantage is that both times being compared are local to the originating server, so clock skew is not an issue (if the server's system clock has been put back, that's up to them and we can do nothing about it). The query and the reply are extremely small messages, and the time comparison is trivial.

Doing this Across the Internet
------------------------------

If the page has not been updated, an appropriate message is sent back and the cached copy is used. If the page has been updated, a new copy (with the new St) is sent back and cached locally. I would assume that pages change relatively infrequently, so most of the time the query will come back with "Control-cache: use-cached-copy".

All of this has to be done in the face of errors. The query takes place across the Internet, so a reply may not be possible (for example, there is no route to the server, or the server is not there), or it may take too long to arrive (for example, because the network or the server is very slow). When no reply arrives within a certain period of time, the server is treated as unreachable and the local copy must be used.
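A minimal sketch of the query-and-fallback logic, in Python. Everything in it is an assumption for illustration: OriginServer, CacheServer, changed_since and the reachable flag are hypothetical names, the "network" is a direct method call, and a timeout is simulated by raising TimeoutError rather than by putting a real deadline on a socket. What it tries to show is that St is always produced and compared on the origin's clock, so the cache's own clock never enters the comparison, and that a missing reply falls back to the local copy with a "possibly stale" flag.

```python
from dataclasses import dataclass


@dataclass
class CachedPage:
    body: str
    st: int  # St: the origin server's clock when this copy was fetched


class OriginServer:
    """Hypothetical stand-in for the originating server.

    All times come from this server's own clock, so the cache never
    compares its clock against the origin's.
    """

    def __init__(self):
        self.clock = 0
        self.pages = {}        # url -> (body, time last modified)
        self.reachable = True  # set False to simulate a query timing out

    def publish(self, url, body):
        self.clock += 1
        self.pages[url] = (body, self.clock)

    def _check_reachable(self):
        if not self.reachable:
            raise TimeoutError("no reply within the timeout")

    def fetch(self, url):
        """Full retrieval: return the page tagged with the current St."""
        self._check_reachable()
        body, _ = self.pages[url]
        return body, self.clock

    def changed_since(self, url, st):
        """The small query: None if url is unchanged since st,
        otherwise a fresh (body, St) pair."""
        self._check_reachable()
        body, modified = self.pages[url]
        if modified <= st:
            return None  # i.e. "use cached copy"
        return body, self.clock


class CacheServer:
    def __init__(self, origin):
        self.origin = origin
        self.store = {}  # url -> CachedPage

    def get(self, url):
        """Return (body, possibly_stale)."""
        cached = self.store.get(url)
        if cached is None:
            # First retrieval; with no local copy, a timeout propagates.
            self.store[url] = CachedPage(*self.origin.fetch(url))
            return self.store[url].body, False
        try:
            reply = self.origin.changed_since(url, cached.st)
        except TimeoutError:
            # Origin unreachable or too slow: serve the local copy but
            # flag it so the user can be warned it may be out of date.
            return cached.body, True
        if reply is not None:
            # Page updated: replace the cached copy and its St.
            cached = self.store[url] = CachedPage(*reply)
        return cached.body, False


origin = OriginServer()
origin.publish("/index.html", "v1")
cache = CacheServer(origin)
print(cache.get("/index.html"))  # ('v1', False): first retrieval
print(cache.get("/index.html"))  # ('v1', False): unchanged, cached copy used
origin.publish("/index.html", "v2")
print(cache.get("/index.html"))  # ('v2', False): new copy and new St fetched
origin.reachable = False
print(cache.get("/index.html"))  # ('v2', True): fallback, flagged as possibly stale
```

Note that the very first retrieval has no copy to fall back on, so in this sketch a timeout there is simply passed up to the caller.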
The user should, however, be informed in a window, as they need to be aware that they are potentially using old data.

Another reason for the document being unavailable is that it has been moved or removed. If the document has been moved, its new location has to be contacted to see whether it has changed, and the procedure above is executed again, possibly leading to yet another location. If the document has been removed, the user should be informed, as they are looking at a cached copy of data that the author has removed for some reason. The problem of finding a document that has moved should be treated in a separate discussion, as it is a major piece of work in its own right.

Composite Pages
---------------

As pages are made up of a number of different underlying files (e.g. html, gifs, audio samples), a page is deemed to have changed if any one of its constituent parts has changed. The challenge is to send only the minimum amount of data. For example, a page may consist of some html, a gif and an audio sample. If only the gif has changed, ideally only the gif should be sent. Sending the minimum may not be possible for all pages, but it should reduce the amount of data served on average. I would assume that html changes more frequently than any other constituent part of a page, which is favourable, as html is plain ASCII, which is fast to transfer.
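One way to picture the composite-page rule is a single query that carries an St per constituent file, with the origin replying with only the parts modified since their own St. The sketch below is a hypothetical illustration under simplifying assumptions: CompositeOrigin, changed_parts, revalidate_page and the file names are invented, and the set of files making up the page is taken as fixed (a real page's html can add or drop constituents, which this does not handle).

```python
from dataclasses import dataclass


@dataclass
class Part:
    body: bytes
    st: int  # origin time tag for this constituent file


class CompositeOrigin:
    """Hypothetical origin that tracks a modification time per file."""

    def __init__(self):
        self.clock = 0
        self.files = {}  # name -> (body, time last modified)

    def publish(self, name, body):
        self.clock += 1
        self.files[name] = (body, self.clock)

    def fetch_all(self, names):
        """Initial retrieval: every part tagged with the current St."""
        return {n: Part(self.files[n][0], self.clock) for n in names}

    def changed_parts(self, tags):
        """One query for a whole page. tags maps each constituent name to the
        St the cache holds for it; reply with only the changed parts."""
        return {
            name: Part(self.files[name][0], self.clock)
            for name, st in tags.items()
            if self.files[name][1] > st
        }


def revalidate_page(origin, parts):
    """Refresh a cached composite page in place.

    The page counts as changed if any constituent changed; only the changed
    constituents are transferred. Returns (changed, bytes_transferred).
    """
    tags = {name: part.st for name, part in parts.items()}
    updates = origin.changed_parts(tags)
    parts.update(updates)
    return bool(updates), sum(len(p.body) for p in updates.values())


origin = CompositeOrigin()
page = ["page.html", "logo.gif", "hello.au"]  # hypothetical constituents
origin.publish("page.html", b"<html>...</html>")
origin.publish("logo.gif", b"G" * 5000)
origin.publish("hello.au", b"A" * 20000)
cached = origin.fetch_all(page)

print(revalidate_page(origin, cached))  # (False, 0): nothing changed
origin.publish("logo.gif", b"G" * 6000)
print(revalidate_page(origin, cached))  # (True, 6000): only the gif is sent
```

Keeping a separate St per file is what lets the reply stay minimal: an unchanged 20000-byte audio sample costs nothing beyond its entry in the query.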
Received on Saturday, 16 September 1995 13:11:45 UTC