Re: Transparency vs. Performance: survey of opinion from hallam@w3.org on 1996-02-26 (ietf-http-wg@w3.org from January to March 1996)

From: <hallam@w3.org>
Date: Mon, 26 Feb 96 12:48:59 -0500
To: Daniel LaLiberte <liberte@ncsa.uiuc.edu>
Cc: hallam@w3.org, http working group <http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com>
Message-Id: <9602261749.AA25158@zorch.w3.org>
I think we need to separate two scenarios here:-

1) A user surfs to CNN and downloads the index, reads an article, 
	then returns to the index with the [Back] button.

2) A user surfs to CNN, bookmarks CNN then retrieves the page from 
	the bookmarks file.

These are subtly different. In case 1 we might consider a model in which a 
browser opens up a new window for each URL visited. In this instance pressing 
[Back] simply raises the relevant window.

Now consider that we implement such a user interface in one window. In this 
model pressing [Back] simply selects from a series of virtual windows.


In case 2 there is a subtle difference. I download CNN's homepage to get the 
news headlines. I therefore want to get a new page by default. 


Much of this interaction is really up to the server to signal. Only the server 
can provide information about the likely persistence of a page. Consider that 
if a page has a date within its URI then the binding between the URI and the 
page is likely to be permanent. I.e. if a server provides the Sunday Times for 
25th Feb., that page will not change. The CNN homepage will change, however. 

Much of the problem with implementing such systems lies in inadequate server 
design. Early servers simply shipped out the filestore and consequently there 
was no way of knowing how long a page would last.

A subtler problem lies in the perception that URIs are simply methods of 
accessing a resource with attributes. Where a resource is accessible by multiple 
URIs that view must be changed. For example if we have "today's Sunday Times" 
and "Sunday Times for 25th Feb" they would on 25th Feb point to the same 
resource, but if the resource was accessed through one URI the binding would be 
temporary, through the other it would be permanent.
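
A minimal sketch of the two kinds of binding, assuming a hypothetical URI 
layout in which a dated edition carries a /YYYY/MM/DD/ path segment. The 
function names and the layout are illustrative assumptions, not anything 
defined by HTTP.

```python
import re
from datetime import date

# Hypothetical URI layout: a /YYYY/MM/DD/ segment marks a dated edition.
DATED = re.compile(r"/(\d{4})/(\d{2})/(\d{2})(/|$)")

def binding(uri: str) -> str:
    """Return 'permanent' for a dated URI, 'temporary' otherwise."""
    return "permanent" if DATED.search(uri) else "temporary"

def resolve(uri: str, today: date) -> str:
    """Return the ISO date of the edition the URI denotes on the given day."""
    match = DATED.search(uri)
    if match:
        year, month, day = (int(g) for g in match.groups()[:3])
        return date(year, month, day).isoformat()
    # An undated URI denotes whatever edition is current today.
    return today.isoformat()
```

On 25th Feb both "/sunday-times/today" and "/sunday-times/1996/02/25/" resolve 
to the same edition, yet only the second binding still holds a week later.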

In considering these problems I looked at Peirce's trichotomy of signs. This has 
the following counterpart:-

1) A resource has property firstness with respect to a parent if the
	semantics of all operations on the resource are exactly the same
	in all respects and exhibit the same behaviour as if carried out
	on the parent.

2) A resource has property secondness with respect to a parent if 
	an operation on the resource will exhibit the same behaviour as
	would have been shown by an operation on the parent a finite
	time earlier.

3) A resource has property thirdness with respect to a parent if
	an operation on the resource will exhibit the same
	behaviour as would have been shown by the parent at a known
	time in the past.

[NB: the wording of these is imprecise; strictly speaking I should distinguish 
the sign and the resource properly, but that is difficult to word.]

The three properties essentially define three separate classes of cache. The 
property firstness is expensive: it effectively requires either that we employ 
mirroring techniques or that the behaviour of the resource be fixed.

The second property can be achieved through a simple notification procedure; 
the third is what the existing CERN proxy server achieves.
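
The three classes could be sketched roughly as follows. The field names are 
hypothetical, and reading "a finite time earlier" as a known upper bound on 
notification delay is an assumption of this sketch rather than part of the 
definitions above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ServiceClass(Enum):
    FIRST = 1   # behaves exactly as the parent does now
    SECOND = 2  # behaves as the parent did a bounded time ago
    THIRD = 3   # behaves as the parent did at a known past time

@dataclass
class CachePolicy:
    # All three fields are hypothetical names chosen for illustration.
    mirrors_origin: bool = False             # cache kept in step with the origin
    resource_is_fixed: bool = False          # e.g. a dated page that never changes
    notify_delay_s: Optional[float] = None   # bound on update notification delay

def classify(policy: CachePolicy) -> ServiceClass:
    """Map a cache's behaviour onto one of the three classes."""
    if policy.mirrors_origin or policy.resource_is_fixed:
        return ServiceClass.FIRST
    if policy.notify_delay_s is not None:
        return ServiceClass.SECOND
    # A plain proxy knows only when it fetched or last revalidated the object.
    return ServiceClass.THIRD
```

A proxy with no notification and no mirroring falls through to the third 
class, while the same proxy holding a fixed, dated page is classed as first.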


I would propose a scheme in which a cache informs the browser of the class of 
service which it provides. If a user is simply wanting to download news they are 
likely to be willing to bear a cache that is 2 minutes out of date provided 
there is a reasonable assurance that this will not be 2 hours or 2 days. If a 
user wishes to perform some kind of database transaction involving three 
resources at different sites the property of firstness is likely to be essential 
and justifiable. Otherwise the expense of maintaining mirroring is likely to be 
prohibitive.
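
As a rough illustration of how a browser might act on such a signal, the 
sketch below assumes the cache advertises a class number (1, 2 or 3) and, for 
class 2, a staleness bound in seconds. The function and its parameters are 
hypothetical; no such negotiation exists in the protocol as it stands.

```python
from typing import Optional

def cache_is_acceptable(service_class: int,
                        staleness_bound_s: Optional[float],
                        tolerated_staleness_s: Optional[float]) -> bool:
    """Decide whether a cache's advertised service suits the user's task.

    tolerated_staleness_s of 0 means the task needs firstness;
    None means a copy of any age will do.
    """
    if service_class == 1:
        return True                  # indistinguishable from the origin
    if tolerated_staleness_s is None:
        return True                  # the user does not care how old the copy is
    if service_class == 2 and staleness_bound_s is not None:
        return staleness_bound_s <= tolerated_staleness_s
    return False                     # class 3 gives no bound to rely on
```

A news reader tolerating 120 seconds accepts a class 2 cache with a 2 minute 
bound but rejects one that may lag by 2 hours; a transaction tolerating 0 
seconds accepts only class 1.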

If a cache is only providing third class service then the browser should know 
what it is dealing with. In some cases this will be OK. Note that in many 
circumstances a CERN type proxy may be caching dated information which allows it 
to effectively provide a first class service for those resources.


Perhaps we need to push part of this onto the client user interface. A client 
should inform the user what type of information has been delivered. It would be 
nice for a browser to display the time at which a cached object was last 
reported current (i.e. fetch date or date of last If-Modified-Since). If a page 
was returned by a cache which receives notification of updates within 2 minutes 
of them occurring then there would be a 2 minute flag.
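
A small sketch of what such a display might compute, assuming the browser 
knows when the object was last verified and, optionally, the cache's 
notification bound. The function name and the label wording are illustrative 
only.

```python
from datetime import datetime, timedelta
from typing import Optional

def freshness_label(last_verified: datetime,
                    notify_bound: Optional[timedelta] = None) -> str:
    """Build the text a browser could show next to a cached page."""
    stamp = last_verified.strftime("%Y-%m-%d %H:%M")
    if notify_bound is not None:
        # The cache hears of updates within this bound, so flag it.
        minutes = int(notify_bound.total_seconds() // 60)
        return f"current to within {minutes} min (last verified {stamp})"
    # Otherwise only the fetch or last If-Modified-Since time is known.
    return f"last verified {stamp}"
```

Called with a 2 minute bound this yields the "2 minute flag"; without a bound 
the user sees only the time the copy was last checked.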


On the reposting of forms issue, could we finesse the situation here by 
requiring a client signal the difference between a refresh and a re-submit 
somehow? This will be a live issue for payments systems. Perhaps the server 
could assist by flagging a page with a "do not use post to refresh this page" 
notice. 

I suspect however that servers handling POST-submitted forms are going to have 
to filter out unintended resubmission in any case :-( This would not be 
difficult if only more servers were based on threads rather than a model of 
spawning a new process per connection.
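
One way such filtering could look in a threaded server, assuming each form is 
issued with a unique hidden token (an assumption of this sketch, as is the 
class name). Threads share the set of seen tokens in memory; a server that 
spawns a process per connection would need some external store instead.

```python
import threading

class ResubmissionFilter:
    """Remember form tokens so a repeated POST can be recognised."""

    def __init__(self) -> None:
        self._seen = set()
        self._lock = threading.Lock()   # guards the set across handler threads

    def first_submission(self, token: str) -> bool:
        """Return True the first time a token is seen, False on a repeat."""
        with self._lock:
            if token in self._seen:
                return False
            self._seen.add(token)
            return True
```

A handler would process the form only when first_submission() returns True 
and otherwise return the result of the original submission.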

		Phill
Received on Monday, 26 February 1996 10:09:08 UTC