Re: Ordered 'opaque' validators from Shel Kaphan on 1996-02-06 (http-caching-historical@w3.org from February 1996)

From: Shel Kaphan <sjk@amazon.com>
Date: Mon, 5 Feb 1996 21:20:53 -0800
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: HTTP Caching Subgroup <http-caching@pa.dec.com>
Message-Id: <199602060520.VAA08547@bert.amazon.com>
Jeffrey Mogul writes:

[ Shel is confused, etc. ... ]

 > 
 >     You are talking above about a different scenario than what we were
 >     considering.  We were talking about a situation in which a cache
 >     contains a single "best" response so far, and is presented by a
 >     client with a validator that does not correspond to the entry
 >     already in the cache.  In this case, although the entry in the
 >     cache may still be fresh and may in fact be the "current version",
 >     without additional state the cache cannot respond to a conditional
 >     request using the fresh entry.
 > 
 > Nonsense.  If the cache believes its own copy is fresh, then it
 > can return it to the client.  If the client's validator is the
 > same as the cache's validator, it should return "304 Not Modified"
 > but otherwise it must return the entire entity.
 > 

This is exactly where the problem is, as you point out below.  If the
requesting agent happened to already have seen a *newer* version of
the object than is present in the cache, then the simple equality test
on validators will cause the cache to return the *older* version, just
because it is still marked fresh (even though it has been superseded
at the origin server).  This is where the validator ordering
problem comes from, and this is why I said that to apply ordering
using dates, the dates would need to be passed back in requests.
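
A minimal sketch of the equality-only test, to make the failure concrete.
The names (CacheEntry, answer_conditional) and the return values are
illustrative assumptions, not anything from the draft; validators are
treated as opaque strings where only equality is meaningful.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    validator: str  # opaque: the cache can compare for equality only
    body: str
    fresh: bool     # whether the cache still considers this entry fresh


def answer_conditional(entry, client_validator):
    """Equality-only handling of a conditional GET (hypothetical helper)."""
    if not entry.fresh:
        # Stale entry: the cache has to go back to the origin server.
        return ("FORWARD", None)
    if client_validator == entry.validator:
        return (304, None)
    # Validators differ. The cache cannot tell whether the client's copy
    # is older or newer, so it hands back its own (fresh) entity.
    return (200, entry.body)


# The cache holds the version with validator XX1 and believes it is fresh.
cache = CacheEntry(validator="XX1", body="old version", fresh=True)

result_same = answer_conditional(cache, "XX1")   # client has the same copy
result_newer = answer_conditional(cache, "XX2")  # client has a newer copy
```

With a client that already holds XX2, the second call returns the older
entity with a 200, which is the reversion described above.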


 > The only difficult case arises when a client (or cache) is getting
 > copies of the same resource from two different paths to the origin
 > server.  E.g., suppose that my browser is somehow able to use both
 > of these proxies:
 > 
 > 	proxy1.pa.dec.com
 > 	proxyB.pa.dec.com
 > 
 > Suppose that I round-robin between the proxies, and I keep
 > retrieving http://www.digital.com.  So I do this sequence:
 > 
 > 	GET http://www.digital.com
 > 		from proxy1
 > 					proxy1 GETs from www.digital.com
 > 						Validator = XX1
 > 					
 > [at this point, someone at the server modifies the file]
 > 
 > 	conditional GET http://www.digital.com
 > 		from proxyB
 > 	If-Valid: XX1
 > 					proxyB GETs from www.digital.com
 > 						Validator = XX2
 > 					
 > 	conditional GET http://www.digital.com
 > 		from proxy1
 > 	If-Valid: XX2
 > 
 > What should proxy1 do?  Well, it depends on whether the copy it
 > has is still fresh.  If the server assigned an over-optimistic
 > Expires: date, then it is still technically fresh and proxy1 can
 > hand it back to the client, even though this would appear somewhat
 > paradoxical at the client (hence the need to set honest Expires:
 > values!)
 > 

OK, you see the problem. Somewhat paradoxical indeed.  The user will see
the object revert to a previous version.  The discussion about
ordering validators is about solving this problem.  Just because the
protocol makes it legal doesn't mean it's the right thing to do.

My position on this is that (a) it is OK for caches to behave like
this as far as the protocol is concerned, (b) caches can be programmed
to behave better than this a large percentage of the time with some
relatively simple state keeping, (c) we shouldn't extend the protocol
to cover this, and (d) the spec should point out the problem and at
least allude to possible ways of addressing it.  These may include
remembering the Date: header for cases when a cache may pool resources
from other caches, and remembering validators seen in the past so that
a cache can more intelligently decide when to forward a conditional
request (and yes, this is a paranoid algorithm).

 > One could argue that this problem would go away if we simply used
 > If-Modified-Since: or some other totally ordered validator, but I
 > believe that the scenario here depends upon inconsistent treatment of
 > Expires: dates (why would the client have done the second GET unless it
 > thought the value it got from proxy1 was already stale?

I'm prefetching here, to the point where we talk about the probabilistic
expiration that we discussed at the meeting.  I think revalidating fresh
objects can happen without it being such an odd case.

 > in which case
 > proxy1 would only return it in response to the third GET if there is a
 > serious clock-skew problem).  So if we keep the clock skews under
 > relatively good control (i.e., by using NTP and the Age:  header), I
 > don't think this case is likely to happen.
 > 
 > -Jeff

Here's another similar example that shows how this can happen without such
inconsistent use of Expires dates (except in the expectable way):

1:00:	Client A requests document X through cache C1.
	The document expires at 4:00.

2:00:	Client B requests document X through cache C2.
	The document has been prematurely updated and now expires at
	3:00.  (It's a newsletter -- an unpredicted news event has
	suddenly changed the author's time frame for generating updates.)

3:00:	Client B conditionally gets X through C1.
	At this point C1 contains a different version of the document
	than client B has.  B's version is stale, and C1's version is fresh.

At this point, C1 can "legitimately" return its cached version, but B
will be disappointed. There is no clock skew involved, only silly
humans changing their minds about expiration dates.  Unlikely?  Maybe
so.
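
The timeline can be checked mechanically.  This is a sketch under
simplifying assumptions: times are whole hours, version numbers stand in
for the two states of document X, and an entry counts as stale once the
clock reaches its expiration time.  The helper name is_fresh is
illustrative.

```python
def is_fresh(expires_hour, now_hour):
    """An entry is fresh strictly before its expiration hour."""
    return now_hour < expires_hour


# 1:00: C1 stores the first version of X, which expires at 4:00.
c1_entry = {"version": 1, "expires": 4}

# 2:00: B obtains the updated version through C2; it expires at 3:00.
b_copy = {"version": 2, "expires": 3}

# 3:00: B sends a conditional GET for X through C1.
now = 3
b_stale = not is_fresh(b_copy["expires"], now)
c1_fresh = is_fresh(c1_entry["expires"], now)

# C1's entry is fresh and does not match B's copy, so an equality-only
# cache serves its own version.
served_version = None
if c1_fresh and c1_entry["version"] != b_copy["version"]:
    served_version = c1_entry["version"]
```

Both clocks agree throughout; the older version is served purely because
its original expiration time outlived the revised one.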

--Shel
Received on Tuesday, 6 February 1996 05:46:46 UTC