Re: On Opaque validators from Jeffrey Mogul on 1996-01-09 (http-caching-historical@w3.org from January 1996)

From: Jeffrey Mogul <mogul@pa.dec.com>
Date: Mon, 08 Jan 96 16:01:03 PST
To: Lorenzo Vicisano <vicisano@iet.unipi.it>
Cc: http-caching@pa.dec.com
Message-Id: <9601090001.AA26648@acetes.pa.dec.com>
 (A)    Suppose to have a peer-entity caching mechanism (like the
	one present in Harvest). In such a scheme, a cache is allowed
	to look for the missing object in a pool of neighbor caches,
	by means of a multicast request, and to choose the best hit
	(if available). Then, if multiple (fresh) hits with
	different `Cache-validator' values are returned (*), the cache
	should have a way to choose the best hit (the newer object).

I can think of several interpretations of this scenario:

    (1) the origin server assigned a fresh-until time to the resource,
    but chose too long a time, and modified it before the time ran
    out.  This led to two caches having different "fresh" copies of the
    resource, and one of them isn't the "right" copy.  One could argue
    that this is a failure, but it's not a failure of the protocol,
    it's a failure of the server to predict the proper freshness
    lifetime.

    (2) the origin server has generated several different copies during
    the freshness lifetime of the first one, but the server doesn't
    care because (from its point of view) the various copies, while
    different, are all "good enough".  However, the cache would if
    possible like to return the latest copy to the user.  In other
    words, this isn't a case of "right vs. wrong", but a case of
    several different amounts of "rightness".  The protocol hasn't
    failed, and the freshness lifetime was reasonable, but there is
    room for optimization beyond that.

I could try to argue that if it's truly important that the cache
gets the newest copy of the resource, rather than simply getting
a "good enough" copy, then the fresh-until value would have been
set by the server to be small enough to distinguish between copies.
But I'll concede that there will be times when the server sets
the wrong value, and it would be nice to have an ordering on the
versions.

The question is whether this should be mandatory.  I.e., should
we require the server to send some means of ordering responses in
time, or should we simply make it possible but not require it?
And is there any need to make this part of the cache-validator,
or could it be done separately?  (If we do it separately, then
we don't have to make the cache-validator "partially" opaque.)

For example, most servers will send a Date: header (the draft 1.1
spec says "should always include").  This would do a reasonable
job of imposing an ordering on responses, without any additional
mechanism (provided that the server's clock is monotone).  The
one problem is that this does not provide a guaranteed ordering,
because HTTP-date values have 1-second resolution.  But I think
one could argue that if we are dealing with 1-second differences
that are truly important, then the server would be specifying
fresh-until values of zero.
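
A minimal sketch of ordering two responses by their Date: headers,
to illustrate the 1-second resolution limit described above.  The
function name compare_by_date is hypothetical, and the sketch assumes
both headers are well-formed HTTP-dates generated by the same
(monotone) server clock.

```python
from email.utils import parsedate_to_datetime


def compare_by_date(date_a: str, date_b: str) -> int:
    """Return -1, 0, or 1 as date_a is older than, tied with, or newer than date_b.

    A result of 0 means the two responses cannot be ordered by Date: alone,
    because HTTP-date values carry only 1-second resolution.
    """
    a = parsedate_to_datetime(date_a)
    b = parsedate_to_datetime(date_b)
    return (a > b) - (a < b)


older = "Mon, 08 Jan 1996 16:01:03 GMT"
newer = "Mon, 08 Jan 1996 16:01:04 GMT"
print(compare_by_date(older, newer))
print(compare_by_date(older, older))
```

Two responses generated within the same second compare as tied, which
is the case where Date: gives no guaranteed ordering.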

There's another case:

    (3) the origin server didn't specify a fresh-until value (under my
    proposal, this would only be true if this was a 1.0 server;
    under Roy's proposal, this would be the default for 1.1 servers).
    The neighbor caches have made heuristic guesses about the proper
    fresh-until times, but these could be wrong, and it would be best
    to choose the most recent one.

I think you could still use the Date: header to disambiguate these.

So I might add the rule that if a cache has a choice between
two cached responses for the same resource that are both fresh,
it should use the one with the later Date: header.  OK?
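
The rule above could be sketched roughly as follows.  This is only an
illustration: the CachedResponse record, its field names, and the
choose_response function are assumptions made for the example, not
part of any draft, and freshness is modelled simply as a fresh-until
timestamp compared against the current time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional, Sequence


@dataclass
class CachedResponse:
    source: str            # hypothetical label for the neighbor cache
    date: str              # value of the Date: header
    fresh_until: datetime  # explicit or heuristic expiration time


def choose_response(candidates: Sequence[CachedResponse],
                    now: datetime) -> Optional[CachedResponse]:
    """Among the fresh candidates, return the one with the latest Date: header."""
    fresh = [r for r in candidates if r.fresh_until > now]
    if not fresh:
        return None
    # On a tie (same 1-second Date: value) max() keeps the first candidate seen.
    return max(fresh, key=lambda r: parsedate_to_datetime(r.date))


now = datetime(1996, 1, 9, 0, 0, tzinfo=timezone.utc)
hits = [
    CachedResponse("cache-a", "Mon, 08 Jan 1996 20:00:00 GMT",
                   datetime(1996, 1, 10, 0, 0, tzinfo=timezone.utc)),
    CachedResponse("cache-b", "Mon, 08 Jan 1996 22:00:00 GMT",
                   datetime(1996, 1, 9, 12, 0, tzinfo=timezone.utc)),
]
best = choose_response(hits, now)
print(best.source if best else "no fresh hit")
```

Note that the choice depends only on Date:, not on which copy has the
later fresh-until time.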
    
   (*) the only way to avoid that is to forbid a server from generating
    a new validator while a fresh copy of the object with an
    older validator still exists.

I don't think we want to do this, because although the server cannot
force all of those other copies to disappear, it should certainly be
able to prevent new copies from being created with excessively short
lifetimes.  And we also have no way for the server to discover when
that last fresh copy has been given out, except if it simply does not
give out any more copies during the original freshness lifetime.

 (B) 1) cache-"a" and client-"b" have two different copies of the same
	object with `Cache-validator' Xa and Xb, and expiration time
	Ta and Tb respectively (client-"b", for some reason, fetched the
	object from a different source than cache-A).
     2) Suppose client-"b"'s copy being the newer one, but Tb<Ta (*).
     3) at time T (Tb<T<Ta) client-X issues a conditional GET to
	cache-"a", which in turn replies with its own copy of the object
	being it fresh and being Xa!=Xb.
   Note that, this way, client-"b" retrieves an older copy than the one
   it owns.

Again, this result is "correct" under a straightforward interpretation
of freshness; if the server didn't want the client X to see copy a
at time T, it would have given it a shorter fresh-until time.  But
I see what you are getting at here: you want to ensure that the client
doesn't see any resources appear to move backward in time, even if
both copies are still valid.

If the server really does care about this ordering property from
the point of view of a single client, but doesn't care if two
different clients see different versions at the same time, then
I suppose we would have to think about providing an ordering
mechanism.  Again, one could use the Date: header to do this,
within +/- one second.

The client could implement the rule "don't accept a non-firsthand
response that has a Date: older than the response you already have";
if it receives one of these, it could retry with a "Cache-control:
revalidate" to force a check with the origin-server.
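
A rough sketch of that client-side rule, under stated assumptions: the
function name check_response and its string return values are
hypothetical, "firsthand" is taken as a flag the client already knows,
and the actual retry (resending the request with "Cache-control:
revalidate") is left to the caller.

```python
from email.utils import parsedate_to_datetime


def check_response(held_date: str, new_date: str, firsthand: bool) -> str:
    """Decide what to do with a newly received response.

    Returns "accept" if the response may be used, or "revalidate" if the
    client should retry with "Cache-control: revalidate" so that the
    request is checked against the origin server.
    """
    if firsthand:
        # A firsthand response came from the origin server itself.
        return "accept"
    if parsedate_to_datetime(new_date) < parsedate_to_datetime(held_date):
        # The resource would appear to move backward in time.
        return "revalidate"
    return "accept"


held = "Mon, 08 Jan 1996 22:00:00 GMT"
from_cache = "Mon, 08 Jan 1996 20:00:00 GMT"
print(check_response(held, from_cache, firsthand=False))
```

Responses whose Date: values fall in the same second are accepted
here, consistent with the +/- one second caveat.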

If you think this is too much overhead, would it be sufficient
for the protocol to include a Cache-control: order-by-date
(sent by the origin server in a response) to force it to happen
for those few resources where it matters? 
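
If such a directive existed, a recipient might detect it along these
lines.  This is purely illustrative: order-by-date is only the
directive floated in this message, not something defined by any spec,
and the function name wants_date_ordering is an assumption.

```python
def wants_date_ordering(cache_control: str) -> bool:
    """Return True if a Cache-control value carries the proposed order-by-date directive."""
    # Directives are comma-separated; compare case-insensitively.
    directives = [d.strip().lower() for d in cache_control.split(",")]
    return "order-by-date" in directives


print(wants_date_ordering("order-by-date"))
print(wants_date_ordering(""))
```

A recipient would then apply the Date:-ordering check only to
responses for which this returns True.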

But I think we are getting into the realm of reliable transactions (in
the distributed systems sense of the word "transaction", not in the
commercial sense), and it seems premature to start working too
hard on that.  When "we" (not me, I hope) start to work on
things like serializability, it might be useful to introduce
version-ordering rules, but I don't think it's worth doing
that now, and especially it's not worth making it mandatory.

-Jeff
Received on Tuesday, 9 January 1996 00:08:24 UTC