Re: cache keys from Shel Kaphan on 1996-01-07 (http-caching-historical@w3.org from January 1996)

From: Shel Kaphan <sjk@amazon.com>
Date: Sun, 7 Jan 1996 11:06:55 -0800
To: koen@win.tue.nl
Cc: dwm@shell.portal.com, http-caching@pa.dec.com
Message-Id: <199601071906.LAA09176@bert.amazon.com>

Koen says:
	...
  I agree with Dave that the method must be part of the cache key.

Well, if that's generally what people think, then we can dispense with
the no-side-effects thing, which controls how *different* methods use
the same cache entry.  But I'm pretty surprised if it is what people
think, since it seems so fundamental to a correct cache design to me.
On the other hand, the no-side-effects business may be confusing
enough that people would do just about anything to avoid it.
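
A minimal sketch of the two keying choices being argued over, in Python.
The class and method names are hypothetical, and the URI-only variant
deliberately leaves out the no-side-effects rule mentioned above; it only
shows which entry a later GET would see after a POST on the same URI.

```python
from typing import Dict, Optional, Tuple


class MethodKeyedCache:
    """Keys entries on (method, URI): a POST response never touches the GET entry."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, str], str] = {}

    def store(self, method: str, uri: str, body: str) -> None:
        self.entries[(method.upper(), uri)] = body

    def lookup(self, method: str, uri: str) -> Optional[str]:
        return self.entries.get((method.upper(), uri))


class UriKeyedCache:
    """Keys entries on the URI alone: the latest response for a URI wins,
    whichever method produced it (simplified; no side-effect handling)."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def store(self, method: str, uri: str, body: str) -> None:
        self.entries[uri] = body

    def lookup(self, method: str, uri: str) -> Optional[str]:
        return self.entries.get(uri)


if __name__ == "__main__":
    for cache in (MethodKeyedCache(), UriKeyedCache()):
        cache.store("GET", "/doc", "old version")
        cache.store("POST", "/doc", "new version")
        print(type(cache).__name__, "->", cache.lookup("GET", "/doc"))
```

With the method in the key, the later GET still sees the old version;
with the URI alone as the key, it sees what the POST returned.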

  >I do that all the time (use the same URI for POSTs and GETs), just to
  >get around cache problems when different kinds of requests need to
  >return new versions of the same object. 

  I also do it all the time, mainly to allow reload buttons on some
  browsers to work as expected.  But my POSTs will not necessarily yield
  the same result as subsequent GETs on the URI, they can also give an
  error message in a 200 response, leaving the content bound to the GET
  unchanged.

That's what Location at least *could* be used for.


  > Saves a redirection, which in
  >today's world can't be trusted to contact the origin server on the
  >second request anyway.

  Redirection also cannot be trusted?  Ack!  I wonder what you can trust
  these days.  Can you give an example of a client or proxy that caches
  redirects?

I can't remember which ones I had trouble with, but I remember having
to abandon a redirection-based way to control this because some system
or other was illicitly caching the response for the redirection target
URI.

  >I certainly want to have the ability to follow a POST with a later GET and
  >get what the POST returned.

  You have it now, by making the GET response always expire immediately.
  Modulo browsers that don't care about Expires, of course.

Yes.
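
A rough sketch of the "expire immediately" approach, in Python: if the
origin sets Expires to the moment the response was generated, a cache
that honors Expires never treats the stored GET response as fresh. The
function name and the sample dates are illustrative assumptions, not
taken from any draft.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header: str, now: datetime) -> bool:
    """Return True if a response carrying this Expires value may still be
    served from cache at time `now` without contacting the origin."""
    return now < parsedate_to_datetime(expires_header)


if __name__ == "__main__":
    # Hypothetical response generated at this instant, with Expires set
    # to the same instant so that it is stale as soon as it is stored.
    generated = datetime(1996, 1, 7, 19, 6, 55, tzinfo=timezone.utc)
    expires = "Sun, 07 Jan 1996 19:06:55 GMT"
    print(is_fresh(expires, generated))
    print(is_fresh(expires, generated + timedelta(seconds=1)))
```

A browser that ignores Expires would of course bypass this check
entirely, which is the caveat raised above.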


  >  In addition, I want the response from that
  >POST to make it impossible for me to receive the previous version of
  >that object that may have been already in the cache, when I do a later GET.

  That is a valid thing to want, but you cannot get it by throwing the
  request method out of the cache key.  Too many things would break.

  >To me, insisting that each method have its own cache slot for a given
  >URI would be analogous to designing a computer cache where LOADs and
  >STOREs didn't share cache slots for the same memory locations.

  Many POSTs do not act as STOREs.

No, they're more like ADD-TO-MEMORY, to use the same metaphor, but what
about PUTs?  Are you saying that when PUT becomes more popular, caches
should cache PUTs separately from GETs on the same URI???


  The 1.1 draft already provides `see other redirection' for POSTs that
  do act as STOREs.  If implementations of this are broken, they need
  to be fixed.  Speccing a new alternative scheme, and hoping that
  implementations of that new scheme will be less broken, holds little
  promise as a fix.

Well, though I like being able to control the method on a redirection,
I have never much liked the requirement for a second round trip to
do this.  Also, I don't believe the spec says anything about
being able to control "forced reloading" on the redirection request,
which I would view as a requirement.  (Does it? I don't have it in
front of me).  Instead, I think all involved objects must be marked as
never fresh.

  If you want to propose some alternative scheme to `see other
  redirection', the only possible justification can be that this
  alternative scheme is more efficient, for example because it avoids
  the conditional GETs on every request that are needed in the `see
  other' method.

It is one RTT more efficient, and doesn't require that objects
that may be involved in this always be marked as "not fresh".
But yes, redirection can accomplish the same outward results.

  [From here on, I am speculating on how to improve on `see other']

  Improving on `see other redirection' can be tricky.  A scheme that
  lets POST responses influence cached responses of earlier GETs can
  only work as long as all GETs and POSTs travel through the same cache.

This problem affects anything we say about cache coherency, in the
absence of a revocation protocol.


  If the user agent sends POSTs directly to the origin server, and GETs
  through a proxy cache, then the proxy cache has no chance of
  invalidating the GET response.  And didn't AOL use a scheme in which
  their browsers randomly access one of several proxies for subsequent
  requests?

I believe so.  This may contribute to why AOL is among the more
difficult systems to make an interactive WWW service work through.

[ purely speculative flaming here, but... ]: I would claim that if
someone is going to run non-communicating caches in a round-robin
fashion like this, then in order to really be correct, the caches
themselves should be using different algorithms that would inevitably
make their caching less effective.  Or perhaps, that there should be
some way for such caches to communicate their nature to origin servers
so that the servers could be more conservative about cachability of
responses.

  The only thing we can really require as far as request routing is
  concerned, is that if a 1.1 browser has an internal cache, then all
  GETs and POSTs must go through that cache.  So I get to the following
  design:

Well, the design point I have been using is that since we have no
revocation protocol, the only thing we can control is the behavior of
an individual cache.  I have been assuming that most of the time, the
arrangement of caches between a client and a given server will be
fairly constant.  If not, we can't control what happens.  But we can
do something about consistency and behavior of an individual cache,
and so we probably should.  Someday maybe there will be a revocation
protocol, and then it would look bad if single caches couldn't even
maintain coherency.

  There must be some way to say

   Cache-control: max-age-for-browser-caches=X,
		  proxy-caches-must-always-do-conditional-GETs

  in 1.1 responses.  (We can already say something slightly less
  efficient: Cache-control: max-age=X, private.)  A server wanting to
  use the `POST response replaces old GET responses that are not stale
  yet' mechanism on a URI U must send this Cache-control information in
  every GET response on U.  Further, we require from browsers that

   If a 1.1 internal browser cache has stored a GET response GR on URI
   U, and it relays a POST response PR from URI U containing the response
   header Location: U ,
   then the cache must either invalidate the old GET response GR or
   (highly preferred) replace it with the POST response PR.

  Shel, would this be acceptable to you?

I'd rather convince you of the design point above than start getting
into browser/proxy differences yet, which I don't like too much.
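
For reference, the browser-cache rule Koen proposes above could be
sketched roughly as follows in Python. The class, its method names, and
the `replace` switch are hypothetical; header lookup assumes the exact
key "Location", and nothing here models freshness or proxies.

```python
from typing import Dict, Optional


class BrowserCache:
    """Toy internal browser cache holding GET responses keyed by URI."""

    def __init__(self) -> None:
        self.get_entries: Dict[str, str] = {}

    def store_get(self, uri: str, body: str) -> None:
        self.get_entries[uri] = body

    def lookup_get(self, uri: str) -> Optional[str]:
        return self.get_entries.get(uri)

    def relay_post_response(self, uri: str, headers: Dict[str, str],
                            body: str, replace: bool = True) -> str:
        """Pass a POST response through the cache on its way to the user.

        If a GET response for `uri` is stored and the POST response names
        the same URI in Location, replace the stored entry (preferred) or
        invalidate it. Otherwise leave the cache untouched.
        """
        if headers.get("Location") == uri and uri in self.get_entries:
            if replace:
                self.get_entries[uri] = body
            else:
                del self.get_entries[uri]
        return body


if __name__ == "__main__":
    cache = BrowserCache()
    cache.store_get("/cart", "empty cart")
    cache.relay_post_response("/cart", {"Location": "/cart"}, "cart with 1 item")
    print(cache.lookup_get("/cart"))
```

A POST response without a matching Location, such as an error page sent
with a 200 status, leaves the stored GET response as it was.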

  I for one would like to have this
  kind of behavior, it would allow my own web software to be a bit more
  cache-friendly.  Spoofing problems would be virtually absent in the
  above scheme.

  But I wonder if this scheme isn't too complicated.  If we spec all
  this, what are the chances that everybody will get it right?

pretty small.

  >--Shel

  Koen.

--Shel
Received on Sunday, 7 January 1996 19:35:06 UTC