Re: Can you cache the unknown? (was Re: Session-ID proposal) from Daniel W. Connolly on 1995-08-24 (www-talk@w3.org from July to August 1995)

From: Daniel W. Connolly <connolly@beach.w3.org>
Date: Thu, 24 Aug 1995 16:55:18 -0400
To: msfriedm@us.oracle.com
Cc: www-talk@w3.org
Message-Id: <199508242055.QAA04991@beach.w3.org>
In message <9508241613.AA14397@bingster>, Mark S. Friedman writes:
>Bob Wyman writes:
> > On a slightly different tack... It seems like this whole business of cachin
>g
> > is getting a bit complicated...  [...] it would be useful to put some
> > effort into building at least an "informational" RFC or IETF-Draft on the
> > subject of caching. Is someone already doing this? Would it make sense? If
> > it isn't being done and it does make sense, I think I'll volunteer to try t
>o
> > fix this one...
>
>I certainly support this effort. I have yet to discover a source of
>comprehensive caching info and it would be much appreciated.

Quite. I have a message in my mh drafts folder called "HTTP
Caching: rules and Heuristics" that I can't ever seem to finish
and send out.

I'm trying to come up with some Larch[1] traits to formally
specify correct behaviour of information retrieval on the web.

Here's a snapshot of some working notes:

The idea is to capture the rules in Larch, ala:


WebArch: trait

  includes
    PartialOrder(Time for T)

  introduces
    rep: URI, Entity, Time, Time -> Bool
         % for rep(r, e, t1, t2), read
         % "from t1 to t2, e is a representation of r"
    http: Request, Message -> Bool

    addr: Request -> URI
    date: Request -> Time

    date: Message -> Time
    ent:  Message -> Entity
    lmod: Message -> Time
    exp:  Message -> Time

  asserts
    forall r: URI, e: Entity, t1, t2, t3, t4: Time
      rep(r, e, t1, t4) /\ t1 <= t2 /\ t2 <= t3 /\ t3 <= t4 =>
             rep(r, e, t2, t3)

      http(q, a) <=> rep(addr(q), ent(a), lmod(a), exp(a))
                     /\ lmod(a) <= date(q) /\ date(q) <= exp(a)


Then consider a protocol interaction like:


C1 ->  P: GET http://S/path             q1
P  ->  S: GET /path                     q2
S  ->  P: 200 ok                        a2
          Date: tsresp
          Last-Modified: tmod
          Expires:       texp           rep('http://S/path', e, tmod, texp)
P  -> C1: 200 ok                        a1
          Forwarded: tPresp
          Date: tsresp
          Last-Modified: tmod
          Expires:       texp

C2 ->  P: GET http://S/path             q3


Try to prove that the proxy can return ent(a2) to C2.
1 thru 6 are the assumptions, 7 is the proof goal:


  1. http(q2, a2)                       % origin server says so
  2. addr(q3) = addr(q2)                % proxy requires this
  3. date(q3) < exp(a2)                 % proxy checks this
  3' date(q3) > lmod(a2)                % proxy checks this
  4. ent(a3) = ent(a2)                  % proxy builds reply matching a2
  5. lmod(a3) = lmod(a2)
  6. exp(a3) = exp(a2)
  7. Show http(q3, a3)
  8. rep(addr(q2), ent(a2), lmod(a2), exp(a2))  1, http I
  9. rep(addr(q3), ent(a3), lmod(a3), exp(a3))  8, 2, 4, 5, 6
 10. lmod(a3) <= date(q3) /\ date(q3) < exp(a3) 3, 3', 5, 6
 11. http(q3, a3)                               9, 10, defn http


The trouble is that this keeps falling lower and lower on my
to-do list. :-(

Dan

p.s. note that format negotiation isn't addressed above. I'm
beginning to think that a server can't be bound to return
"the best" representation of a resource -- it just won't
scale. Rather, if a resource has multiple representations,
it must return one, and notify the client that others are
available.

And rather than asking "does the proxy return 
exactly what the origin server would return?" we should ask
"does the proxy return something that the origin server
_could_ legally have returned?"

The net effect is that in the case of:
	C1 prefers text to html, queries S for U via P, gets text D1
		P caches U -> text D1
	C2 prefers html to text, queries S for U via P, gets text D1
		(because P had it cached)

P is acting correctly, provided that S has told P that the
html version is available (via URI: or Link: headers) and that P
passes this info along to C1 and C2. If the client doesn't like
what it gets, it can ask for the other version with "Pragma: no-cache"
in a subsequent transaction. The idea is that the proxy would
cache "the most popular" representation. The needs of the many...

Another idea that struck me is that the lack of an expires header
should be interpreted not as "it expires now" but "it expires
sometime soon, and clients can guess heuristically." In other
words, the onus is on the origin server to send out Expires:
headers; else proxies/clients are free to cache for some time.

The last idea I'll add here before I go is an answer to
"what does the Expires: header mean?" In the case of HTTP,
we're not so much concerned with when the data in the entity
becomes uninteresting, like a part announcement or a 3-month
sale. Rather, we're concerned with the binding between
a URL and an entity.

So in response to a query for U, an Expires header giving time t
says "the enclosed entity is a representation of the resource
identified by U until t."


Anybody who wants to help write this document is welcome to
step in!

Dan
Received on Thursday, 24 August 1995 16:55:20 UTC