- From: Daniel W. Connolly <connolly@beach.w3.org>
- Date: Thu, 24 Aug 1995 16:55:18 -0400
- To: msfriedm@us.oracle.com
- Cc: www-talk@w3.org
In message <9508241613.AA14397@bingster>, Mark S. Friedman writes: >Bob Wyman writes: > > On a slightly different tack... It seems like this whole business of cachin >g > > is getting a bit complicated... [...] it would be useful to put some > > effort into building at least an "informational" RFC or IETF-Draft on the > > subject of caching. Is someone already doing this? Would it make sense? If > > it isn't being done and it does make sense, I think I'll volunteer to try t >o > > fix this one... > >I certainly support this effort. I have yet to discover a source of >comprehensive caching info and it would be much appreciated. Quite. I have a message in my mh drafts folder called "HTTP Caching: rules and Heuristics" that I can't ever seem to finish and send out. I'm trying to come up with some Larch[1] traits to formally specify correct behaviour of information retrieval on the web. Here's a snapshot of some working notes: The idea is to capture the rules in Larch, ala: WebArch: trait includes PartialOrder(Time for T) introduces rep: URI, Entity, Time, Time -> Bool % for rep(r, e, t1, t2), read % "from t1 to t2, e is a representation of r" http: Request, Message -> Bool addr: Request -> URI date: Request -> Time date: Message -> Time ent: Message -> Entity lmod: Message -> Time exp: Message -> Time asserts forall r: URI, e: Entity, t1, t2, t3, t4: Time rep(r, e, t1, t4) /\ t1 <= t2 /\ t2 <= t3 /\ t3 <= t4 => rep(r, e, t2, t3) http(q, a) <=> rep(addr(q), ent(a), lmod(a), exp(a)) /\ lmod(a) <= date(q) /\ date(q) <= exp(a) Then consider a protocol interaction like: C1 -> P: GET http://S/path q1 P -> S: GET /path q2 S -> P: 200 ok a2 Date: tsresp Last-Modified: tmod Expires: texp rep('http://S/path', e, tmod, texp) P -> C1: 200 ok a1 Forwarded: tPresp Date: tsresp Last-Modified: tmod Expires: texp C2 -> P: GET http://S/path q3 Try to prove that the proxy can return ent(a2) to C2. 1 thru 6 are the assumptions, 7 is the proof goal: 1. http(q2, a2) % origin server says so 2. addr(q3) = addr(q2) % proxy requires this 3. date(q3) < exp(a2) % proxy checks this 3' date(q3) > lmod(a2) % proxy checks this 4. ent(a3) = ent(a2) % proxy builds reply matching a2 5. lmod(a3) = lmod(a2) 6. exp(a3) = exp(a2) 7. Show http(q3, a3) 8. rep(addr(q2), ent(a2), lmod(a2), exp(a2)) 1, http I 9. rep(addr(q3), ent(a3), lmod(a3), exp(a3)) 8, 2, 4, 5, 6 10. lmod(a3) <= date(q3) /\ date(q3) < exp(a3) 3, 3', 5, 6 11. http(q3, a3) 9, 10, defn http The trouble is that this keeps falling lower and lower on my to-do list. :-( Dan p.s. note that format negotiation isn't addressed above. I'm beginning to think that a server can't be bound to return "the best" representation of a resource -- it just won't scale. Rather, if a resource has multiple representations, it must return one, and notify the client that others are available. And rather than asking "does the proxy return exactly what the origin server would return?" we should ask "does the proxy return something that the origin server _could_ legally have returned?" The net effect is that in the case of: C1 prefers text to html, queries S for U via P, gets text D1 P caches U -> text D1 C2 prefers html to text, queries S for U via P, gets text D1 (because P had it cached) P is acting correctly, provided that S has told P that the html version is available (via URI: or Link: headers) and that P passes this info along to C1 and C2. If the client doesn't like what it gets, it can ask for the other version with "Pragma: no-cache" in a subsequent transaction. The idea is that the proxy would cache "the most popular" representation. The needs of the many... Another idea that struck me is that the lack of an expires header should be interpreted not as "it expires now" but "it expires sometime soon, and clients can guess heuristically." In other words, the onus is on the origin server to send out Expires: headers; else proxies/clients are free to cache for some time. The last idea I'll add here before I go is an answer to "what does the Expires: header mean?" In the case of HTTP, we're not so much concerned with when the data in the entity becomes uninteresting, like a part announcement or a 3-month sale. Rather, we're concerned with the binding between a URL and an entity. So in response to a query for U, an Expires header giving time t says "the enclosed entity is a representation of the resource identified by U until t." Anybody who wants to help write this document is welcome to step in! Dan
Received on Thursday, 24 August 1995 16:55:20 UTC