Re: these results sound very encouraging

Erik Aronesty writes:

| A significant number of hits for certain documents 
| could have been reduced if your proxy had reported a 
| document-hash to the client in the header.

Yep!  But: not nearly as many as I'd expected.  What happened to all 
those Cindy Crawford pictures ? ;-)

My take on this was to hack some very limited support for the 
"Content-MD5:" header into the Apache and NCSA HTTP servers, since the 
proxy doesn't really want to go to the trouble of calculating this 
sort of thing itself.  It already has quite a lot to do, and quickly!  
Using MD5 (or whatever) to check that you got what you asked for would 
require an MD5 calculation on the part of the proxy for each URL 
retrieved, which is likely to be a no-no for all but the most 
anally-retentive.  There was a discussion which led up to this, but as 
I recall it was split across a number of participants, and across 
private and public mail...
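
For reference, a minimal sketch of what generating and checking the 
header value involves, assuming the RFC 1864 definition of Content-MD5 
(base64 of the raw 128-bit MD5 digest of the body).  It is Python rather 
than the C of the server patches, the function names are made up for 
illustration, and it is not the Apache or NCSA code.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return a Content-MD5 value: base64 of the raw MD5 digest of body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def body_matches(body: bytes, header_value: str) -> bool:
    """Recompute the digest and compare it with the header value received.

    This is the per-response work a proxy would take on if it chose to
    verify what it fetched.
    """
    return content_md5(body) == header_value.strip()
```

The second function is the expensive one from the proxy's point of 
view: it has to read the whole body for every URL it wants to check.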

Unfortunately, my feet haven't really touched the ground very much 
lately, and I haven't had the opportunity to sit down with the code 
again and make it "production strength" - if you look at the sources 
you'll see that it ships disabled by default.  Phew!  The world is 
saved from my lame attempts at C programming :-)

I think the next step, and what's required to make this really work, is 
for the target HTTP servers themselves to generate and maintain a 
*cache* of checksums.  Being very lazy, I'm inclined to do this by 
putting them in a hash database.  A purpose-built in-memory cache would 
be faster, but feels like it would be quite painful to code up.
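
A rough sketch of the hash-database idea, assuming Python's standard 
dbm module stands in for whatever hash database the server would really 
use.  The function name and the record layout (modification time and 
size, then the digest) are assumptions made for illustration only.

```python
import base64
import dbm
import hashlib
import os


def cached_content_md5(db, path: str) -> str:
    """Return the Content-MD5 value for path, recomputing only on change.

    db is an open dbm database.  Each record is stored under the file
    path and holds "<mtime_ns>:<size> <base64 digest>".
    """
    st = os.stat(path)
    stamp = f"{st.st_mtime_ns}:{st.st_size}"
    key = path.encode("utf-8")

    if key in db:
        cached_stamp, _, digest = db[key].decode("ascii").partition(" ")
        if cached_stamp == stamp:
            return digest  # file unchanged since the digest was stored

    # Missing or stale entry: hash the file and store the new record.
    with open(path, "rb") as f:
        raw = hashlib.md5(f.read()).digest()
    digest = base64.b64encode(raw).decode("ascii")
    db[key] = f"{stamp} {digest}".encode("ascii")
    return digest
```

Opened with something like dbm.open("checksums", "c"), this gives the 
server one stat() and one lookup per request in the common case.  It 
does nothing about locking between several server processes.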

There are a few nasties, like locking strategies on the cache when you 
have a pool of servers, but it's doable and if I don't get around to 
doing the extra work I'm sure somebody else will (eventually).  A 
lazier-than-thou first step would be to have a separate process which 
went around generating the checksum cache periodically, so that the 
HTTP server itself doesn't need to be doing anything particularly 
clever.  Loosely consistent!
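
The separate-process variant might look something like the sketch 
below.  It assumes Python's standard dbm module as the hash database, 
keys that are paths relative to the document root, and a database file 
kept outside that root; the function names and the ten-minute interval 
are invented for the example.

```python
import base64
import dbm
import hashlib
import os
import time


def refresh_checksums(docroot: str, db_path: str) -> int:
    """Walk docroot and store a Content-MD5 value for every file found.

    Returns the number of files hashed.  Entries for files that have
    since been deleted are left in place.
    """
    count = 0
    with dbm.open(db_path, "c") as db:
        for dirpath, _dirnames, filenames in os.walk(docroot):
            for name in filenames:
                full = os.path.join(dirpath, name)
                with open(full, "rb") as f:
                    raw = hashlib.md5(f.read()).digest()
                rel = os.path.relpath(full, docroot)
                db[rel.encode("utf-8")] = base64.b64encode(raw)
                count += 1
    return count


def run_forever(docroot: str, db_path: str, interval_seconds: float = 600.0) -> None:
    """Regenerate the checksum cache, sleep, and repeat."""
    while True:
        refresh_checksums(docroot, db_path)
        time.sleep(interval_seconds)
```

Between passes the cache can be out of date for any file that has 
changed, which is exactly the loose consistency mentioned above.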

ObIETF:  Is "Content-MD5:" the right way to go about this ?  Should 
http-spec-v11-* note this use of MD5 ?  What about other algorithms ?

Martin

PS In case it's not obvious - the rationale is that over time the proxy 
can automagically "learn" about replicated resources.  So, you can take 
your URNs and stuff them up your...!
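
To make the "learning" a bit more concrete, here is a toy sketch in 
Python, assuming the proxy simply remembers which URLs it has seen with 
which Content-MD5 value.  The class and method names are hypothetical 
and nothing like this exists in any proxy I know of.

```python
from collections import defaultdict


class ReplicaIndex:
    """Remember which URLs have been seen with which Content-MD5 value."""

    def __init__(self) -> None:
        self._urls_by_digest = defaultdict(set)

    def record(self, url: str, content_md5: str) -> None:
        """Note that url was served with the given Content-MD5 value."""
        self._urls_by_digest[content_md5].add(url)

    def replicas(self, url: str, content_md5: str) -> set:
        """Return the other URLs already seen with the same digest."""
        return set(self._urls_by_digest.get(content_md5, ())) - {url}
```

If a cheap request for a new URL returns a digest the index already 
knows, the proxy could serve the body from a cached replica instead of 
fetching it again.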

Received on Friday, 9 August 1996 23:57:54 UTC