
trying to cache wikipedia pages

From: Dan Connolly <connolly@w3.org>
Date: Wed, 19 Apr 2006 11:03:40 -0500
To: httplib2-discuss@lists.sourceforge.net
Cc: www-archive@w3.org
Message-Id: <1145462621.27608.504.camel@dirk.w3.org>

I'm writing a little thingy to get airport lat/long info
out of wikipedia. Wikipedia suffered an outage today,
so I'm working on caching and offline access.

I rediscovered http://bitworking.org/projects/httplib2/ .
I integrated that into my little aptdata.py thingy;
it seems to work. Then I try again, expecting the program
to work out of the local disk cache. Nope. So I
add max-age=3600 to my requests... still no joy...

I look in the cache, and... no wonder:

cache-control: private, s-maxage=0, max-age=0, must-revalidate

That seems like a "don't bother to help me with my load;
just melt down my servers, please" caching policy. Grumble.

At first I thought setting max-age in a request would
override the server; but I see:

            freshness_lifetime = min(freshness_lifetime, int(cc['max-age']))

I thought maybe that should be max, but then I read up...

"If both the new request and the cached entry include "max-age"
directives, then the lesser of the two values is used for determining
the freshness of the cached entry for that request."
 -- http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3
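
To make the "lesser of the two values" rule concrete, here is a minimal
sketch, not httplib2's actual code. The helper names parse_cache_control
and freshness_lifetime are made up for illustration, and the sketch
assumes only max-age matters (it ignores Expires, heuristic freshness,
and quoted directive values).

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a dict of directives.

    Hypothetical helper for illustration; directives without an
    argument (e.g. "private") map to None.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip()] = arg.strip() or None
    return directives


def freshness_lifetime(response_cc, request_cc):
    """Return the freshness lifetime in seconds for a cached entry.

    Assumption: only max-age is considered. When both the cached
    response and the new request carry max-age, the smaller wins,
    as the RFC 2616 passage quoted above says.
    """
    resp = parse_cache_control(response_cc)
    req = parse_cache_control(request_cc)
    lifetime = 0
    if resp.get("max-age") is not None:
        lifetime = int(resp["max-age"])
    if req.get("max-age") is not None:
        lifetime = min(lifetime, int(req["max-age"]))
    return lifetime


wikipedia_cc = "private, s-maxage=0, max-age=0, must-revalidate"
print(freshness_lifetime(wikipedia_cc, "max-age=3600"))
```

With the Wikipedia header above, the server's max-age=0 always wins, so
a request-side max-age=3600 cannot make the entry fresh.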

Then I see max-stale is what I want, but then in httplib2.py, I see:

    We will never return a stale document as 
    fresh as a design decision, and thus the non-implementation 
    of 'max-stale'.
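
For reference, a rough sketch of what honoring max-stale could look
like. This is not a patch against httplib2; usable_from_cache and
parse_cache_control are hypothetical names, and the ages are passed in
as plain seconds. One assumption worth flagging: as I read RFC 2616
section 14.9.4, a response marked must-revalidate may not be served
stale even when the request says max-stale, so the sketch refuses in
that case.

```python
def parse_cache_control(value):
    """Hypothetical helper: Cache-Control value -> dict of directives."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip()] = arg.strip() or None
    return directives


def usable_from_cache(current_age, lifetime, request_cc, response_cc):
    """Decide whether a cached entry may be returned without revalidation.

    current_age and lifetime are in seconds. Assumption: a private
    (non-shared) cache, so proxy-revalidate and s-maxage are ignored.
    """
    req = parse_cache_control(request_cc)
    resp = parse_cache_control(response_cc)

    if current_age < lifetime:
        return True  # still fresh, nothing else to check

    # Stale from here on.
    if "must-revalidate" in resp:
        return False  # server forbids serving stale entries
    if "max-stale" not in req:
        return False  # client did not ask for stale entries
    limit = req["max-stale"]
    if limit is None:
        return True  # bare max-stale: any staleness is acceptable
    return current_age - lifetime <= int(limit)


# Entry cached ten minutes ago with max-age=0, client accepts an hour of staleness.
print(usable_from_cache(600, 0, "max-stale=3600", "private, max-age=0"))
print(usable_from_cache(600, 0, "max-stale=3600",
                        "private, max-age=0, must-revalidate"))
```

Under that reading, max-stale alone would not help with the Wikipedia
header quoted earlier, since it also carries must-revalidate.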

So I'm getting no help from either side. Sigh.

If I implement max-stale, any chance you'll reconsider that
design decision? Or will I have to maintain a fork?

Any suggestions on getting wikipedia to change their caching
policy? Seems to me that no cache-control header at all
is The Right Thing for them, no?


-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Wednesday, 19 April 2006 16:03:48 GMT
