Re: Heuristic caching without validators from Adrien W. de Croy on 2013-01-17 (ietf-http-wg@w3.org from January to March 2013)

From: Adrien W. de Croy <adrien@qbik.com>
Date: Thu, 17 Jan 2013 01:23:22 +0000
To: "Mark Nottingham" <mnot@mnot.net>
Cc: "HTTP Working Group" <ietf-http-wg@w3.org>
Message-Id: <emfe6950e6-f177-4326-b1b0-b2c6bd7b1d4a@bombed>

Hi Mark

------ Original Message ------
From: "Mark Nottingham" <mnot@mnot.net>
>
>On 17/01/2013, at 11:12 AM, Adrien W. de Croy <adrien@qbik.com> wrote:
>
>>  Hi all
>>
>>  p6 and RFC 2616, when talking about heuristic caching, place a 
>>SHOULD-level limit on heuristic freshness (10% of the time elapsed 
>>since L-M), but only when there is a Last-Modified header.
>>
>>  However, it has been noted that some caches will store and reuse 
>>responses that have no validators at all.
>>
>>  Obviously it only makes sense to cache something if it can be 
>>re-used, and without validators, it can't be re-validated with the 
>>O-S, and therefore the only way such a resource can be re-used is if 
>>the cache makes some assumption about freshness.
>>
>>  In previous versions, we had heuristics for minimum effective 
>>freshness based on content type. This caused all manner of problems, 
>>so we dropped it for our current version. However, we're finding 
>>relatively poor cacheability of the internet, and so the value of 
>>caching is proving to be limited.
>>
>>  Since we can't wait forever for web site operators to consider and 
>>roll out reasonable caching directives, I believe in order to provide 
>>some real benefit from caching, we need to take a much more aggressive 
>>stance. This will require at least heuristic caching, and I'm fairly 
>>certain also heuristic caching of responses that don't have any 
>>validators.
>>
>>  Are there any guidelines for this? Since we're a shared intermediary 
>>cache, we can't do things like cache for the duration of the browser 
>>session, so it's going to come down to a table of min freshness per 
>>content-type (where there are no validators).
>>
>>  Does this issue deserve any discussion in the RFC / p6? It's very 
>>light on heuristic freshness - probably for a reason.
>
>It's already done by some widely deployed caches - see:
>   http://www.mnot.net/blog/2009/02/24/unintended_caching
>
I read this about an hour before posting.  Very interesting.

>The original intent was to leave this open, AIUI, precisely because 
>most content on the Web doesn't provide any freshness information.

There's something a bit disturbing about that.

If most web content doesn't provide freshness information (which 
actually is what we also see), then heuristic freshness calculations are 
arguably the most important part of caching.
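
To make the calculation under discussion concrete, here is a minimal sketch in Python. It assumes the 10%-of-time-since-Last-Modified rule mentioned above and, for responses with no validators, falls back to a per-content-type table. The function name, the table, and the durations in it are purely hypothetical illustrations, not values taken from p6, RFC 2616, or any deployed cache.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime
from typing import Mapping

# Hypothetical per-content-type freshness for responses without validators.
# These durations are illustrative assumptions, not recommended values.
FALLBACK_FRESHNESS = {
    "image/png": timedelta(hours=1),
    "text/css": timedelta(minutes=10),
}


def heuristic_freshness(headers: Mapping[str, str],
                        fraction: float = 0.1) -> timedelta:
    """Return an assumed freshness lifetime for a response that carries
    no explicit freshness information.

    `headers` is expected to use canonical header names as keys
    ("Date", "Last-Modified", "Content-Type").
    """
    date = headers.get("Date")
    last_modified = headers.get("Last-Modified")
    if date and last_modified:
        # Fraction of the time elapsed between Last-Modified and Date.
        elapsed = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
        # Guard against a Last-Modified that is later than Date.
        return max(elapsed * fraction, timedelta(0))

    # No Last-Modified: look the media type up in the fallback table,
    # ignoring parameters such as charset.
    media_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    return FALLBACK_FRESHNESS.get(media_type, timedelta(0))
```

For example, a response with a Date of 17 Jan 2013 and a Last-Modified ten days earlier would be treated as fresh for one day. A response with neither header and an unlisted content type gets a lifetime of zero, i.e. it is not reused without contacting the origin.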

Having the most important part left unspecified and open seems like a 
bit of a problem for interop.

Where can a web author or cache implementor go to find out what 
behaviour to expect?  Maybe we need a BCP?

Regards

Adrien

>
>Cheers,
>
>--
>Mark Nottingham http://www.mnot.net/

Received on Thursday, 17 January 2013 01:24:09 UTC