Re: cache freshness / age calcs

From: Mark Nottingham <mnot@yahoo-inc.com>
Date: Mon, 12 Oct 2009 15:28:16 +1100
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <0BD82EFC-9F0D-4122-89FE-E245E32AE22C@yahoo-inc.com>
To: Adrien de Croy <adrien@qbik.com>

I've been meaning to raise an issue along these lines recently as  
well, so your message is very well-timed.

WRT "well-synchronised" -- A lot of the text in 2616's caching section
-- especially around age calculation -- was explanatory, not spec
text. Most of it has been removed in p6-07, but some still remains,
and my take is that this is one such leftover.

As you've noticed, the algorithm is very conservative; i.e., it will
always err on the side of considering something older than it actually
is. This is annoying when the freshness lifetime is short (or the skew
very large) because, as you point out, it makes things that could have
been cacheable uncacheable.
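
To make the effect concrete, here is a minimal sketch of the age arithmetic as written in RFC 2616 section 13.2.3 (the p6-07 wording may differ in detail). The function name and the sample numbers are made up for illustration, and all times are plain seconds on the cache's own clock, except date_value, which reflects the origin's clock.

```python
def current_age(date_value, age_value, request_time, response_time, now):
    """Age of a stored response, following the RFC 2616 sec. 13.2.3 formulas.

    All arguments are seconds. date_value comes from the Date header
    (origin clock); age_value is the Age header, or 0 if it is absent.
    """
    # A negative difference (origin clock ahead of ours) is clamped to zero.
    apparent_age = max(0, response_time - date_value)
    # Take the larger of the two estimates, never the sum.
    corrected_received_age = max(apparent_age, age_value)
    # Charge the whole request/response round trip to the response.
    response_delay = response_time - request_time
    corrected_initial_age = corrected_received_age + response_delay
    # Time the response has been sitting in this cache.
    resident_time = now - response_time
    return corrected_initial_age + resident_time


# Hypothetical case: origin clock is 50 s slow, no Age header, 1 s round trip.
skewed = current_age(date_value=951, age_value=0,
                     request_time=1000, response_time=1001, now=1001)
# Same exchange with a synchronised origin clock.
synced = current_age(date_value=1001, age_value=0,
                     request_time=1000, response_time=1001, now=1001)
```

With a freshness lifetime of 60 seconds, the skewed case has already used up 51 of them on arrival, while the synchronised case has used 1. A fast origin clock does no harm, because the negative apparent age is clamped to zero.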

Since transit time is already accounted for here (related issue:
<http://tools.ietf.org/wg/httpbis/trac/ticket/29>), it seems like the
wild card that you have to deal with is HTTP/1.0 caches not emitting
Age headers.

Just having a 1.1 origin isn't necessarily good enough, because there
could (in theory) be a 1.0 cache interposed somewhere along the way;
remember, they aren't required to emit Age or Via, and while the next
hop towards the UA *should* record the 1.0 cache in the Via header
(presuming that hop is 1.1), as we know not everyone sends Via.

I'm wondering if it's good enough to specify that if:
    - your next hop is a proxy AND it sends a Via header that's all
      1.1, OR
    - your next hop is the origin, and if the Via header is present
      it's all 1.1
you can calculate age using the Age header without trying to account
for hidden 1.0 caches in the chain (using Date).
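
As a rough sketch of that check, assuming the usual comma-separated Via syntax with an optional "HTTP/" prefix on the protocol version: the function names are hypothetical, knowing whether the next hop is a proxy is taken as given (configuration, say), and nested comments in Via are not handled.

```python
import re


def via_is_all_http11(via_header):
    """True if every hop listed in a Via header value reports version 1.1."""
    # Drop parenthesised comments such as "(Squid/3.0)"; nesting is ignored.
    without_comments = re.sub(r"\([^()]*\)", "", via_header)
    entries = [e.strip() for e in without_comments.split(",") if e.strip()]
    if not entries:
        return False
    for entry in entries:
        # First token is the received protocol: "1.1" or "HTTP/1.1".
        received_protocol = entry.split()[0]
        version = received_protocol.rsplit("/", 1)[-1]
        if version != "1.1":
            return False
    return True


def can_trust_age_alone(next_hop_is_proxy, via_header):
    """Apply the two proposed conditions; via_header is None when absent."""
    if next_hop_is_proxy:
        # A proxy must send Via, and it must be all 1.1.
        return via_header is not None and via_is_all_http11(via_header)
    # Talking to the origin: Via may be absent, but if present it must be all 1.1.
    return via_header is None or via_is_all_http11(via_header)
```

When can_trust_age_alone() returns False, the cache would fall back to the existing Date-based calculation.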

This does have the potential to mess up in a few circumstances, but  
AFAIK 1.0 caches will produce Age anyway; e.g. Squid. Most of the  
other caches deployed are going to be either accelerators/CDNs (which  
already do unholy things with Date and Age; see Edith Cohen's paper  
from a while back), or interception caches, which are responsible for  
any problems they cause anyway.

Thoughts? An alternative would be to reduce the Date portion of the  
calculation to a SHOULD-level requirement.

On 12/10/2009, at 2:12 PM, Adrien de Croy wrote:

> Hi all
> I've been poring through draft-ietf-httpbis-p6-cache-07 trying to  
> figure out what to do in a particular case.
> This case is where the origin server has a clock that is way out of  
> whack, and specifies expiry times close to the level of error.  No  
> Age header, so presumably not from some intermediary cache.
> The documentation for calculation of the apparent age states:
> "A response's age can be calculated in two entirely independent ways:
>  1.  now minus date_value, if the local clock is reasonably well
>      synchronized to the origin server's clock.  If the result is
>      negative, the result is replaced by zero.
>  2.  age_value, if all of the caches along the response path implement
>      HTTP/1.1."
> The obvious question being what if the clocks are not reasonably
> well synchronised, and 2 doesn't hold either (either not all
> HTTP/1.1 or no Age header)?  How can you even tell if a clock is
> synchronised or not?  In that case does the spec not attempt to
> specify how to calculate age?
> In my problem case, using now - date_value places a huge skew on the  
> time at which the stored response can no longer be considered  
> fresh.  It significantly reduces the effectiveness of the cache.   
> Obviously the resolution is to get the server admin to fix their  
> clock, but that's an uphill battle.
> I would presume that if you get a response with a Date header, and
> no Age header from a 1.1 O-S, then you should presume that the Date
> header is an indication of the local clock at that server (ignoring
> RTT and time to generate).  If there is an Age header, you should
> consider the Date header to be some date in the past from which the
> Age was subsequently calculated (e.g. caches don't update Date
> headers to their own value when serving from cache - or do they?).
> Is this why the age is calculated as the larger of the received age
> value and the apparent age, rather than the sum of the two?
> If a response didn't come from a cache, it cannot have been
> generated before it was requested.  Therefore calculating an
> apparent age is misguided if there is no Age header.  A conservative
> view of the age of a response not from cache should therefore be
> bounded by the age of the request, rather than the difference in
> clocks (which can be large).
> What do browsers commonly do in this case?
> Regards
> Adrien
> -- 
> Adrien de Croy - WinGate Proxy Server - http://www.wingate.com

Mark Nottingham       mnot@yahoo-inc.com
Received on Monday, 12 October 2009 04:29:52 UTC
