Re: dont-revalidate Cache-Control header from Amos Jeffries on 2015-07-10 (ietf-http-wg@w3.org from July to September 2015)

From: Amos Jeffries <squid3@treenet.co.nz>
Date: Fri, 10 Jul 2015 22:29:36 +1200
To: ietf-http-wg@w3.org
Message-ID: <559F9E90.4020801@treenet.co.nz>

On 10/07/2015 10:25 a.m., Ben Maurer wrote:
> It is considered a best practice for websites to use "long caching" for
> serving images, javascript and CSS.

Sadly true. This myth is very common amongst web application developers
and leads to a lot of broken websites and annoyance amongst customers.

Speaking with my dual hats of Web Designer and Software Engineer
developing caching proxy software:

Best Practice is actually to put *appropriate* caching times and
revalidation limits on objects regardless of original file type. What is
appropriate depends on the predicted volatility of the individual
objects. Object type is secondary to that volatility criterion.


> In long caching if a website has a
> resource X.js which might change from time to time, rather than referencing
> X.js and giving the endpoint a short expiration date, they reference
> X-v1.js with a nearly infinite expiration date.

I know where this started... An old tutorial written back in the late
1990s when HTTP/1.0 expiration was the only type of caching available
and the first Browser War was in full swing. "Archaic" is the best word
to describe it.

It applies badly to HTTP/1.1 caching situations and can actually
*reduce* cacheability of objects if applied indiscriminately. It also
introduces the possibility of nasty human errors via the manual version
control system.

There are a few edge cases where it applies well. But Best Practice it
certainly is NOT in the current web environment.


> When X.js changes, the
> website uploads X-v2.js and changes any references to use the new version.
> This has the benefit that the browser never needs to revalidate resources
> and that it sees changes instantly. [1]

These days we have HTTP/1.1 revalidation. Where the object ETag is
derived from the stored mtime value, a hash of the object, or both,
the HTTP/1.1 software out there today can take care of version control
easily and fast, without any manual assistance from the web dev and
without duplicated copies of things hanging around.
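
A minimal sketch of that revalidation flow, in Python. The function
names and the ETag layout (hex mtime plus a truncated SHA-256) are
assumptions made for illustration, not what any particular server does:

```python
import hashlib
from typing import Optional, Tuple

def make_etag(body: bytes, mtime: int) -> str:
    """Build an ETag from both the stored mtime and a hash of the body."""
    digest = hashlib.sha256(body).hexdigest()[:16]
    return '"%x-%s"' % (mtime, digest)

def respond(body: bytes, mtime: int,
            if_none_match: Optional[str]) -> Tuple[int, dict, bytes]:
    """Answer a GET that may carry an If-None-Match header."""
    etag = make_etag(body, mtime)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = []
        for tag in if_none_match.split(","):
            tag = tag.strip()
            # If-None-Match uses weak comparison, so drop any W/ prefix.
            if tag.startswith("W/"):
                tag = tag[2:]
            candidates.append(tag)
        if "*" in candidates or etag in candidates:
            # Object unchanged: no body goes over the wire.
            return 304, headers, b""
    # First fetch, or the object changed: send the full copy.
    return 200, headers, body
```

When X.js changes, its mtime and hash change, the stored ETag stops
matching, and the next revalidation gets a 200 with the new object. No
renaming to X-v2.js is involved.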


> 
> At Facebook, we use this method to serve our static resources. However
> we've noticed that despite our nearly infinite expiration dates we see
> 10-20% of requests (depending on browser) for static resource being
> conditional revalidation. We believe this happens because UAs perform
> revalidation of requests if a user refreshes the page. Our statistics show
> that about 2% of navigations to FB are reloads -- however these requests
> cause a disproportionate amount of traffic to our static resources because
> they are never served by the user's cache.

That tells me that 10-20% of your traffic is probably coming from an
HTTP/1.1 proxy cache, whether it reveals itself as a proxy or not.

Speaking for Squid, we limit caching time at 1 year**, after which
objects get revalidated before use. An Expires header in HTTP/1.1 only
means that objects are stale and must be revalidated before next use.
A proxy with existing content does that with a synthesized revalidation
request even if the client that triggered it did a plain GET. Thereafter
the proxy has a new Expires value to use*** until that itself expires.

** The plan is to extend that to 68 years as per the new RFC 7234 spec
when the compliance updates get done in that area.

*** Assuming the server actually did change the Expires header. There
are a number that send back the same old already-expired value for years
on end.
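
A rough sketch of that freshness decision, in Python. This is not
Squid code; the names, the one-year constant and the simplified inputs
(max-age, and Expires minus Date, both in seconds) are assumptions for
illustration only:

```python
from typing import Optional

ONE_YEAR = 365 * 24 * 60 * 60  # the 1 year cap described above, in seconds

def freshness_lifetime(max_age: Optional[int],
                       expires_in: Optional[int]) -> int:
    """Seconds a stored response counts as fresh, capped at ONE_YEAR.

    max_age is the Cache-Control max-age value; expires_in is the
    Expires header minus the Date header. max-age takes precedence
    when both are present.
    """
    if max_age is not None:
        lifetime = max_age
    elif expires_in is not None:
        lifetime = expires_in
    else:
        lifetime = 0
    # An already-expired Expires value yields zero, never a negative.
    return max(0, min(lifetime, ONE_YEAR))

def cache_action(current_age: int, lifetime: int) -> str:
    """Fresh objects are served as-is; stale ones get revalidated first."""
    if current_age < lifetime:
        return "serve-from-cache"
    return "revalidate"
```

With these assumptions a "nearly infinite" expiry of ten years is still
treated as one year, and a server that keeps sending back the same
already-expired Expires value gets a revalidation on every use.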


> 
> A user who refreshes their Facebook page isn't looking for new versions of
> our Javascript. Really they want updated content from our site. However UAs
> refresh all subresources of a page when the user refreshes a web page. This
> is designed to serve cases such as a weather site that says <img
> src="/weather.php?zip=94025">. If that site had a 10 minute expiration on
> the image, the user might be able to refresh the page and get more up to
> date weather.
> 
> 10-20% additional requests is a huge impact for a site's performance. When
> a user presses refresh, they want to quickly see the latest updates on the
> site they are on. Revalidating all resources on the site is a substantial
> drain. In discussing this with Ilya Grigorik from the Chrome team, one
> suggestion that came up was an explicit cache control option to tell UAs
> not to revalidate. The proposed header would be Cache-Control:
> dont-revalidate
> 
> This header would tell a UA that the resource in question is unquestionably
> valid until its expiration time.

That meaning is already provided by the Expires header and
Cache-Control:max-age=N value.
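
For illustration, a small Python sketch that emits both signals for the
same lifetime. The helper name is hypothetical; formatdate() is from the
standard library's email.utils:

```python
from email.utils import formatdate

def long_lived_headers(now: float, max_age: int) -> dict:
    """Headers stating that a response stays valid for max_age seconds."""
    return {
        # HTTP/1.1 caches use max-age and prefer it over Expires.
        "Cache-Control": "max-age=%d" % max_age,
        # HTTP/1.0 caches only understand the absolute Expires date.
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

Both headers say the same thing: the object is to be treated as valid
until that time.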

I don't see how adding yet another way to signal it will help with the
software revalidating today.

I do think I know where you are coming from with this, and kind of
agree. The UA whose refresh button goes straight to reload-everything
instead of efficient revalidate-only-as-needed behaviour is broken IMHO.
However that is a UA bug, not a protocol problem.


> A UA MUST NOT send a conditional request
> for a resource with a dont-revalidate header prior to the resource's
> expiration.

Placing that criterion on objects will guarantee that your bandwidth
bill becomes rather large when it need not. Why would you bother sending
any Cache-Control at all if you wanted such a costly bill? "Just send a
200 reply with a new object in response to revalidation requests" does
the same thing, and I hope you understand how that affects the
efficiency of caching.


> In most cases the UA SHOULD simply treat the resource as valid.

This statement goes for all responses not actively labeled as already
stale. Again no need for a new control signal.

> If the UA is not willing to treat the resource as valid, it should send a
> full request with no revalidation.

As I mentioned above, the revalidation may not have come from the UA.
The UA may be requesting full copies of all objects on the page, while
a middleware cache revalidates its own content to optimize the
cache->server connection bandwidth costs.
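
A hedged sketch of how a middleware cache might turn a plain client GET
into a conditional request upstream. The function and parameter names
are hypothetical, and the case where the client sends its own
conditional headers is simply passed through:

```python
from typing import Optional

def build_upstream_request(client_headers: dict,
                           stored_etag: Optional[str] = None,
                           stored_last_modified: Optional[str] = None) -> dict:
    """Add validators from the cached copy to the upstream request."""
    headers = dict(client_headers)  # do not mutate the client's headers
    if stored_etag and "If-None-Match" not in headers:
        headers["If-None-Match"] = stored_etag
    if stored_last_modified and "If-Modified-Since" not in headers:
        headers["If-Modified-Since"] = stored_last_modified
    return headers
```

From the origin's point of view this looks like a revalidation, even
though the UA behind the cache asked for a complete copy.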

> The UA MAY do this in cases where the
> user has explicitly requested to use a clean cache -- for example a refresh
> via ctrl+shift+r, or using developer tooling. Such functionality SHOULD be
> targeted at advanced users rather than the average user.
> 
> Without an additional header, web sites are unable to control UA's behavior
> when the user uses the refresh button. UA's are rightfully hesitant in any
> solution that alters the long standing semantics of the refresh button (for
> example, not refreshing subresources).

IMHO this control is unnecessary and an intrusion onto physical User
expectations of their GUI behaviour. What we need is a standard
behaviour from the button that works identically across all browsers.

The Ctrl-r (revalidate-as-needed) and Ctrl+Shift+r / Ctrl-R (reload
completely from scratch) pairing is a good de facto standard already. It
just needs to be consistently followed by all browsers and UAs. The
selective 'as-needed' part may be new right now [hint, hint] and some
older UAs may need bug fixes to realign, but that's code, not protocol.


Today I had to help a user, with an app unfamiliar to both of us on an
OS I'd never heard of before, recover some accidentally deleted text
(uh-oh). Of course, per Murphy's Law, it was an urgent recovery with
seconds literally ticking up an invoice cost. Trying Ctrl-Z on the off
chance it worked ... one happy user.
Consistent basic UA behaviour is good UA design.


HTH
Amos
Received on Friday, 10 July 2015 10:30:25 UTC