- From: Amos Jeffries <squid3@treenet.co.nz>
- Date: Fri, 10 Jul 2015 22:29:36 +1200
- To: ietf-http-wg@w3.org
On 10/07/2015 10:25 a.m., Ben Maurer wrote:
> It is considered a best practice for websites to use "long caching"
> for serving images, javascript and CSS.

Sadly true. This myth is very common amongst web application developers
and leads to a lot of broken websites and annoyance amongst customers.

Speaking with my dual hats of Web Designer and Software Engineer
developing caching proxy software: best practice is actually to put
*appropriate* caching times and revalidation limits on objects
regardless of the original file type. What is appropriate depends on
the predicted volatility of the individual objects. Object type is
secondary to that volatility criterion.

> In long caching if a website has a resource X.js which might change
> from time to time, rather than referencing X.js and giving the
> endpoint a short expiration date, they reference X-v1.js with a
> nearly infinite expiration date.

I know where this started... an old tutorial written back in the late
1990's, when HTTP/1.0 expiration was the only type of caching available
and the first Browser War was in full swing. "Archaic" is the best word
to describe it.

It applies badly to HTTP/1.1 caching situations and can actually
*reduce* the cacheability of objects if applied indiscriminately. It
also introduces the possibility of nasty human errors via the manual
version control scheme.

There are a few edge cases where it applies well. But best practice it
certainly is NOT in the current web environment.

> When X.js changes, the website uploads X-v2.js and changes any
> references to use the new version. This has the benefit that the
> browser never needs to revalidate resources and that it sees changes
> instantly. [1]

These days we have HTTP/1.1 revalidation. With the object's ETag
derived from the stored mtime value, a hash of the object, or both, the
HTTP/1.1 software out there today can take care of version control
easily and fast, without any manual assistance from the web dev or
duplicated copies of things hanging around.

> At Facebook, we use this method to serve our static resources.
> However we've noticed that despite our nearly infinite expiration
> dates we see 10-20% of requests (depending on browser) for static
> resources being conditional revalidations. We believe this happens
> because UAs perform revalidation of requests if a user refreshes the
> page. Our statistics show that about 2% of navigations to FB are
> reloads -- however these requests cause a disproportionate amount of
> traffic to our static resources because they are never served by the
> user's cache.

That tells me that 10-20% of your traffic is probably coming from an
HTTP/1.1 proxy cache, whether it reveals itself as a proxy or not.

Speaking for Squid: we cap caching time at 1 year**, after which
objects get revalidated before use. An Expires header in HTTP/1.1 only
means that the object is stale and must be revalidated before its next
use. A proxy with existing content does that with a synthesized
revalidation request, even if the client that triggered it sent a plain
GET. Thereafter the proxy has a new Expires value to use*** until that
itself expires.

** The plan is to extend that to 68 years as per the new RFC 7234 spec
when the compliance updates get done in that area.

*** Assuming the server actually did change the Expires header. There
are a number that send back the same old already-expired value for
years on end.
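For anyone who has not watched this on the wire, here is a minimal
sketch of the revalidation exchange I keep referring to. The host name,
ETag value and max-age below are made up for illustration; only the
X.js name comes from Ben's example.

  Initial fetch from the origin:

    GET /X.js HTTP/1.1
    Host: www.example.com

    HTTP/1.1 200 OK
    Cache-Control: max-age=600
    ETag: "a1b2c3"
    Content-Type: application/javascript

    ...body...

  Revalidation once the stored copy has gone stale (sent by the UA, or
  synthesized by a proxy cache on behalf of a client doing a plain GET):

    GET /X.js HTTP/1.1
    Host: www.example.com
    If-None-Match: "a1b2c3"

    HTTP/1.1 304 Not Modified
    Cache-Control: max-age=600
    ETag: "a1b2c3"

The 304 carries no body, so an unchanged X.js costs one round trip and
almost no bandwidth, and nobody has to rename anything to X-v2.js.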
> A user who refreshes their Facebook page isn't looking for new
> versions of our Javascript. Really they want updated content from our
> site. However UAs refresh all subresources of a page when the user
> refreshes a web page. This is designed to serve cases such as a
> weather site that says <img src="/weather.php?zip=94025">. If that
> site had a 10 minute expiration on the image, the user might be able
> to refresh the page and get more up to date weather.
>
> 10-20% additional requests is a huge impact for a site's performance.
> When a user presses refresh, they want to quickly see the latest
> updates on the site they are on. Revalidating all resources on the
> site is a substantial drain. In discussing this with Ilya Grigorik
> from the Chrome team, one suggestion that came up was an explicit
> cache control option to tell UAs not to revalidate. The proposed
> header would be Cache-Control: dont-revalidate
>
> This header would tell a UA that the resource in question is
> unquestionably valid until its expiration time.

That meaning is already provided by the Expires header and the
Cache-Control:max-age=N value. I don't see how adding yet another way
to signal it will help with the software doing the revalidating today.

I do think I know where you are coming from with this, and I kind of
agree. A UA whose refresh button goes straight to reload-everything
instead of efficient revalidate-only-as-needed behaviour is broken
IMHO. However, that is a UA bug, not a protocol problem.

> A UA MUST NOT send a conditional request for a resource with a
> dont-revalidate header prior to the resource's expiration.

Placing that criterion on objects will guarantee that your bandwidth
bill becomes rather large when it need not. Why would you bother
sending any Cache-Control at all if you wanted such a costly bill?
"Just send a 200 reply with the new object in response to revalidation
requests." does the same thing, and I hope you understand how that
affects the efficiency of caching.

> In most cases the UA SHOULD simply treat the resource as valid.

That statement goes for all responses not actively labelled as already
stale. Again, no need for a new control signal.

> If the UA is not willing to treat the resource as valid, it should
> send a full request with no revalidation.

As I mentioned above, the revalidation may not have come from the UA.
The UA may well be requesting full, complete copies of all objects on
the page, while a middleware cache revalidates its own content to
optimize the cache->server connection bandwidth costs.

> The UA MAY do this in cases where the user has explicitly requested
> to use a clean cache -- for example a refresh via ctrl+shift+r, or
> using developer tooling. Such functionality SHOULD be targeted at
> advanced users rather than the average user.
>
> Without an additional header, web sites are unable to control UA's
> behavior when the user uses the refresh button. UA's are rightfully
> hesitant in any solution that alters the long standing semantics of
> the refresh button (for example, not refreshing subresources).

IMHO this control is unnecessary and an intrusion on users'
expectations of their GUI behaviour.

What we need is a standard behaviour for the button that works
identically across all browsers. The pair of Ctrl-r
(revalidate-as-needed) and Ctrl+Shift+r / Ctrl-R (reload completely
from scratch) is a good de facto standard already. It just needs to be
followed consistently by all browsers and UAs. The selective
'as-needed' part may be new right now [hint, hint], and some older UAs
may need bug fixes to realign, but that's code, not protocol.
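To make that distinction concrete, this is roughly what the common
browsers send for each action today. The exact headers differ a little
between browsers and versions, so treat it as an illustration rather
than a spec quote; the URL and ETag are the same made-up ones as above.

  Normal navigation, stored copy still fresh:

    (no request at all -- the object is served straight from cache)

  Refresh / Ctrl-r, stored copy has a validator:

    GET /X.js HTTP/1.1
    Host: www.example.com
    Cache-Control: max-age=0
    If-None-Match: "a1b2c3"

    -> 304 Not Modified if unchanged, 200 with the new object if not.

  Hard reload / Ctrl+Shift+r:

    GET /X.js HTTP/1.1
    Host: www.example.com
    Cache-Control: no-cache
    Pragma: no-cache

    -> always a full 200; the cached copy is thrown away.

The complaint in this thread is essentially that plain refresh applies
the middle case to every subresource on the page; fixing that is a UA
code change, not a new protocol signal.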
Today I had to help a user with an app unfamiliar to both of us, on an
OS I'd never heard of before, recover some accidentally deleted text
(uh-oh). Of course, per Murphy's Law, it was an urgent recovery with
seconds literally ticking up an invoice cost. Trying Ctrl-Z on the off
chance it worked ... one happy user.

Consistent basic UA behaviour is good UA design.

HTH
Amos
Received on Friday, 10 July 2015 10:30:25 UTC