- From: Amos Jeffries <squid3@treenet.co.nz>
- Date: Tue, 28 Jun 2016 00:41:47 +1200
- To: ietf-http-wg@w3.org
On 27/06/2016 9:37 a.m., Raphaël D wrote:
> On Wed, 22 Jun 2016 13:39:39 +0200 Alcides Viamontes wrote:
>
>> Yes, this is related to caching in general. And it is the reason
>> people have to add query strings for doing cache busting. This problem is a
>> separate issue, but it interacts with cache digests in that old version
>> of assets are kept in the cache and therefore in the cache digest and the
>> origin have no way of removing it. The origin can only create a new URL
>> (say, via a new query string) that gets added to the cache and the cache
>> digest.
>
> (email subject renamed to avoid polluting the "Cache Digests for HTTP/2"
> mailing-list thread)
>
> Hi,
>
> I read this incidentally meanwhile trying to understand what alternative
> to cache-busting exists.
> I'm stretching my head to find which overlooked IETF-http or W3C concept matches.
>
>
> Nowadays most CMS, javascript/css-frameworks feature cache-busting
> (usually using query-string) which is considered as the unavoidable
> answer to the "can't forcefully refresh browser cache" issue (and web
> traffic pattern).
>
>
> Among the facts/reasons given are:
>
> 1) web-applications support the fact that some webserver set far-future
> Expires times for assets (css, js, fonts)

The existing Best Practice is for redesigns to change the names of their
objects slightly *iff* the object has changed enough to warrant it.
Enough to do cache busting without query-string, but also to retain
backward compatibility with older versions of the application requesting
old assets.

IMO this whole nest of problems listed below is derived solely from
developers not following _that_ best practice. Adding yet another Best
Practice requirement for them to learn, remember and follow is not going
to help the situation.

>
> 2) downstream proxies and browser cache are not accessible by the webserver
>

For developers following the #1 best practice this is a net positive.
Those caches aid the backwards compatibility of the service.

> 3) but web application needs to force a HTML page to use freshest version of
> some or all assets even if they were already cached in-browser with a
> far-future Expires time

For developers following the Best Practice of #1, this does not occur and
is therefore not a problem. Also, assets which have not changed can utilize
the existing cached data.

>
> 4) people don't use Last-Modified or ETags because zero request
> always seems better than requesting and waiting for a 304 response.
>

What is this "zero request" you speak of?

Cache busting is about forcing a full response result. Which is by
definition slower than the revalidation. And also much more detrimental
to other wanted uses of the caching.

As a cache implementer I am seeing a growing trend for use of
anti-busting and de-duplication features in the intermediary web caches.
Taken incautiously there is a train wreck ahead for all service developers,
even those services doing the Best Practice from #1.

"Seems" being the operative word of your description. Developer ignorance
is not a good reason for requiring whole new mechanisms which they would
have to learn instead of the existing ones which they still have not
learned to use properly after 15+ years of those mechanisms' existence.

> 5) static files are usually not routed through CGI but left to webserver
> configuration rather than web-application itself which is related to
> assumptions like the above n°1
>

When using the Best Practice for #1, this is not a problem. Assets can
be located anywhere so long as the referring response refers to the
correct one for its need. That is true regardless of cache busting.

The use of cache-busting simply means the referring resource needs logic
to track what the latest cache-buster value is. Which is no different
than tracking that its needed resource still exists at the reference URI.

>
> "Is it a mistake to Expires +1 year /js/jquery.js"?
> seems one of the
> underlying meta-questions (like the definitions of "persistence" and "version")
>

No it is not. Expires can be any date at all. The protocol allows for up
to 68 years ahead!

[Putting my web developer hat on ...]

Use of the generic "jquery.js" name is suitable only for resource
scripts which use the stable jQuery API features. The ones which are not
going to change within that 1 year Expires period regardless of which
jQuery version sits at that URI, nor whether the URL is fetched 364 days
from now.

The mistake the developer has made is that they are in fact *not* using
stable jQuery API features. So what they need to be referencing is the
versioned jquery-X.Y.Z.js name. The one matching the latest API
version(s) their script is known to work with. The version number itself
does the "cache busting" part cleanly without the query-string being
involved.

[ back to the cache maintainer hat ...]

This mistake, naivety, ignorance is not something we can readily fix at
the protocol or transport level. Though much respect to those who
persist in trying to fix these social problems by technical means.

>
> In the event assets are not be cached that long (says only 1 week) and all
> (reverse)proxies are correctly configured, the issue still arise under
> some circumstance.

With those anti-cache-busting features becoming ever more popular these
cases will become more and more problematic. Not just for those using
cache-busting to avoid doing a long-standing Best Practice. But also for
everyone mistakenly caught in the de-duplication efforts.

> For example in case a webapp upgrades assets, it leaves inconsistencies
> between the main resource fetched by the user-agent (the HTML page) and
> some of its "dependencies". Since old, cached assets that can't be easily
> invalidated, new resources are created in caches.
> (query-string versioning appears a lot like a N.I.H.
> ETags mechanism
> the difference being that one HTTP response (main webpage) sends the
> ETag of the multiple sub-resources it depends upon)
>

Remember the reason you gave for people *not* using ETags like they should.

Loss of reliability is the direct cost of choosing to throw away
reliability for an "it seems to be faster". In the same way jumping off a
cliff is faster to reach the bottom than attaching a rope and walking
down it. (A rough analogy there, but far closer than it seems at first.)

>
> * Should cache-busting be part of HTTP best practices, what references and
> knowledgeable voices have said/to say about it?

IMO. No.

>
> * Is there HTTP 1.x or 2.0 alternatives? Is there a need for it ? or
> should the solution be uniquely in the hands of assets
> deployment/distribution/versioning tools and/or web application tweaks
> like query-string suffixes?
>

Correct use of URIs and either renaming assets when they change, or
correct use of the ETag / Last-Modified equivalent if no rename is done.

Note that use of a query-string to store a deterministic representation
of the ETag or Last-Modified header in the URI is not really "cache busting"
as such. Those URIs only change when the object does, so are more like
enforcing revalidation when a bad actor in the delivery path does not
revalidate properly. This does run the risk of hitting some middleware
admins' naive de-duplication efforts though.

> * Does the situation implies new precautions when using HTTP Expires header?
>

Not new ones IMHO. The same precautions exist both with and without
cache busting. If those precautions are correctly taken, then
cache-busting becomes less and less needed up to the point of being
useless when all actors in the environment are behaving themselves.

Sadly we may not quite be there today, but the last 5 or so years have
seen a significant improvement around the web.
I am hopeful that we are collectively getting close to the point of being
able to simply tell the remaining bad actors to do it right instead of
catering to the whinging.

Amos
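
As a rough illustration of the two approaches described in the message
above (renaming an asset only when it changes, and revalidating with an
ETag when no rename is done), here is a minimal Python sketch. The function
names, the 8-character SHA-256 tag and the sample path are assumptions made
for the example; nothing in the thread specifies them, and real
If-None-Match handling (lists, "*", weak validators) is deliberately left out.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Optional, Tuple


def content_tag(content: bytes, length: int = 8) -> str:
    """Deterministic short tag for the asset bytes (assumption: SHA-256 prefix)."""
    return hashlib.sha256(content).hexdigest()[:length]


def versioned_name(path: str, content: bytes) -> str:
    """Rename approach: the tag goes into the file name, not a query string.

    The URI changes only when the bytes change, so unchanged assets keep
    using whatever copy the caches already hold.
    """
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}-{content_tag(content)}{p.suffix}"))


def etag_for(content: bytes) -> str:
    """Strong ETag value (quoted string) derived from the same tag."""
    return f'"{content_tag(content)}"'


def revalidate(if_none_match: Optional[str], content: bytes) -> Tuple[int, bytes]:
    """No-rename approach: answer a conditional request.

    Simplified: compares a single ETag value exactly.
    """
    if if_none_match == etag_for(content):
        return 304, b""      # cached copy is still good, no body sent
    return 200, content      # asset changed (or no validator), send it in full


if __name__ == "__main__":
    old, new = b"console.log(1);", b"console.log(2);"
    print(versioned_name("/js/app.js", old))
    print(versioned_name("/js/app.js", new))
    print(revalidate(etag_for(old), old)[0])
    print(revalidate(etag_for(old), new)[0])
```

In both cases the identifier changes only when the object does, which is
the property that separates these approaches from an arbitrary
cache-buster value.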
Received on Monday, 27 June 2016 12:42:31 UTC