Re: cache-busting and query-string versioning

On 27/06/2016 9:37 a.m., Raphaël D wrote:
> On Wed, 22 Jun 2016 13:39:39 +0200 Alcides Viamontes wrote:
> 
>> Yes, this is related to caching in general. And it is the reason
>> people have to add query strings for doing cache busting. This problem is a
>> separate issue, but it interacts with cache digests in that old versions
>> of assets are kept in the cache, and therefore in the cache digest, and the
>> origin has no way of removing them. The origin can only create a new URL
>> (say, via a new query string) that gets added to the cache and the cache
>> digest.
> 
> (email subject renamed to avoid polluting the "Cache Digests for HTTP/2"
>  mailing-list thread)
> 
> Hi,
> 
> I came across this while trying to understand what alternatives
> to cache-busting exist.
> I'm scratching my head to find which overlooked IETF-http or W3C concept matches.
> 
> 
> Nowadays most CMS, javascript/css-frameworks feature cache-busting
> (usually using query-string) which is considered as the unavoidable
> answer to the "can't forcefully refresh browser cache" issue (and web
> traffic pattern).
> 
> 
> Among the facts/reasons given are:
> 
> 1) web-applications support the fact that some webserver set far-future
>    Expires times for assets (css, js, fonts)

The existing Best Practice is for redesigns to change the names of their
objects slightly *iff* the object has changed enough to warrant it.
Enough to do cache busting without query-string, but also to retain
backward compatibility with older versions of the application requesting
old assets.
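
A minimal sketch of such renaming, assuming the name is derived from a
hash of the content so that it changes on every change to the object
(a stricter rule than "changed enough to warrant it"). The helper name
versioned_name and the 8-character digest length are illustrative
choices, not an established convention:

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content digest before the file extension.

    The name changes only when the bytes change, so an unchanged asset
    keeps its URL and stays usable from any cache that holds it.
    """
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = versioned_name("/css/site.css", b"body { color: black }")
new = versioned_name("/css/site.css", b"body { color: navy }")
# old and new are distinct URLs; both can remain published so that
# older pages still referencing the old name keep working.
```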

IMO this whole nest of problems listed below is derived solely from
developers not following _that_ best practice. Adding yet another Best
Practice requirement for them to learn, remember and follow is not going
to help the situation.


> 
> 2) downstream proxies and browser cache are not accessible by the webserver
> 

For developers following the #1 best practice this is a net positive.
Those caches aid the backwards compatibility of the service.


> 3) but web application needs to force a HTML page to use freshest version of
>    some or all assets even if they were already cached in-browser with a
>    far-future Expires time

For developers following the Best Practice of #1, this does not occur
and is therefore not a problem. Also, assets which have not changed can
utilize the existing cached data.

> 
> 4) people don't use Last-Modified or ETags because zero request
>    always seems better than requesting and waiting for a 304 response.
> 

What is this "zero request" you speak of?
 Cache busting is about forcing a full response result. Which is by
definition slower than the revalidation. And also much more detrimental
to other wanted uses of the caching.
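
To illustrate the difference, here is a simplified sketch of the
server-side decision, assuming a single strong ETag and an exact string
match. Real If-None-Match handling also has to cope with lists of tags,
"*" and weak validators; the function name respond is hypothetical:

```python
def respond(request_headers: dict, body: bytes, etag: str):
    """Return (status, headers, payload) for a GET on one resource.

    If the client's cached copy carries the current ETag, answer 304
    with no payload; otherwise send the full 200 response.
    """
    headers = {"ETag": etag}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""
    return 200, headers, body


script = b"x" * 100_000

# Revalidation: headers only, the 100 kB body is not resent.
status, _, payload = respond({"If-None-Match": '"v1"'}, script, '"v1"')

# Busted cache: the new URL has no stored validator, so the full body
# is transferred even though the object itself is unchanged.
busted_status, _, busted_payload = respond({}, script, '"v1"')
```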

As a cache implementer I am seeing a growing trend toward anti-busting
and de-duplication features in intermediary web caches. Handled
incautiously, there is a train wreck ahead for all service developers,
even those services doing the Best Practice from #1.



"Seems" being the operative word of your description. Developer
ignorance is not a good reason for requiring whole new mechanisms which
they would have to learn instead of the existing ones which they still
have not learned to use properly after 15+ years of those mechanisms'
existence.



> 5) static files are usually not routed through CGI but left to webserver
>    configuration rather than web-application itself which is related to
>    assumptions like the above n°1
> 

When using the Best Practice for #1, this is not a problem. Assets can
be located anywhere so long as the referring response refers to the
correct one for its need. That is true regardless of cache busting.

The use of cache-busting simply means the referring resource needs logic
to track what the latest cache-buster value is. Which is no different
than tracking that its needed resource still exists at the referenced URI.


> 
> "Is it a mistake to Expires +1 year /js/jquery.js"? seems one of the
> underlying meta-questions (like the definitions of "persistence" and "version")
> 

No it is not. Expires can be any date at all. The protocol allows for up
to 68 years ahead!
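
A quick check of that figure, assuming it comes from the 2^31 ceiling
that RFC 7234 gives for delta-seconds values (as used by max-age)
rather than from any limit on the Expires date format itself:

```python
# Assumption: 68 years corresponds to 2**31 seconds, the value a cache
# falls back to when a delta-seconds field overflows (RFC 7234).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
max_delta_seconds = 2 ** 31

years = max_delta_seconds / SECONDS_PER_YEAR
print(round(years, 2))  # prints 68.05
```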


[Putting my web developer hat on ...]

Use of the generic "jquery.js" name is suitable only for resource
scripts which use the stable jQuery API features: the ones which are not
going to change within that 1 year Expires period, regardless of which
jQuery version sits at that URI or whether the URL is fetched 364 days
from now.

The mistake the developer has made is that they are in fact *not* using
stable jQuery API features. So what they need to be referencing is the
versioned jquery-X.Y.Z.js name. The one matching the latest API
version(s) their script is known to work with. The version number itself
does the "cache busting" part cleanly without the query-string being
involved.

[ back to the cache maintainer hat ...]

This mistake, naivety, ignorance is not something we can readily fix at
the protocol or transport level. Though much respect to those who
persist in trying to fix these social problems by technical means.


> 
> In the event assets are not cached that long (say only 1 week) and all
> (reverse) proxies are correctly configured, the issue still arises under
> some circumstances.

With those anti- cache-busting features becoming ever more popular these
cases will become more and more problematic. Not just for those using
cache-busting to avoid doing a long standing Best Practice. But also for
everyone mistakenly caught in the de-duplication efforts.

> For example in case a webapp upgrades assets, it leaves inconsistencies
> between the main resource fetched by the user-agent (the HTML page) and
> some of its "dependencies". Since old, cached assets can't be easily
> invalidated, new resources are created in caches.
> (query-string versioning appears a lot like a N.I.H. ETags mechanism
> the difference being that one HTTP response (main webpage) sends the
> ETag of the multiple sub-resources it depends upon)
> 

Remember the reason you gave for people *not* using ETags like they
should. Loss of reliability is the direct cost of choosing to throw away
reliability for an "it seems to be faster".

In the same way jumping off a cliff is faster to reach the bottom than
attaching a rope and walking down it.

(A rough analogy there, but far closer than it seems at first.)


> 
> * Should cache-busting be part of HTTP best practices, what references and
>   knowledgeable voices have said/to say about it?

IMO. No.

> 
> * Are there HTTP 1.x or 2.0 alternatives? Is there a need for them? Or
>   should the solution be uniquely in the hands of assets
>   deployment/distribution/versioning tools and/or web application tweaks
>   like query-string suffixes?
> 

Correct use of URIs and either renaming assets when they change, or
correct use of the ETag / Last-Modified equivalent if no rename is done.

Note that use of the query-string to store a deterministic representation
of the ETag or Last-Modified header in the URI is not really "cache
busting" as such. Those URIs only change when the object does, so are more
like enforcing revalidation when a bad actor in the delivery path does not
revalidate properly.
 This does run the risk of hitting some middleware admin's naive
de-duplication efforts though.
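
A sketch of what such a deterministic query-string could look like,
assuming a truncated SHA-256 of the content stands in for the ETag. The
helper name with_validator and the parameter name "v" are made up for
illustration:

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_validator(url: str, content: bytes, param: str = "v") -> str:
    """Append a content-derived value to the query-string.

    The value is a pure function of the bytes, so the URL changes only
    when the object changes (unlike a timestamp or random buster).
    """
    parts = urlsplit(url)
    # Keep unrelated parameters, drop any stale validator.
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, hashlib.sha256(content).hexdigest()[:12]))
    return urlunsplit(parts._replace(query=urlencode(query)))


first = with_validator("/js/app.js?lang=en", b"console.log(1)")
again = with_validator(first, b"console.log(1)")   # same bytes, same URL
changed = with_validator(first, b"console.log(2)")  # new bytes, new URL
```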


> * Does the situation imply new precautions when using the HTTP Expires header?
> 

Not new ones IMHO. The same precautions exist both with and without
cache busting.

If those precautions are correctly taken, then cache-busting becomes
less and less needed up to the point of being useless when all actors in
the environment are behaving themselves.

Sadly we may not quite be there today, but the last 5 or so years have
seen a significant improvement around the web. I am hopeful that we are
collectively getting close to the point of being able to simply tell the
remaining bad actors to do it right instead of catering to the whinging.


Amos

Received on Monday, 27 June 2016 12:42:31 UTC