Re: dont-revalidate Cache-Control header

On Fri, Jul 10, 2015 at 3:29 AM, Amos Jeffries <squid3@treenet.co.nz> wrote:

> On 10/07/2015 10:25 a.m., Ben Maurer wrote:

> > With long caching, if a website has a resource X.js which might
> > change from time to time, rather than referencing X.js and giving
> > the endpoint a short expiration date, they reference X-v1.js with a
> > nearly infinite expiration date.
>
> I know where this started... An old tutorial written back in the late
> 1990s, when HTTP/1.0 expiration was the only type of caching available
> and the first Browser War was in full swing. "Archaic" is the best word
> to describe it.
>
> It applies badly to HTTP/1.1 caching situations and can actually
> *reduce* cacheability of objects if applied indiscriminately. It also
> introduces the possibility of nasty human errors via the manual
> version control system.
>

Nobody is recommending people do this manually, just as we don't
recommend people write JavaScript without whitespace. A build process
should take resources such as JS and CSS, minify them, and create
uniquely named versions.

> There are a few edge cases where it applies well. But Best Practice it
> certainly is NOT in the current web environment.


This practice is widely recommended -- e.g., the Google guide in my
original email, and Steve Souders's book:
https://books.google.com/books?id=jRVlgNDOr60C&lpg=PA23&ots=pbw_DA5ce0&dq=cache%20expiration%20steve%20souders&pg=PA27#v=onepage&q=cache%20expiration%20steve%20souders&f=false


At Facebook we've found that this technique dramatically improves
performance for our users. A number of other large websites deploy it
as well.


> > When X.js changes, the website uploads X-v2.js and changes any
> > references to use the new version. This has the benefit that the
> > browser never needs to revalidate resources and that it sees changes
> > instantly. [1]
>
> These days we have HTTP/1.1 revalidation. Where the object's ETag is
> derived from a stored mtime value, a hash of the object, or both, the
> HTTP/1.1 software out there today can take care of version control
> easily and quickly, without any manual assistance from the web dev or
> duplicated copies of things hanging around.


ETag validation is exactly what we want to avoid. Validating an ETag
still requires a round trip from the client to the server. When we
serve a resource X.js on our page, if the user has the current version
of X.js in their cache, we want them to use it without a round trip to
the server. Naming the file X-<version>.js accomplishes this: if the
page references X-v20.js, the client knows that v20 is the current
preferred version and that, if it has v20 cached, it can use it without
an extra RTT.
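
To make the difference concrete (all header values here are
illustrative): with revalidation, every use of the cached copy still
costs a round trip, even when nothing has changed:

    GET /X.js HTTP/1.1
    If-None-Match: "abc123"

    HTTP/1.1 304 Not Modified

With a versioned URL, the cached copy is used with no request at all,
because the original response carried a far-future lifetime:

    GET /X-v20.js HTTP/1.1

    HTTP/1.1 200 OK
    Cache-Control: public, max-age=31536000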



> > At Facebook, we use this method to serve our static resources.
> > However we've noticed that despite our nearly infinite expiration
> > dates we see 10-20% of requests (depending on browser) for static
> > resources being conditional revalidations. We believe this happens
> > because UAs perform revalidation of requests if a user refreshes the
> > page. Our statistics show that about 2% of navigations to FB are
> > reloads -- however these requests cause a disproportionate amount of
> > traffic to our static resources because they are never served by the
> > user's cache.
>
> That tells me that 10-20% of your traffic is probably coming from an
> HTTP/1.1 proxy cache, whether it reveals itself as a proxy or not.


Facebook is served exclusively over HTTPS, meaning that we see
relatively few proxies. We are fairly sure we're not seeing this level
of proxy traffic and that the revalidations come from UA-triggered
refreshes.

>
> I do think I know where you are coming from with this, and kind of
> agree. The UA whose refresh button goes straight to reload-everything
> instead of efficient revalidate-only-as-needed behaviour is broken IMHO.
> However that is a UA bug, not a protocol problem.


This is the crux of the problem -- as a website that carefully manages
its Cache-Control headers, we want UAs to treat a reload as a normal
navigation and to respect normal caching rules. However, the de facto
behavior of UAs is that a refresh causes revalidation of every resource
on the page.
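
That is, on refresh today a UA typically turns every cached subresource
into a conditional request along these lines (headers illustrative),
regardless of how much lifetime the cached entry has left:

    GET /X-v20.js HTTP/1.1
    If-None-Match: "abc123"
    Cache-Control: max-age=0

    HTTP/1.1 304 Not Modified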

There are two possible solutions:

1) Ask UAs to change their behavior. This would silently change the
behavior of websites that do nothing. Maybe there's a website out there
that says "please hit the refresh button to see the latest weather";
that site will no longer work if it depends on the revalidation
behavior. In my discussions with UA implementers, they seem unwilling
(or at least extremely hesitant) to take such a risk.

2) Create a new behavior that websites can opt in to, and ensure that
UAs implement it consistently. This has less risk of breaking existing
sites, though I understand the hesitance to have a header that says
"no, *REALLY* trust my expiration times". Perhaps the header is poorly
advertising the functionality that we wish to achieve. A better
name/behavior might be Cache-Control: content-addressed.
content-addressed would signal that the contents of a URL are a pure
function of the URL itself, i.e., that the contents will never change.
It would take priority over a max-age header and signal to the browser
that the resource should be permanently cached.
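
Under that proposal, a response for a versioned resource might look
like this (the directive name is just the suggestion above, not an
existing feature):

    HTTP/1.1 200 OK
    Cache-Control: max-age=31536000, content-addressed
    Content-Type: application/javascript

A UA that understood the directive could keep serving the cached entry
for as long as it retains it, refresh or not, with max-age remaining
the fallback for UAs that don't.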
