Re: dont-revalidate Cache-Control header from Ilya Grigorik on 2015-07-13 (ietf-http-wg@w3.org from July to September 2015)

From: Ilya Grigorik <igrigorik@gmail.com>
Date: Mon, 13 Jul 2015 15:31:04 -0700
To: Ben Maurer <ben.maurer@gmail.com>
Cc: Adam Rice <ricea@chromium.org>, Mark Nottingham <mnot@mnot.net>, Amos Jeffries <squid3@treenet.co.nz>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAKRe7JH0_6b=CSE5bGBWa8pNoQYBNcoyjt8iKYP-G8JJCmgY9Q@mail.gmail.com>

On Fri, Jul 10, 2015 at 3:29 AM, Amos Jeffries <squid3@treenet.co.nz> wrote:

> > At Facebook, we use this method to serve our static resources. However
> > we've noticed that despite our nearly infinite expiration dates we see
> > 10-20% of requests (depending on browser) for static resource being
> > conditional revalidation. We believe this happens because UAs perform
> > revalidation of requests if a user refreshes the page. Our statistics
> > show that about 2% of navigations to FB are reloads -- however these
> > requests cause a disproportionate amount of traffic to our static
> > resources because they are never served by the user's cache.
>
> That tells me that 10-20% of your traffic is probably coming from a
> HTTP/1.1 proxy cache. Whether it reveals itself as a proxy or not.
>
> Speaking for Squid, we limit caching time at 1 year**. After which
> objects get revalidated before use. Expires header in HTTP/1.1 only
> means that objects are stale and must be revalidated before next use.
> Proxy with existing content does that with a synthesized revalidation
> request even if the client that triggered it did a plain GET. Thereafter
> the proxy has a new Expires value to use*** until that itself expires.


Amos, not sure I follow the proxy conclusion. If I'm reading this correctly,
it sounds like if I specify a 1 year+ max-age, then Squid will revalidate
the object for each request? If so, ouch. However, unless that gotcha
accounts for all of the extra revalidations, why would the proxy cause more
revalidations? Intuitively, shouldn't it reduce the number of revalidations
by collapsing the number of requests to the FB origin?

(also, as Ben noted, due to HTTPS, I doubt that's the culprit...)


On Sat, Jul 11, 2015 at 10:58 AM, Ben Maurer <ben.maurer@gmail.com> wrote:

> One major issue with this solution is that it doesn't address situations
> where content is embedded in a third party site. Eg, if a user includes an
> API like Google Maps or the Facebook like button those APIs may load
> subresources that should fall under this stricter policy. This issue cuts
> both ways -- if 3rd party content on your site isn't prepared for these
> semantics you could break it.


Hmm, I think a markup solution would still work for the embed case:
- you provide a stable embed URL with relatively short TTL (for quick
updates)
- the embedded resource is typically HTML (iframe) or script that initiates
subresource fetches
-- said resource can add appropriate attributes/markup on its subresources
to trigger the mode we're discussing here

^^ I think that would work, no? Also, slight tangent: the Fetch API has a
notion of "only-if-cached" and "force-cache", albeit both of those are
skipped on "reload", see step 11:
https://fetch.spec.whatwg.org/#http-network-or-cache-fetch.
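
For reference, a minimal sketch of how a page might ask for those two cache
modes through the Fetch API. The helper name cacheFirstInit and the URL in the
usage note are made up for illustration; the cache mode strings and the
requirement that "only-if-cached" be paired with mode "same-origin" come from
the Fetch spec. The types are declared locally so the snippet does not depend
on DOM typings.

```typescript
// Cache modes defined by the Fetch spec's Request.cache.
type CacheMode =
  | "default"
  | "no-store"
  | "reload"
  | "no-cache"
  | "force-cache"
  | "only-if-cached";

// Subset of fetch() init options used in this sketch.
interface CacheRequestInit {
  cache: CacheMode;
  mode?: "same-origin";
}

// Hypothetical helper: prefer the HTTP cache even when the entry is stale.
// - "force-cache": use any cached response, go to the network only on a miss.
// - "only-if-cached": use any cached response, fail on a miss instead of
//   going to the network; the spec only allows it with mode "same-origin".
function cacheFirstInit(allowNetwork: boolean): CacheRequestInit {
  return allowNetwork
    ? { cache: "force-cache" }
    : { cache: "only-if-cached", mode: "same-origin" };
}
```

In a browser this would be used as fetch("/static/app.js", cacheFirstInit(true)),
where the path is a placeholder. Note that this only affects fetches the page
issues itself, not what the UA does on a user-initiated reload.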

On Mon, Jul 13, 2015 at 2:57 AM, Ben Maurer <ben.maurer@gmail.com> wrote:

> We could also study this in the HTTP Archive -- if I took all resources
> that had a 30 day or greater max age and send their servers revalidation
> requests 1 week from today, what % of them return a 304 vs other responses.
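
A rough sketch of the bookkeeping such a study would need: pick resources
with a max-age of at least 30 days, build the conditional request headers from
the stored validators, and tally 304s against everything else. The record
shape ArchivedResource and all function names are assumptions for
illustration, not the HTTP Archive schema; the network step is left out.

```typescript
// Hypothetical record shape; not the actual HTTP Archive schema.
interface ArchivedResource {
  url: string;
  cacheControl: string; // raw Cache-Control response header value
  etag?: string;
  lastModified?: string;
}

const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;

// Extract max-age (seconds) from a Cache-Control value, or null if absent.
// Anchoring on start-of-string or a comma avoids matching "s-maxage".
function maxAgeSeconds(cacheControl: string): number | null {
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
  return match ? Number(match[1]) : null;
}

// A resource qualifies if it declared a max-age of 30 days or more.
function isLongLived(resource: ArchivedResource): boolean {
  const maxAge = maxAgeSeconds(resource.cacheControl);
  return maxAge !== null && maxAge >= THIRTY_DAYS_SECONDS;
}

// Build revalidation headers from whatever validators were stored.
function conditionalHeaders(resource: ArchivedResource): Record<string, string> {
  const headers: Record<string, string> = {};
  if (resource.etag) headers["If-None-Match"] = resource.etag;
  if (resource.lastModified) headers["If-Modified-Since"] = resource.lastModified;
  return headers;
}

// Split observed status codes into 304 Not Modified vs. everything else.
function tally(statuses: number[]): { notModified: number; other: number } {
  const notModified = statuses.filter((status) => status === 304).length;
  return { notModified, other: statuses.length - notModified };
}
```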


Not perfect, but I think it should offer a pretty good estimate:
http://bigqueri.es/t/how-many-resources-persist-across-a-months-period/607

- ~48% of resource requests end up requesting the same URL (after 30 days).
Of those...
-- ~84% fetch the same content (~40% of all requests and ~33% of total bytes)
-- ~16% fetch different content (~8% of all requests and ~9% of total bytes)
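
The "of all requests" figures above follow from multiplying the two shares.
A small sketch of that arithmetic, using only the rounded percentages quoted
here (the function and variable names are illustrative); the byte shares
cannot be derived this way since they depend on resource sizes.

```typescript
// Share of all requests = share hitting the same URL * share within that group.
function shareOfAllRequests(sameUrlShare: number, shareWithinGroup: number): number {
  return sameUrlShare * shareWithinGroup;
}

// Round a fraction to a whole percentage.
const toPercent = (fraction: number): number => Math.round(fraction * 100);

const sameUrl = 0.48; // ~48% of requests hit the same URL after 30 days
const sameContent = shareOfAllRequests(sameUrl, 0.84); // same URL, same content
const changedContent = shareOfAllRequests(sameUrl, 0.16); // same URL, new content

console.log(`same content: ~${toPercent(sameContent)}% of all requests`); // ~40%
console.log(`different content: ~${toPercent(changedContent)}% of all requests`); // ~8%
```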

ig

Received on Monday, 13 July 2015 22:32:12 UTC