Re: dont-revalidate Cache-Control header from Amos Jeffries on 2015-07-16 (ietf-http-wg@w3.org from July to September 2015)

From: Amos Jeffries <squid3@treenet.co.nz>
Date: Fri, 17 Jul 2015 00:35:05 +1200
To: ietf-http-wg@w3.org
Message-ID: <55A7A4F9.1010500@treenet.co.nz>
On 16/07/2015 11:00 p.m., Ben Maurer wrote:
> Here's a sketch of what I think a spec might look like, modeled off the
> stale-while-revalidate spec.
> 

With initial feedback inline...


> - I wasn't sure how specific we should get about the handling of browser
> reload behavior.
> - The static resource must stay "semantically" the same in this spec. This
> allows for static resources that could change on a byte-by-byte basis --
> for example if they are recompressed, etc.
> - The main security risk I can think of here is defacement (there's also a
> similar operational risk of accidentally returning a static document on the
> root page of your domain). As I noted, I think a reasonable mitigation here
> is to only apply the static semantics to sub-resources. Personally as a
> site owner, I'd prefer that UAs always revalidate content that the user
> directly navigates to.
> 
> Abstract
> 
> This document defines a Cache-Control extension formalizing of HTTP
> resources with long expiration times where the URI of the resource is
> changed when the content of the resource is changed.
> 
> 1. Introduction
> 
> RFC7234 states "The goal of caching in HTTP/1.1 is to significantly improve
> performance by reusing a prior response message to satisfy a current
> request" Highly performant websites seek to maximize the efficiency of HTTP
> caches by changing the URI of their resources every time the resource
> changes then giving each such URI an extremely distant expiration date.
> 
> The static HTTP Cache-Control extension clarifies that a resource is
> guaranteed never to change and allows caches to optimize based on these
> semantics. For example, it allows user agents to avoid revalidating static
> resources when a user presses the reload button. It also signals to caches
> that the expiration date of the object may be set further in the future
> than the actual expected lifetime of the object.

I don't think that last statement is correct. "may be set" implies that
heuristic lifetimes are applicable. But this control explicitly sets the
maximum lifetime when Expires/max-age/s-maxage are absent. That is not a
heuristic estimate but the absolute "infinity" value for the cache.

> 
> 2. The static Cache-Control extension
> 
> When present in an HTTP response, the static Cache-Control extension
> indicates that the semantic content of the response will never change in
> the future. A server MUST NOT either in the past or future serve different
> semantic content for the same URI. If a server accidentally serves
> different content on the URI, it MUST alter all resources that reference
> that URI to reference a different URI. A server MAY either in the past or
> future serve an error response for the URI. The static cache-control header
> MUST be used with either the "public" or "private" cache-control directive.

Why? Content is always either public or private, no matter what
Cache-Controls are used.

> It MUST NOT be used in combination with "no-cache", "no-store", or
> "must-revalidate".

Or proxy-revalidate, or stale-while-revalidate, ... and the as yet
undefined ones?

Also, what if it does happen? Effectively any combination of cache
controls can be sent.

IMHO it's probably best to say that when this control is present in
responses, any other controls causing revalidation MUST NOT be generated
by senders, and recipients must ignore such revalidation controls. The
list of named controls would then be just an example set.
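
A minimal sketch of that recipient-side rule, assuming a simplified
directive parser (no quoted-string handling). The helper names
parse_cache_control and effective_directives are hypothetical, and the
set of revalidation controls is only the example set named above, not
an exhaustive list:

```python
# Example set only; a real spec would describe the class of controls
# rather than a closed list. "no-store" is left out because it governs
# storage, not revalidation.
REVALIDATION_CONTROLS = {
    "no-cache",
    "must-revalidate",
    "proxy-revalidate",
    "stale-while-revalidate",
}


def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument-or-None}.

    Simplified: splits on commas and does not handle quoted strings.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() if sep else None
    return directives


def effective_directives(value):
    """Drop revalidation controls when the proposed "static" control is present."""
    directives = parse_cache_control(value)
    if "static" not in directives:
        return directives
    return {k: v for k, v in directives.items() if k not in REVALIDATION_CONTROLS}
```

With this, "public, static, must-revalidate, max-age=31536000" is
treated as if must-revalidate had not been sent, while a response
without "static" keeps all of its controls.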


> The server MUST send a max-age directive and SHOULD use
> a delta-age of at least 30 days.

Why the MUST? "static" by itself could mean caching for the maximum
lifetime permitted (i.e. ~68 years).
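
For reference, the ~68 year figure follows from the 2^31 second value
that RFC 7234 names as the fallback ceiling for delta-seconds. A quick
check, assuming a 365.25-day year:

```python
MAX_DELTA_SECONDS = 2**31  # 2147483648 seconds
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: average Julian year

years = MAX_DELTA_SECONDS / SECONDS_PER_YEAR
print(f"{years:.2f} years")  # roughly 68 years
```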

The SHOULD and the delta-age seem arbitrary. I thought the intent of
"static" was to prevent heuristic cache expiry/revalidation limits being
applied anyway.

> 
> A cache MUST treat a response with the static Cache-Control extension as
> having the maximum allowable lifetime for that cache.

There you go. :-) The max-age requirement conflicts with this.
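
To make the conflict concrete, here is a rough sketch of how a cache
might compute freshness lifetime under the two readings.
CACHE_MAX_LIFETIME and freshness_lifetime are hypothetical names, and
the directives are assumed to be already parsed into a dict:

```python
CACHE_MAX_LIFETIME = 2**31  # assumption: this cache's own ceiling, in seconds


def freshness_lifetime(directives, static_overrides=True):
    """Return the freshness lifetime in seconds for parsed response directives.

    directives maps lowercase directive names to their argument (or None).
    static_overrides=True follows the "maximum allowable lifetime" sentence;
    static_overrides=False follows the "MUST send a max-age" sentence.
    """
    if "static" in directives and static_overrides:
        return CACHE_MAX_LIFETIME
    max_age = directives.get("max-age")
    if max_age is not None:
        return min(int(max_age), CACHE_MAX_LIFETIME)
    return 0  # no explicit lifetime; heuristics are out of scope here
```

For "public,static,max-age=31536000" the first reading yields the cache
ceiling and the second yields one year, so the mandatory max-age value
is either ignored or contradicts the maximum-lifetime rule.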

> The cache SHOULD NOT
> attempt to revalidate the response.

s/SHOULD NOT/MUST NOT/ and this one line encompasses almost all the
requirements about revalidation controls.

> Operations that would normally cause
> the cache to revalidate the resource SHOULD result in the reuse of the
> cached response. The cache MAY make an unconditional request for the
> resource in response to an end-to-end reload. A user agent SHOULD NOT
> generate an end-to-end reload in response to prominent user-facing
> functionality such as a reload button.
> 
> 2.1 Example
> 
> An HTTP response might have the header:
> 
> Cache-Control: public,static,max-age=31536000
> 
> While normally this resource would only be considered cacheable for 1 year,
> a cache may choose to store it for as long as it wished. If the page using
> this resource was refreshed, the user agent would not revalidate the
> response. If the server wishes to change the contents of the resource in a
> semantically meaningful way, it would place the resource on a new URI.
> 
> 3.  Security Considerations
> 
> User agents already have the capability of caching resources for long
> periods of time. The static header alters the behavior of the 

... "web browser" ...

> reload button
> so that it no longer triggers revalidation

... "for this resource."

> The static Cache-Control
> extension is designed to be used for sub-resources where the user does not
> see the URI of the request. If an attacker were to compromise a directly
> used URI (say the root document of a domain) and serve a response with the
> static extension, it could deface the URI in a way that would not easily be
> reversed by the refreshing the page. User Agents MAY  ignore the static

s/by the refreshing/by reloading/

> extension when a URI is directly navigated to by a user rather than
> referenced by another page.

Lots of MUST criteria followed by a giant loophole of "MAY ignore it
all" is a bit rough. Non-browser agents, including middleware and shared
caches, either cannot identify a "directly navigated" URL or consider
*everything* as directly navigated. So the MAY just sets up a worse
problem of conflicting cache behaviour between software.

Probably best to leave the client-sent Cache-Control:max-age=0 (aka
force-reload) control operational as a non-conditional fetch. This is
already implied by the text at the end of section 2.
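
A sketch of what that could look like in a cache's request handling.
The names Action and choose_action are hypothetical, both directive
sets are assumed to be already parsed into dicts, and the non-static
branch is reduced to the bare minimum:

```python
from enum import Enum


class Action(Enum):
    REUSE = "reuse cached response"
    REVALIDATE = "conditional request"
    FETCH = "unconditional request"


def choose_action(request_directives, response_directives, is_fresh):
    """Pick a cache action for a stored response.

    For responses marked "static", a client max-age=0 is honoured as an
    unconditional fetch and nothing else triggers revalidation.
    """
    client_forces_reload = request_directives.get("max-age") == "0"
    if "static" in response_directives:
        return Action.FETCH if client_forces_reload else Action.REUSE
    if client_forces_reload or not is_fresh:
        return Action.REVALIDATE
    return Action.REUSE
```

Every agent, browser or shared cache, can apply this rule from the
request headers alone, without knowing whether a URL was "directly
navigated".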

HTH
Amos
Received on Thursday, 16 July 2015 12:36:12 UTC