Re: Feedback about stale-while-revalidate

On 28/10/2015 9:41 p.m., Kenji Baheux wrote:
> Hi,
> 
> I'm Kenji Baheux, working on Chrome at Google.
> 
> The Chrome team has been looking into supporting stale-while-revalidate
> <https://tools.ietf.org/html/rfc5861> (s-w-r for short) in the user agent.
> While discussing the feature with various parties, we felt that there was a
> mismatch between the spec and their requirements.
> 
> I talked about this with Mark while at TPAC and he suggested that I start a
> thread in this working group.
> 
> *For background:* our interest in the feature is motivated by the desire to
> remove HTTP revalidations from the critical path. In particular, our
> attention has been on third parties (ads, analytics, web fonts, social...)
> because of scale and because of common characteristics:
> 
>    - third parties assets tend to block rendering (e.g. script and
>    stylesheet in the head)
>    - third parties assets tend to have very short max-age (e.g. 15-30
>    minutes is common)
>    - third parties tend to be hosted at a unique origin making the cost of
>    setting up a connection particularly prohibitive on mobile.
> 
> 
> *The mismatch*
> The spec doesn't impose any strong requirements on the need to perform
> async revalidation during the s-w-r window. In fact, it allows for an
> implementation to use a stale asset for up to max-age + s-w-r seconds.

"SHOULD attempt to revalidate it" is strong language. It requires a very
good reason not to do the revalidation.

There are real cases where a cache absolutely cannot revalidate, or even
do a full new fetch, but holds content that would be marginally usable in
place of the latest fresh response. IIRC this is why MUST is not already
used.


> 
> For business reasons and the potential risk of breaking the publisher's
> website, most third parties are opting for a very short max-age to
> guarantee a short time to recovery. Actual example: max-age=900 (15
> minutes).
> 
> To avoid more frequent revalidations from user agents who don't support
> s-w-r, max-age must be kept roughly the same.
> 
> To maintain a similar guarantee of time to recovery, s-w-r would only be
> used with a tiny window (e.g. a couple of minutes). Unfortunately, this
> greatly limits its effectiveness, esp. on mobile considering common usage
> patterns: short sessions with gaps of several hours. Example: max-age=900,
> s-w-r=60.
> 
> 
> *Concession*

"concession" is a fighting term.

Who is fighting here, and which party is so graciously granting its
opposition a concession?


> If it was guaranteed that an async revalidation MUST be performed in order
> to use a "semi-stale" asset, effectively turning the time to recovery to
> max-age + 1 use, our interlocutors felt they could use a larger s-w-r
> window. Example: max-age=900, s-w-r: 86400 (1 day).
> 

I believe that requirement already does exist for caches complying with
RFC 7234.


RFC 7234 already requires in section 4.2.2:
"
A cache MUST NOT use heuristics to determine freshness when an
explicit expiration time is present in the stored response.
"

The existing max-age=N is just such an explicit expiration time.


Then in section 4.2.4:
"
A cache MUST NOT send stale responses unless it is disconnected
(i.e., it cannot contact the origin server or otherwise find a
forward path) or doing so is explicitly allowed (e.g., by the
max-stale request directive; see Section 5.2.1).
"

The ability to revalidate implies that the cache is not disconnected.

The stale-once definition does not provide an explicit allowance to
serve stale content, only guidance on how to act when stale content is
served.

Thus the behaviour being proposed is already existing practice for RFC
7234 compliant caches, unless they are explicitly within one of the known
cases where a revalidation or new fetch is not possible.

stale-while-revalidate also has a delta-seconds parameter for how long
the stale content may be served before the cache reverts to waiting for
fresh content to be available. That appears identical to the meaning of
the proposed stale-once delta-seconds.
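
To make the comparison concrete, here is a minimal Python sketch of how
a cache might classify a stored response from its age and the two
delta-seconds values. The names parse_cache_control, classify and
Decision are made up for illustration. The parser only handles simple
name=delta-seconds directives, the handling of the exact window boundary
is an arbitrary choice, and the sketch ignores the disconnected case and
request directives such as max-stale.

```python
from enum import Enum


class Decision(Enum):
    FRESH = "serve from cache, no revalidation needed"
    STALE_WHILE_REVALIDATE = "serve stale now, revalidate in the background"
    BLOCK = "wait for revalidation or a new fetch"


def parse_cache_control(header: str) -> dict:
    """Collect name=delta-seconds directives, e.g. 'max-age=900'."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if value.isdigit():
            directives[name.lower()] = int(value)
    return directives


def classify(header: str, age: int) -> Decision:
    """Classify a stored response whose current age is `age` seconds."""
    d = parse_cache_control(header)
    max_age = d.get("max-age", 0)
    swr = d.get("stale-while-revalidate", 0)
    if age < max_age:
        return Decision.FRESH
    if age < max_age + swr:
        # Inside the stale window: usable, but a revalidation is expected.
        return Decision.STALE_WHILE_REVALIDATE
    return Decision.BLOCK
```

With max-age=900 and stale-while-revalidate=86400, a response aged 1000
seconds is served stale while a revalidation runs. With
stale-while-revalidate=60 the same response makes the client wait.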


> 
> *Strawman proposal*
> Option 1: revising the semantics (see below) of s-w-r.
> Option 2: introduce a new extension (see below)
> 
> 
> Cache-control: max-age=900, stale-once=86400
> 
> When present in an HTTP response, the stale-once Cache-Control extension
> indicates that caches MAY serve the response in which it appears after it
> becomes stale, up to the indicated number of seconds but *only once*.

So what happens when a second display/fetch needs to happen during the
revalidation RTT?

I see no reason to add latency to further deliveries if that one RTT
happens to be long. Doing so would likely make the situation worse as
service fluctuates between fast (but stale) and slow responses. Client
recovery or retries of the slow responses could then cause the server to
get more overloaded in a downward spiral.

And what happens if the cache is disconnected, or the client has
explicitly mandated that it will only accept the cached content?

RFC 5861 has a SHOULD to take these cases and similar obstructions into
account.
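
A rough Python sketch of the latency concern, under one possible reading
of "only once": the first stale hit is served immediately and starts the
revalidation, and any later request arriving before it completes has to
wait for it. The proposal does not say this explicitly, so that reading
is my assumption, and added_latency is a hypothetical helper, not taken
from any cache implementation.

```python
def added_latency(arrivals, rtt, stale_once):
    """Return the wait each request sees, in the same units as `arrivals`.

    arrivals:   sorted request times, all inside the stale window
    rtt:        duration of the revalidation started by the first request
    stale_once: True for the proposed "only once" rule,
                False for stale-while-revalidate
    """
    if not arrivals:
        return []
    done = arrivals[0] + rtt  # revalidation is triggered by the first hit
    latencies = []
    for i, t in enumerate(arrivals):
        if t >= done:
            latencies.append(0)         # revalidated copy is already cached
        elif i == 0 or not stale_once:
            latencies.append(0)         # stale copy served immediately
        else:
            latencies.append(done - t)  # blocked until revalidation ends
    return latencies
```

For requests at 0, 100, 300 and 2000 ms with a 1000 ms revalidation,
this gives waits of [0, 900, 700, 0] under the stale-once reading and
[0, 0, 0, 0] under stale-while-revalidate.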


> 
>      stale-once = "stale-once" "=" delta-seconds
> 
> If a cached response is served stale due to the presence of this extension,
> the cache *MUST attempt to revalidate* it while serving the stale response
> (i.e., without blocking).


If this behaviour change is actually needed at all, I believe it should
take the form of an update to RFC 5861, altering the SHOULD to MUST in
the section 3 requirement:

"
   If a cached response is served stale due to the presence of this
   extension, the cache SHOULD attempt to revalidate it while still
   serving stale responses (i.e., without blocking).
"


That would avoid the issues with the RTT being longer than the time
between uses of the response, while also making the specification
provide the guarantee that the "concession" requires.

BUT then you force caches which are disconnected into a position of
presenting 5xx or 4xx status responses, *including* browser private
caches that hold the content.

So the cost of stale-once (guaranteed 5xx errors, more latency on many
requests, AND more server revalidations) appears to be worse than the
costs from stale-while-revalidate with short delta-seconds (only more
server revalidations) or long delta-seconds (only more clients receiving
stale content - relative to revalidation RTT).

Amos

Received on Wednesday, 28 October 2015 10:23:14 UTC