Re: ETags vs Variants, was: Revising RFC2616 - what's happening from Roy T. Fielding on 2006-11-06 (ietf-http-wg@w3.org from October to December 2006)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Sun, 5 Nov 2006 17:34:14 -0800
To: Henrik Nordstrom <hno@squid-cache.org>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <3F8E20CC-570E-489D-B184-E8348E93D323@gbiv.com>

It would make me feel better if proposals to change HTTP/1.1 were based
on hard facts and not random conjecture.

On Oct 22, 2006, at 5:22 AM, Henrik Nordstrom wrote:
> Sun 2006-10-22 at 04:35 +0100, Jamie Lokier wrote:
>
>>> I would say that if the value for "Vary" changes between two HTTP
>>> requests, the server implementation/configuration has somehow changed,
>>> and a proxy should invalidate all cached entries for that URI.
>>
>> No, no.
>
> Yes yes yes ;-)

No, and you can read the specification to learn why.  Vary is a statement
by the origin server about how intermediate caches should behave with
regard to *this* response.  It is impossible for the cache to know anything
about how the resource works or how responses to similar requests in the
future might actually vary -- it is only responsible for obeying the origin
server's wishes for *this* response, and *this* response remains valid for
as long as it remains fresh.

It is not the cache's responsibility to ensure that the server correctly
implements Vary.  It is not the cache's responsibility to exhaustively
check every possible combination of client request headers.  The Vary
field states what the origin server cares about for *this* response.
What it cares about for this response may be entirely different from
what it cares about for the next response -- some responses are more
specific than others, and some responses are more generic than others.

>> The natural implementation is for the server to note each time a
>> request header is examined to compute the response, and to emit Vary
>> with those headers.
>
> True, but this creates quite a bit of a nightmare at the cache level. So
> with this requirement Vary: will still become equal to "no-store" in
> most implementations, perhaps with hardcoded special cases for the most
> common uses or more likely caches trying to outguess the servers and
> implementing their own content negotiation schemes. This is simply because
> general caching of Vary entities then becomes too complicated for caches
> to even bother trying to index the variants.

Nonsense.  It is a trivial linear algorithm of store and compare that
has been implemented correctly in every implementation of HTTP/1.1 that
has actually attempted to implement it.  Even the lousy Microsoft DLL
that turns off caching when Vary is present is a "correct implementation",
even though it is absurdly inefficient.  Others have done better.
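
As a rough illustration of the store-and-compare idea, here is a minimal
sketch in Python.  It is not taken from any real cache: the names
VaryCache, StoredResponse, store and lookup are hypothetical, header
values are compared as exact strings, and freshness checks and
revalidation are left out entirely.  Each stored response remembers its
own Vary field names and the request values that selected it, so two
responses for the same URI may carry different Vary lists.

```python
from dataclasses import dataclass


@dataclass
class StoredResponse:
    vary: tuple      # lower-cased field names from this response's Vary
    selecting: dict  # request header values present when it was stored
    body: str


class VaryCache:
    """Hypothetical in-memory cache; freshness handling is omitted."""

    def __init__(self):
        self._entries = {}  # URI -> list of StoredResponse

    @staticmethod
    def _normalize(headers):
        # Header field names are case-insensitive, so index them lower-cased.
        return {name.lower(): value for name, value in headers.items()}

    def store(self, uri, request_headers, response_headers, body):
        req = self._normalize(request_headers)
        resp = self._normalize(response_headers)
        names = tuple(
            n.strip().lower() for n in resp.get("vary", "").split(",") if n.strip()
        )
        # Remember only the request headers that *this* response named.
        selecting = {n: req.get(n) for n in names}
        self._entries.setdefault(uri, []).append(
            StoredResponse(names, selecting, body)
        )

    def lookup(self, uri, request_headers):
        req = self._normalize(request_headers)
        # Linear scan: compare the new request against each stored response.
        for entry in self._entries.get(uri, []):
            if "*" in entry.vary:
                continue  # "Vary: *" can never be matched by a cache
            if all(req.get(n) == entry.selecting[n] for n in entry.vary):
                return entry.body
        return None


cache = VaryCache()
cache.store("/doc", {"Accept-Encoding": "gzip"},
            {"Vary": "Accept-Encoding"}, "gz-body")
cache.store("/doc", {"Accept-Encoding": "identity", "User-Agent": "A"},
            {"Vary": "Accept-Encoding, User-Agent"}, "id-body")
```

In this sketch the first stored response can be reused for any later
request with the same Accept-Encoding regardless of User-Agent, while the
second also requires a matching User-Agent.  Nothing is purged when the
two Vary values differ.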

>> If you specify that a cache must purge all variants when receiving a
>> Vary header which is different from previously received Vary, then
>> servers will realistically have to send "Vary: Accept-Encoding,
>> User-Agent" even in the case that the response _doesn't_ depend on
>> User-Agent.
>
> Which is fine to me. Especially if the server supports ETag and
> If-None-Match on larger responses.

Flushing the cache is a correctness-preserving action for an HTTP
intermediary, regardless of the contents of Vary.  In other words,
you will be compliant with HTTP even if your caching sucks.  If you
implement Vary as it is specified in RFC 2616, your implementation will
both be correct and cache when appropriate.  I don't see a problem here.

>> However, when the server's use of request headers is less tightly
>> coupled, it's _much_ harder to do that.
>
> True.
>
> So question then becomes multifold:
>
> 1) Is caching of Vary responses worth the effort to get it working
> properly?

Yes.

> 2) If caching of Vary is desirable, what component of the network
> should have to deal with the complexity involved?

There is no complexity involved.  An origin server makes its own choices
about what is important to Vary upon, and can set the header field
accordingly using any number of simple configuration mechanisms.
A cache simply follows those instructions.
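
One possible way for a server to set the field, sketched in Python along
the lines of the "note each header examined" approach quoted earlier in
this thread.  Everything here is an assumption made for illustration:
TrackedHeaders and handle are hypothetical names, and the gzip check is a
deliberately crude substring test rather than real Accept-Encoding
parsing.

```python
class TrackedHeaders:
    """Hypothetical wrapper that records which request headers were read."""

    def __init__(self, headers):
        self._headers = {name.lower(): value for name, value in headers.items()}
        self.examined = []  # lower-cased names, in order of first access

    def get(self, name, default=None):
        key = name.lower()
        if key not in self.examined:
            self.examined.append(key)
        return self._headers.get(key, default)


def handle(request_headers):
    """Build a response and list every examined request header in Vary."""
    headers = TrackedHeaders(request_headers)
    response_headers = {}
    # Crude check, for illustration only: no q-value handling.
    if "gzip" in headers.get("Accept-Encoding", ""):
        body = "compressed"
        response_headers["Content-Encoding"] = "gzip"
    else:
        body = "plain"
    if headers.examined:
        response_headers["Vary"] = ", ".join(headers.examined)
    return response_headers, body
```

A static configuration that always sends the same Vary value for a given
resource would serve equally well; the point of the sketch is only that
the choice is made at the origin server, per response.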

> What we see today is that neither component really cares. Most servers
> forget to send Vary headers when they should, instead using no-cache to
> solve the problem. And most caches see Vary as too complex and read it
> the same as no-store, or in some user-agent cases read Vary wrongly as
> "no-cache" (need to revalidate on every request) and additionally
> get the validation completely wrong.

That is conjecture.  Most caches implement Vary correctly or haven't
been updated to HTTP/1.1 yet.  The Microsoft client DLL implements
Vary in the least efficient way, but no sane protocol designer will
base an RFC on one of Microsoft's implementations when the rest of
the world has no problem dealing with that feature.  HTTP/1.1 offers
"no-cache" as an option that may be used for any number of reasons,
so its presence or absence has nothing whatsoever to do with Vary.

There is no issue here.  Vary works in many implementations and there
has never been a single report of interoperability problems between
clients and servers that have implemented Vary as specified.  It is an
integral part of HTTP/1.x caching that cannot be deprecated.

Cheers,

Roy T. Fielding                            <http://roy.gbiv.com/>
Chief Scientist, Day Software              <http://www.day.com/>
Received on Monday, 6 November 2006 01:33:54 UTC