#401 /#402: Our old friends, weak ETags from Mark Nottingham on 2012-11-15 (ietf-http-wg@w3.org from October to December 2012)

From: Mark Nottingham <mnot@mnot.net>
Date: Thu, 15 Nov 2012 16:31:29 +1100
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <ACD069C6-1CD9-4CD1-9644-BA82C0B9DE84@mnot.net>

We need to wrap this up. I've created <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/401> and <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/402> to track.

So far, I've seen a few complaints about Weak ETags in general, as well as the resolution of HTTP dates and unhappiness with content negotiation for compression, but no opinions about the questions at hand.

My proposals below --


On 20/07/2012, at 1:25 PM, Mark Nottingham wrote:

> I just received an interesting bug report on REDbot; <https://github.com/mnot/redbot/issues/109>
> 
> """
> When an ETag is marked weak with "W/", it need not change across different content encodings. "Weak" means the entity is semantically equivalent but not bit-equivalent. But Redbot complains that it doesn't change with different content encodings.
> """
> 
> Looking into this, a couple of things pop up:
> 
> 1) The obvious question, whether two different *negotiated* representations of the same resource can (or should) have the same weak ETag. Our current definition is this:
> 
> """
> In contrast, a "weak validator" is a representation metadata value that might not be changed for every change to the representation data. This weakness might be due to limitations in how the value is calculated, such as clock resolution or an inability to ensure uniqueness for all possible representations of the resource, or due to a desire by the resource owner to group representations by some self-determined set of equivalency rather than unique sequences of data. An origin server should change a weak entity-tag whenever it considers prior representations to be unacceptable as a substitute for the current representation. In other words, a weak entity-tag ought to change whenever the origin server wants caches to invalidate old responses.
> """ <https://svn.tools.ietf.org/svn/wg/httpbis/draft-ietf-httpbis/latest/p4-conditional.html#weak.and.strong.validators>
> 
> Strictly speaking, I don't think this is a problem for caches; following the rules for reusing a stored response <https://svn.tools.ietf.org/svn/wg/httpbis/draft-ietf-httpbis/latest/p6-cache.html#constructing.responses.from.caches>, a cache doesn't use the ETag to select a representation (what we used to call a variant) from a pool of many. 
> 
> Having said that, I'm still a bit uneasy. An ETag is supposed to be scoped to an entire resource, not just the selected representation. Because the response is negotiated, I'm tempted to argue that a compressed response is NOT semantically equivalent to an uncompressed one, just as a French response isn't equivalent to an English one, because the client has stated they don't understand English.
> 
> Thoughts? Is it worth clarifying this, or is it acceptable to have two different negotiated representations of the same resource share a weak ETag?

Thinking about this a bit more, I think my uneasiness stems from a situation where the server has sent these two responses (with the same target URI):

Vary: Accept-Encoding
Content-Encoding: gzip
ETag: W/"abc"

Vary: Accept-Encoding
ETag: W/"abc"

Here the server is effectively contradicting itself; Vary indicates that the response differs based upon the Accept-Encoding request header (and Content-Encoding confirms that), but the ETag says that they're the same (albeit only "semantically" the same).
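
To make the contradiction concrete, here is a minimal Python sketch of the kind of check a linting tool could run over the two responses above. The function name and the plain-dict response shape are assumptions made for illustration; this is not REDbot's actual code.

```python
def shares_etag_across_encodings(resp_a, resp_b):
    """Return True when two responses for the same URI both vary on
    Accept-Encoding, differ in Content-Encoding, yet carry the same ETag."""

    def header(resp, name):
        # Header field names are case-insensitive; normalise before lookup.
        lowered = {k.lower(): v.strip() for k, v in resp.items()}
        return lowered.get(name.lower(), "")

    def varies_on_encoding(resp):
        fields = [f.strip().lower() for f in header(resp, "Vary").split(",")]
        return "accept-encoding" in fields

    etag_a = header(resp_a, "ETag")
    etag_b = header(resp_b, "ETag")
    coding_a = header(resp_a, "Content-Encoding").lower()
    coding_b = header(resp_b, "Content-Encoding").lower()

    return (
        varies_on_encoding(resp_a)
        and varies_on_encoding(resp_b)
        and coding_a != coding_b
        and etag_a != ""
        and etag_a == etag_b
    )


# The two responses from the example above, as header dicts.
gzipped = {"Vary": "Accept-Encoding", "Content-Encoding": "gzip", "ETag": 'W/"abc"'}
identity = {"Vary": "Accept-Encoding", "ETag": 'W/"abc"'}

print(shares_etag_across_encodings(gzipped, identity))
```

The check only looks at the headers; it says nothing about whether a cache would actually misbehave, which is the point made in the next paragraph.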

OTOH, I don't see any hard requirements for ETag uniqueness within the scope of a resource -- either in 2616 or bis -- so technically, the example above could use strong ETags and still be conformant. I think it'd still be functional within the caching spec as well (although some implementations may not behave well).

Both, however, conflict with our current description of ETag's semantics (or at least its spirit):

> An entity-tag is an opaque validator for differentiating between multiple representations of the same resource, regardless of whether those multiple representations are due to resource state changes over time, content negotiation resulting in multiple representations being valid at the same time, or both. 

I see a couple of possible ways forward:

a) Require that ETags differ when the server performs selection / conneg, regardless of their strength. 

b) Explain in prose that content negotiation usually affects the meaning of the response, therefore necessitating a different ETag.

c) Modify the definition of ETag to avoid this conflict.

(Those are in my personal preference order; however, I'm more interested in getting the issue solved than any particular outcome)

Separate from that, I wonder if we should place specific requirements on ETag uniqueness across the representations of a resource. The requirement would likely be that strong ETags MUST differ when the body is byte-for-byte different or there is a meaningful change in the representation's headers; for weak ETags, it would be a meaningful change in the headers or body (where "meaningful" is defined by the server).
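
As a rough illustration of what such a requirement could look like on the server side, here is a Python sketch. Both helper names, the truncated SHA-256 digest, and the `revision` string standing in for the server's own notion of a "meaningful" change are assumptions, not anything the drafts specify; the strong tag here covers only the body bytes, not header changes.

```python
import gzip
import hashlib


def strong_etag(body: bytes) -> str:
    # Hash the exact bytes sent, so any byte-for-byte difference
    # (including one introduced by a content-coding) changes the tag.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def weak_etag(revision: str, content_coding: str = "identity") -> str:
    # `revision` is a hypothetical server-defined marker that changes only
    # on a "meaningful" change; the negotiated content-coding is mixed in
    # so that negotiated representations never share a tag.
    return 'W/"{}-{}"'.format(revision, content_coding)


body = b"<p>hello</p>"
compressed = gzip.compress(body)

# Strong tags differ because the bytes on the wire differ.
print(strong_etag(body) != strong_etag(compressed))

# Weak tags differ because the negotiated coding differs.
print(weak_etag("r42"), weak_etag("r42", "gzip"))
```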

Thoughts?


> 2) In the definitions of If-Match and If-None-Match, we don't specify whether the weak or strong comparison function is to be used when these validations actually occur, although we spend a lot of text on when to use weak vs. strong ETags themselves. 
> 
> Now, you might say that an origin server can decide whether to use the weak or strong function, but an intermediary or client cache doesn't have license to do weak comparison, and could cause a lot of trouble if it did. AFAICT we don't specify this, but I think we should.


I propose we specify that proxy and client caches MUST use the strong comparison function with If-Match and If-None-Match. 
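
For reference, the two comparison functions can be sketched in Python as follows. The function names are mine and the parsing is deliberately minimal (it assumes a single, well-formed entity-tag per argument): strong comparison requires that neither tag is weak and that the opaque-tags are identical, while weak comparison ignores the "W/" prefix.

```python
def parse_etag(value):
    """Split an entity-tag into (is_weak, opaque_tag)."""
    value = value.strip()
    weak = value.startswith("W/")  # the weakness prefix is case-sensitive
    opaque = value[2:] if weak else value
    return weak, opaque


def strong_match(a, b):
    # Both tags must be strong and their opaque-tags must be identical.
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a, b):
    # Opaque-tags must be identical; either or both may be marked weak.
    return parse_etag(a)[1] == parse_etag(b)[1]


for a, b in [('W/"abc"', 'W/"abc"'), ('W/"abc"', '"abc"'), ('"abc"', '"abc"')]:
    print(a, b, "strong:", strong_match(a, b), "weak:", weak_match(a, b))
```

Under the proposal above, a cache holding a response tagged W/"abc" would never treat it as matching with the strong function, whatever tag the request carries.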


--
Mark Nottingham   http://www.mnot.net/
Received on Thursday, 15 November 2012 05:31:56 UTC