Our old friends, weak ETags

I just received an interesting bug report on REDbot: <https://github.com/mnot/redbot/issues/109>

"""
When an ETag is marked weak with "W/", it need not change across different content encodings. "Weak" means the entity is semantically equivalent but not bit-equivalent. But Redbot complains that it doesn't change with different content encodings.
"""

Looking into this, a couple of things pop up:

1) The obvious question is whether two different *negotiated* representations of the same resource can (or should) have the same weak ETag. Our current definition is this:

"""
In contrast, a "weak validator" is a representation metadata value that might not be changed for every change to the representation data. This weakness might be due to limitations in how the value is calculated, such as clock resolution or an inability to ensure uniqueness for all possible representations of the resource, or due to a desire by the resource owner to group representations by some self-determined set of equivalency rather than unique sequences of data. An origin server should change a weak entity-tag whenever it considers prior representations to be unacceptable as a substitute for the current representation. In other words, a weak entity-tag ought to change whenever the origin server wants caches to invalidate old responses.
""" <https://svn.tools.ietf.org/svn/wg/httpbis/draft-ietf-httpbis/latest/p4-conditional.html#weak.and.strong.validators>

Strictly speaking, I don't think this is a problem for caches; following the rules for reusing a stored response <https://svn.tools.ietf.org/svn/wg/httpbis/draft-ietf-httpbis/latest/p6-cache.html#constructing.responses.from.caches>, a cache doesn't use the ETag to select a representation (what we used to call a variant) from a pool of many. 
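
To illustrate why a shared ETag does not confuse selection, here is a rough Python sketch of choosing a stored response by its Vary header alone. It is a simplification under stated assumptions, not the p6 algorithm in full: the function name, the dict layout and the lowercase header keys are all invented for the example, and header-value normalisation is skipped.

```python
def select_stored_response(stored, request_headers):
    """Pick a stored response whose selecting headers match the request.

    Hypothetical layout: each entry is a dict with "request_headers" and
    "response_headers", both plain dicts keyed by lowercase field name.
    """
    for entry in stored:
        vary = entry["response_headers"].get("vary", "")
        names = [n.strip().lower() for n in vary.split(",") if n.strip()]
        if "*" in names:
            # "Vary: *" never matches a later request.
            continue
        # Every header named by Vary must match the original request.
        # Note that the ETag is never consulted here.
        if all(entry["request_headers"].get(n) == request_headers.get(n)
               for n in names):
            return entry
    return None


gzipped = {
    "request_headers": {"accept-encoding": "gzip"},
    "response_headers": {"vary": "Accept-Encoding", "etag": 'W/"abc"',
                         "content-encoding": "gzip"},
}
plain = {
    "request_headers": {},
    "response_headers": {"vary": "Accept-Encoding", "etag": 'W/"abc"'},
}
stored = [gzipped, plain]
```

Both stored entries carry the same weak ETag, yet a request with "Accept-Encoding: gzip" selects the compressed entry and a request without the header selects the plain one.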

Having said that, I'm still a bit uneasy. An ETag is supposed to be scoped to an entire resource, not just the selected representation. Because the response is negotiated, I'm tempted to argue that a compressed response is NOT semantically equivalent to an uncompressed one, just as a French response isn't equivalent to an English one, because the client has stated they don't understand English.

Thoughts? Is it worth clarifying this, or is it acceptable to have two different negotiated representations of the same resource share a weak ETag?
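
To make the question concrete, this is a minimal Python sketch of the kind of check at issue. It is not REDbot's actual code; the function name, the exempt_weak switch and the lowercase-keyed header dicts are assumptions made for illustration.

```python
def encodings_share_etag(headers_a, headers_b, exempt_weak=False):
    """Return True if two negotiated responses for the same resource
    differ in Content-Encoding but carry the same ETag.

    Hypothetical helper: headers are plain dicts keyed by lowercase name.
    """
    etag_a = headers_a.get("etag")
    etag_b = headers_b.get("etag")
    if etag_a is None or etag_a != etag_b:
        return False
    enc_a = headers_a.get("content-encoding", "identity")
    enc_b = headers_b.get("content-encoding", "identity")
    if enc_a == enc_b:
        return False
    if exempt_weak and etag_a.startswith("W/"):
        # The bug reporter's position: a weak ETag may be shared.
        return False
    return True


plain = {"etag": 'W/"abc"'}
gzipped = {"etag": 'W/"abc"', "content-encoding": "gzip"}
```

With exempt_weak left off, the plain/gzipped pair above is flagged; with it on, the pair passes. Which default is right is exactly the open question.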


2) In the definitions of If-Match and If-None-Match, we don't specify whether the weak or strong comparison function is to be used when these validations actually occur, although we spend a lot of text on when to use weak vs. strong ETags themselves. 

Now, you might say that an origin server can decide whether to use the weak or strong function, but an intermediary or client cache doesn't have license to do weak comparison, and could cause a lot of trouble if it did. AFAICT we don't specify this, but I think we should.
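
For reference, a minimal Python sketch of the two comparison functions as p4 describes them: strong comparison requires that neither tag is weak and that the tags are identical, while weak comparison only requires the opaque tags to match. The helper names are invented for the example, and the parsing assumes a single well-formed entity-tag.

```python
def parse_etag(value):
    """Split an entity-tag into (is_weak, opaque_tag)."""
    value = value.strip()
    is_weak = value.startswith("W/")
    opaque = value[2:] if is_weak else value
    return is_weak, opaque


def strong_match(a, b):
    """Strong comparison: neither tag is weak and the opaque tags are identical."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return (not weak_a) and (not weak_b) and tag_a == tag_b


def weak_match(a, b):
    """Weak comparison: the opaque tags are identical, weak flags ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

Under these definitions W/"1" and "1" match weakly but not strongly, and W/"1" does not even match itself strongly, which is why the choice of function in If-Match and If-None-Match matters.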

Again, thoughts?

Thanks,

--
Mark Nottingham   http://www.mnot.net/

Received on Friday, 20 July 2012 03:26:26 UTC