Re: #401 /#402: Our old friends, weak ETags from Mark Nottingham on 2012-11-27 (ietf-http-wg@w3.org from October to December 2012)

From: Mark Nottingham <mnot@mnot.net>
Date: Tue, 27 Nov 2012 14:30:07 +1100
To: "Roy T. Fielding" <fielding@gbiv.com>
Cc: Larry Masinter <masinter@adobe.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <79D29054-A52F-4B96-AE5C-23DD3D32E969@mnot.net>
I've recorded the resolution for incorporation in -22 as:

>   a. add a MUST requirement for strong ETag uniqueness, and
>   b. clarify in prose that if you issue the same weak ETag for *any* two representations of a resource, you consider them interchangeable (possibly in the examples), and
>   c. clarify the definition of ETags WRT "differentiate"
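
As a rough illustration of (a) and (b), here is a minimal Python sketch, not text from the draft. It assumes a server's output for one resource can be listed as (ETag, payload) pairs; the function names and sample data are hypothetical, and it only compares payload bytes, whereas the thread also mentions meaningful header changes. It flags strong ETags issued for differing payloads and leaves shared weak ETags alone, since by (b) the server is declaring those representations interchangeable.

```python
from collections import defaultdict


def is_weak(etag: str) -> bool:
    """A weak entity-tag carries the "W/" prefix."""
    return etag.startswith("W/")


def strong_etag_collisions(responses):
    """Return the strong ETags that were issued for more than one distinct payload.

    `responses` is an iterable of (etag, payload_bytes) pairs for ONE resource.
    Weak ETags are skipped: sharing one is the server's way of saying the
    representations may be substituted for each other.
    """
    bodies_by_etag = defaultdict(set)
    for etag, body in responses:
        if not is_weak(etag):
            bodies_by_etag[etag].add(body)
    return sorted(etag for etag, bodies in bodies_by_etag.items() if len(bodies) > 1)


# Hypothetical sample: the same resource served gzip-encoded and unencoded.
responses = [
    ('W/"abc"', b"gzip-compressed bytes"),
    ('W/"abc"', b"identity bytes"),         # same weak ETag: allowed
    ('"xyz"', b"gzip-compressed bytes"),
    ('"xyz"', b"identity bytes"),           # same strong ETag: violates (a)
]

collisions = strong_etag_collisions(responses)
print(collisions)
```

With this sample only the strong tag is reported; the shared W/"abc" is treated as a deliberate statement of equivalence.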




On 19/11/2012, at 5:11 PM, Mark Nottingham <mnot@mnot.net> wrote:

> 
> On 19/11/2012, at 5:01 PM, "Roy T. Fielding" <fielding@gbiv.com> wrote:
> 
>> I think you are misunderstanding the "for differentiating between".
>> It does not imply that all entity tags uniquely differentiate.
> 
> Perhaps, but it's easy to misunderstand. Can you suggest a clarification?
> 
> 
> 
>> 
>> ....Roy
>> 
>> On Nov 18, 2012, at 3:12 PM, Mark Nottingham wrote:
>> 
>>> OK, so it sounds like the proposal for #401 is to:
>>> 
>>> a) add a requirement for strong ETag uniqueness, and
>>> b) clarify in prose that if you issue the same weak ETag for *any* two representations of a resource, you consider them interchangeable (possibly in the examples), and
>>> c) change the definition of ETags from:
>>> 
>>>> An entity-tag is an opaque validator for differentiating between multiple
>>>> representations of the same resource, regardless of whether those multiple
>>>> representations are due to resource state changes over time, content
>>>> negotiation resulting in multiple representations being valid at the same time, or
>>>> both.
>>> 
>>> to:
>>> 
>>>> An entity-tag is an opaque validator with a weakness flag. 
>>> 
>>> Make sense?
>>> 
>>> 
>>> 
>>> On 19/11/2012, at 7:33 AM, Larry Masinter <masinter@adobe.com> wrote:
>>> 
>>>>>> When an ETag is marked weak with "W/", it need not change across different content encodings. "Weak" means the entity is semantically equivalent but not bit-equivalent. But Redbot complains that it doesn't change with different content encodings.
>>>>>> """
>>>>>> 
>>>>>> Looking into this, a couple of things pop up:
>>>>>> 
>>>>>> 1) The obvious question, whether two different *negotiated* representations of the same resource can (or should) have the same weak ETag.
>>>> 
>>>> The intent of a weak ETag was to let the server decide. So two different negotiated representations _can_ have the same weak ETag, but whether they SHOULD share one is for the server to decide.
>>>> 
>>>> 
>>>>>> Our current definition is this:
>>>>>> 
>>>>>> """
>>>>>> In contrast, a "weak validator" is a representation metadata value that might not be changed for every change to the representation data. This weakness might be due to limitations in how the value is calculated, such as clock resolution or an inability to ensure uniqueness for all possible representations of the resource, or due to a desire by the resource owner to group representations by some self-determined set of equivalency rather than unique sequences of data. An origin server should change a weak entity-tag whenever it considers prior representations to be unacceptable as a substitute for the current representation. In other words, a weak entity-tag ought to change whenever the origin server wants caches to invalidate old responses.
>>>>>> """ <https://svn.tools.ietf.org/svn/wg/httpbis/draft-ietf-httpbis/latest/p4-conditional.html#weak.and.strong.validators>
>>>>>> 
>>>>>> Strictly speaking, I don't think this is a problem for caches; following the rules for reusing a stored response <https://svn.tools.ietf.org/svn/wg/httpbis/draft-ietf-httpbis/latest/p6-cache.html#constructing.responses.from.caches>, a cache doesn't use the ETag to select a representation (what we used to call a variant) from a pool of many.
>>>>>> 
>>>>>> Having said that, I'm still a bit uneasy. An ETag is supposed to be scoped to an entire resource, not just the selected representation. Because the response is negotiated, I'm tempted to argue that a compressed response is NOT semantically equivalent to an uncompressed one, just as a French response isn't equivalent to an English one, because the client has stated they don't understand English.
>>>> 
>>>> An image which is mainly just an image but has a few French words might be preferred for French and the same image with a few English words replaced might be preferred for English, if you're doing content negotiation, but the server can decide to give them the same weak ETag if the server wants to, or not. 
>>>> 
>>>> 
>>>> 
>>>>>> Thoughts? Is it worth clarifying this, or is it acceptable to have two different negotiated representations of the same resource share a weak ETag?
>>>>> 
>>>>> Thinking about this a bit more, I think my uneasiness stems from a situation
>>>>> where the server has sent these two responses (with the same target URI):
>>>>> 
>>>>> Vary: Accept-Encoding
>>>>> Content-Encoding: gzip
>>>>> ETag: W/"abc"
>>>>> 
>>>>> Vary: Accept-Encoding
>>>>> ETag: W/"abc"
>>>>> 
>>>>> This is effectively contradicting itself; Vary indicates that the response differs
>>>>> based upon the Accept-Encoding request header (and Content-Encoding
>>>>> confirms that), but the ETag says that they're the same (albeit, only
>>>>> "semantically" the same).
>>>> 
>>>> I don't see the problem. The server is saying "If you want to use the old one instead of the new one, go ahead, it's fine with me".
>>>> 
>>>> 
>>>>> OTOH, I don't see any hard requirements for ETag uniqueness within the scope
>>>>> of a resource -- either in 2616 or bis -- so technically, the example above could
>>>>> be with strong ETags and still be conformant. I think it'd still be functional within
>>>>> the caching spec as well (although some implementations may not behave well).
>>>> 
>>>> I disagree about strong ETags, here, because you should be able to do byte range computations if you have strong ETags.
>>>> 
>>>>> Both, however, conflict with our current description of ETag's semantics (or at
>>>>> least its spirit):
>>>>> 
>>>>>> An entity-tag is an opaque validator for differentiating between multiple
>>>>> representations of the same resource, regardless of whether those multiple
>>>>> representations are due to resource state changes over time, content
>>>>> negotiation resulting in multiple representations being valid at the same time, or
>>>>> both.
>>>>> 
>>>>> I see a couple of possible ways forward:
>>>>> 
>>>>> a) Require that ETags differ when the server performs selection / conneg,
>>>>> regardless of their strength.
>>>> 
>>>> I think this is inconsistent with the purpose of weak ETags. Yes, for strong ones.
>>>> 
>>>>> b) Explain in prose that content negotiation usually affects the meaning of the
>>>>> response, therefore necessitating a different ETag.
>>>> 
>>>> Ditto
>>>> 
>>>>> c) Modify the definition of ETag to avoid this conflict
>>>> 
>>>> Clarify the definition of ETag possibly.
>>>> 
>>>>> (Those are in my personal preference order; however, I'm more interested in
>>>>> getting the issue solved than any particular outcome)
>>>>> 
>>>>> Separate from that, I wonder if we should place specific requirements on ETag
>>>>> uniqueness across the representations of a resource. It'd likely be that strong
>>>>> ETags MUST differ when the body is byte-for-byte different,  or there is a
>>>>> meaningful change in the representation's headers;
>>>> 
>>>> Yes
>>>> 
>>>>> for weak ETags, it'd be a
>>>>> meaningful change in the headers or body (when "meaningful" is defined by the
>>>>> server).
>>>> 
>>>> I think the server can decide based on whatever criteria it wants -- time of day, name of client, cookies, session state, whatever. 
>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> 
>>>>>> 2) In the definitions of If-Match and If-None-Match, we don't specify whether the weak or strong comparison function is to be used when these validations actually occur, although we spend a lot of text on when to use weak vs. strong ETags themselves.
>>>>>> 
>>>>>> Now, you might say that an origin server can decide whether to use the weak or strong function, but an intermediary or client cache doesn't have license to do weak comparison, and could cause a lot of trouble if it did. AFAICT we don't specify this, but I think we should.
>>>> 
>>>> The weak ETag response *is* the license.  
>>>>> 
>>>>> I propose we specify that proxy and client caches MUST use the strong
>>>>> comparison function with If-Match and If-None-Match.
>>>> 
>>>> Why gut the intent of weak ETags?
>>>> 
>>>>> --
>>>>> Mark Nottingham   http://www.mnot.net/
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> --
>>> Mark Nottingham   http://www.mnot.net/
>>> 
>>> 
>>> 
>>> 
>> 
> 
> --
> Mark Nottingham   http://www.mnot.net/
> 
> 
> 

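For the second question in the quoted thread (which comparison function applies to If-Match and If-None-Match), the difference between the two functions can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: the helper names are hypothetical, and the parser only checks for an optional "W/" prefix and surrounding double quotes rather than the full entity-tag grammar. Strong comparison requires that neither tag is weak and that the opaque-tags match character by character; weak comparison ignores the weakness flag.

```python
def parse_etag(value: str):
    """Split an entity-tag into (is_weak, opaque_tag); simplified validation only."""
    weak = value.startswith("W/")
    opaque = value[2:] if weak else value
    if len(opaque) < 2 or not (opaque.startswith('"') and opaque.endswith('"')):
        raise ValueError(f"not an entity-tag: {value!r}")
    return weak, opaque


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag is weak and the opaque-tags are identical."""
    weak_a, opaque_a = parse_etag(a)
    weak_b, opaque_b = parse_etag(b)
    return not weak_a and not weak_b and opaque_a == opaque_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque-tags are identical; the W/ flag is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]


# The ETag from the two quoted responses (gzip and unencoded).
gzip_etag = 'W/"abc"'
identity_etag = 'W/"abc"'

print(weak_match(gzip_etag, identity_etag))
print(strong_match(gzip_etag, identity_etag))
```

Under weak comparison the two quoted responses validate against each other; under strong comparison they never do, which is why the choice of function matters for a cache that holds one encoding and revalidates for another.
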
--
Mark Nottingham   http://www.mnot.net/
Received on Tuesday, 27 November 2012 03:30:36 UTC