Re: ETags and concurrency control

I have to admit to always having a philosophical problem with the entire 
concept of weak etags.

Semantic equivalence of, say, HTML content can surely only be reliably 
determined by a human?

Which means that, to spare client caches a refetch, a human has to decide 
at the time a change is made whether it is semantically significant, and 
there needs to be a system to propagate that decision via the server to 
the client. All this to save some poor client that honours weak etags 
from fetching the up-to-date version when it didn't really change all 
that much (who decides?).

Doesn't this make weak etags administratively burdensome and basically 
pointless, with the only possible effect being to prevent people from 
getting fresh copies of data?  Why change something if it isn't 
significant?  Who really does this, and who really uses weak etags?

Do browsers warn when they choose not to get an up-to-date copy of 
something because of a weak etag?  Does the user ever find out that they 
didn't get the latest copy because someone decided that even though it 
changed, it wasn't significant?  Significance is a subjective concept, 
so no single party can reliably decide on everyone's behalf whether a 
change truly is significant. The only potential benefit is to the 
server operator, who may reduce load at the cost of the freshness of 
the information they provide.  I think most businesses would rather 
their clients had the information they wish to provide.


Brian Smith wrote:
> Henrik Nordstrom wrote:
>   
>> On Mon, 2008-04-28 at 13:44 +0200, Robert Siemer wrote:
>>     
>>> That raises three issues:
>>>
>>> 1) It's not in RFC2616 (weak comparison for non-GET) and so 
>>>    it's not in the RFC2616bis charter.
>>>       
>> Not convinced. The current limitations on weak etags are just 
>> silly, with the exception of If-Range.
>>
>> In my view it's a specification error that validators based on
>> Last-Modified are allowed in more places than weak etag based ones.
>>     
>
> I agree with Henrik. The weak validator stuff in RFC 2616 doesn't make any sense. The point of the working group is to fix things so that the specification makes sense.
>
>   
>>> 2) Weak ETags don't mean "semantically equivalent" anymore. 
>>>    They mean nothing now (see i101). As of today there is no
>>>    replacement text proposed for i101, but weak ETags could
>>>    get degraded to something way weaker than Last-Modified.
>>>       
>> As specified in RFC2616, weak ETags are already less useful as cache
>> validators than Last-Modified, even though even the most trivial
>> implementations generate a much stronger weak ETag cache validator
>> than Last-Modified.
>>     
>
> The proposed resolution for i101 just changes one kind of hand waving ("semantic equivalent") for another ("good enough, from the server's point of view"). To me, the two phrases mean exactly the same thing.
>
> AFAICT, the issue with i101 is that some servers do not generate *useful* weak ETags; at least, Apache seems to just generate weak ETags that it will never match (which is inefficient but not totally broken). I'm not sure that their behavior is a reason to change the spec; no harm, no foul. Besides, the existence of noncompliant implementations should not force the definition of compliance to change when that definition has useful properties.
>
> To me, "semantically equivalent" is something that is definitely vague. However, the use of strong ETags for range requests makes things clearer: If you can guarantee that your server will always generate a byte-for-byte identical representation (usable for range requests) for a given ETag, use a strong one; otherwise, if you think that generating an ETag makes sense at all, use a weak one. For example, mod_deflate should never return a strong ETag because the entity it generates is dependent on the system configuration. With a strong ETag, mod_deflate needs to ensure that the ETag changes whenever the mod_deflate configuration changes and whenever the system's zlib changes; with a weak ETag, it could continue to ignore these little details (like it does now when generating strong ETags.)
>
> In fact, I would say that weak ETags should be the default choice, and strong ETags should only be used when the application has specifically ensured that there is a one-to-one correspondence between the ETag and the byte stream that comprises the entity--in other words, only use strong ETags when you could support range requests (whether you support range requests or not). The restriction against using weak ETags in PUT and DELETE requests forces applications to use strong ETags in situations where they are not guaranteeing this one-to-one correspondence.
>
> The alternative would be to deprecate weak ETags and discourage their use, weaken the definition of strong ETag to match what weak ETags were originally for, and then say that strong ETags are only guaranteed to have a one-to-one correspondence with entities if the server supports range requests for that resource. That seems to be the effective result of the proposed i101 resolution.
>
>   
>>> 3) the upgrade/downgrade questions will come up again:
>>>    match_weak("xyz", W/"xyz")
>>>
>>>    is 'true' or 'false'?
>>>       
>> True. We have already beaten that discussion to death, I think.
>> The specification is very clear on this.
>>
>> The error many seem to get trapped in is thinking that 
>> because of this clients may upgrade the received weak ETag
>> to a strong one in order to be able to actually use the etag
>> for anything useful, but this MUST NOT be done and breaks
>> the whole scheme in obvious ways.
>>     
>
> Agreed 100%.
>
> Regards,
> Brian
>
>
>   
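
For reference, the weak and strong comparison functions discussed in the
quoted thread can be sketched roughly as below. This is only an
illustration of the RFC 2616 rule as I read it (the opaque tags must be
identical; weak comparison ignores the W/ prefix, strong comparison
requires that neither tag carries it). The name match_weak comes from
Robert's example; parse_etag and match_strong are names made up here,
and the parsing is deliberately naive (no validation of the quoted
string).

```python
def parse_etag(tag: str) -> tuple[bool, str]:
    """Split an entity-tag into (is_weak, opaque_tag).

    Hypothetical helper: assumes well-formed input such as '"xyz"' or
    'W/"xyz"' and does no validation of the quoted string.
    """
    tag = tag.strip()
    if tag.startswith("W/"):  # the weakness indicator is case-sensitive
        return True, tag[2:]
    return False, tag


def match_weak(a: str, b: str) -> bool:
    """Weak comparison: opaque tags identical, W/ prefix ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]


def match_strong(a: str, b: str) -> bool:
    """Strong comparison: opaque tags identical and neither tag is weak."""
    weak_a, opaque_a = parse_etag(a)
    weak_b, opaque_b = parse_etag(b)
    return not weak_a and not weak_b and opaque_a == opaque_b


if __name__ == "__main__":
    print(match_weak('"xyz"', 'W/"xyz"'))    # weak comparison of the thread's example
    print(match_strong('"xyz"', 'W/"xyz"'))  # strong comparison of the same pair
```

With these definitions the pair from item 3 matches under weak
comparison but not under strong comparison, which is why a client that
strips the W/ prefix to "upgrade" a tag changes the outcome of a strong
comparison it was never entitled to pass.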

-- 
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com

Received on Thursday, 1 May 2008 05:47:27 UTC