Re: Summary of ETag related issues in RFC2518bis from Julian Reschke on 2005-12-19 (w3c-dist-auth@w3.org from October to December 2005)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Mon, 19 Dec 2005 22:50:31 +0100
To: Jim Whitehead <ejw@soe.ucsc.edu>
CC: w3c-dist-auth@w3.org
Message-ID: <43A72B27.4060900@gmx.de>

Jim Whitehead wrote:
> From a client perspective, what they need is for a reliable Etag to be 
> returned by a successful PUT. I believe it's fine if that Etag is either 
> weak or strong -- what is damaging for clients is when the Etag changes 
> from weak to strong at some point after a successful PUT, since there is 
> no reliable mechanism for the client to discover this has happened (yes, 
> you can use polling, but there is no guarantee for how long you need to 
> perform polling before you know you've received the final Etag).

Note that weak ETags indeed can't be used with PUT (I was wrong on that 
in previous messages).
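
To make the reason concrete: RFC 2616 requires the strong comparison 
function for If-Match, and a weak tag never satisfies it. Below is a 
minimal Python sketch of the two comparison functions from RFC 2616, 
section 13.3.3. The function names are made up for illustration, and 
tags are treated as plain strings with no further validation.

```python
def parse_etag(value):
    """Split an entity tag into (is_weak, opaque_tag)."""
    value = value.strip()
    if value.startswith("W/"):
        return True, value[2:]
    return False, value


def strong_match(a, b):
    """Strong comparison: both tags must be strong and identical."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a, b):
    """Weak comparison: opaque tags identical, weakness flag ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

With these definitions, strong_match('W/"a"', 'W/"a"') is false, so a 
conditional PUT carrying a weak tag in If-Match cannot succeed.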

The weak-to-strong propagation in Apache IMHO is technically sound, as 
the semantics are compliant with HTTP, and (non-authoring) clients get the 
best out of it. The question is how to change this so that authoring 
clients will be happy as well.

Special-casing things IMHO will not work. In Apache, ETags are generated 
by httpd, not mod_dav, and thus are optimized for GET, not authoring.
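
For readers who haven't run into the weak-to-strong behavior, here is a 
simplified Python model of it. This is not httpd's actual code. It 
assumes the tag is built from file size and modification time, and that 
it is marked weak while the file was modified within the last second; 
the function name and the tag layout are hypothetical.

```python
def make_etag(size, mtime, now):
    """Return an entity tag derived from file metadata.

    ASSUMPTION: the tag is weak while the file was modified within the
    last second, since a second write inside the same timestamp
    granularity would not be visible in mtime.
    """
    tag = '"%x-%x"' % (size, mtime)
    if now - mtime < 1:
        return "W/" + tag
    return tag


# Same file, asked about right after the write and again a bit later.
print(make_etag(42, 1000, 1000))  # W/"2a-3e8"
print(make_etag(42, 1000, 1005))  # "2a-3e8"
```

A cache only ever sees a validator that is safe to use, which is why 
this works well for GET. An authoring client that stored the first 
value never learns that it was replaced by the second.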

> The main client use for the Etag returned by PUT is ensuring that their 
> locally cached state is the same as that held by the server. This 
> requirement goes across multiple client types, and is not limited just 
> to filesystem-like clients.
> 
> As a result, my views on the requirements below are as follows:
> 
> R1) Require servers to support strong entity tags,
> 
> Not needed -- weak Etags are as good as strong etags for most uses.
> 
> R2) Require servers to return entity tags for PUT requests
> 
> Required. Or, put another way, what clients need is the ability to know 
> the final value of the etag assigned to the server's representation of 
> the resource created by a successful PUT. It seems that the best way to 
> do this is to have the server respond with that Etag in the PUT 
> response. What might also work is for there to be a guarantee that this 
> final etag will be available within a given time period, and hence 
> clients will only need to perform a single follow-on request to get this 
> etag. However, protocol requirements involving timing are usually very 
> hard to get right -- it's not my first choice. That's why I think 
> returning the etag in the PUT response is the best way to communicate 
> this final etag value.

Again, if we require this, we'll have to make sure everybody agrees on 
what this means. That, at a minimum, requires getting consensus on the 
HTTP mailing list, and getting that consensus into the RFC2616 errata.

>> R3) Require ETags not to change when only properties or lock state change
>>
>> This is just broken spec writing, sorry. The guarantee that HTTP gives 
>> us is that the ETag will change if the content changes; that's it. Of 
>> course it's sub-optimal if the etag changes when the content didn't, 
>> but a server may have very good reasons to do so. So yes, if the Etag 
>> changes the client may have to resync, just to find out that nothing 
>> changed, after all.
> 
> It is quite clear from reading RFC 2616 and RFC 2518 that the use of 
> Etags was intended to describe the entity due to a GET response. The 
> definitions of entity and etags are below (RFC 2616, sections 1.3, 3.11, 
> 7.1).
> 
> entity
>       The information transferred as the payload of a request or
>       response. An entity consists of metainformation in the form of
>       entity-header fields and content in the form of an entity-body, as
>       described in section 7.
> 
> 
>    A "strong entity tag" MAY be shared by two entities of a resource
>    only if they are equivalent by octet equality.
> 
>    A "weak entity tag," indicated by the "W/" prefix, MAY be shared by
>    two entities of a resource only if the entities are equivalent and
>    could be substituted for each other with no significant change in
>    semantics. A weak entity tag can only be used for weak comparison.
> 
> 
>    Entity-header fields define metainformation about the entity-body or,
>    if no body is present, about the resource identified by the request.
>    Some of this metainformation is OPTIONAL; some might be REQUIRED by
>    portions of this specification.
> 
>        entity-header  = Allow                    ; Section 14.7
>                       | Content-Encoding         ; Section 14.11
>                       | Content-Language         ; Section 14.12
>                       | Content-Length           ; Section 14.13
>                       | Content-Location         ; Section 14.14
>                       | Content-MD5              ; Section 14.15
>                       | Content-Range            ; Section 14.16
>                       | Content-Type             ; Section 14.17
>                       | Expires                  ; Section 14.21
>                       | Last-Modified            ; Section 14.29
>                       | extension-header
> 
>        extension-header = message-header
> 
> 
> From these definitions, it's clear that HTTP intended for only the 
> metadata in the entity-header fields to be considered part of the 
> entity. In particular, there are other kinds of metadata that are *not* 
> included in the definition of an entity.
> 
> It seems to me we have a few choices.
> 
> * Stick closely to the HTTP notion of entity, in which case changes to 
> the DAV:getcontentlanguage, DAV:getcontenttype, DAV:getcontentlength, 
> and DAV:getlastmodified MUST affect the Etag, and changes to other 
> properties MUST NOT affect the etag.

Jim, where does the "MUST NOT" come from??? A server that changes the 
ETag although the entity didn't change is completely compliant with the 
spec. Only the other direction (content change -> etag change) is a 
MUST, because it's relevant for interoperability (cache correctness).

A server that changes ETags although that wasn't necessary causes cache 
misses, that's it. It only becomes an issue for authoring clients.

> * Make a clear distinction between WebDAV properties and HTTP entities, 
> stating that changes to WebDAV properties MUST NOT cause changes to HTTP 
> entities.

I still don't see why we're using RFC2119 syntax here...

Anyway, if the issue is with clients that do a sequence of PROPPATCH/PUT 
(or the other way around), and we want them to be able to use the ETag 
in an "If" header, why don't we simply tell servers to return the ETag 
upon PROPPATCH if it changed?

In that case the client will always have the latest ETag, and it doesn't 
matter at all whether it changes with PROPPATCH or not.
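
On the client side this could look roughly like the following Python 
sketch. It assumes the server sends an ETag response header on both 
PUT and PROPPATCH whenever the tag changed, which is exactly the 
behavior proposed above and not something servers are known to do 
today. The class and method names are invented for illustration.

```python
class ResourceState:
    """Client-side record of the latest entity tag seen for one resource."""

    def __init__(self, etag=None):
        self.etag = etag

    def update_from_response(self, headers):
        """Adopt the ETag from a PUT or PROPPATCH response, if one was sent."""
        for name, value in headers.items():
            if name.lower() == "etag":
                self.etag = value.strip()
        return self.etag

    def if_header(self):
        """Build a WebDAV If header value conditioned on the known ETag."""
        if self.etag is None:
            return None
        # RFC 2518 puts entity tags in square brackets inside a list.
        return "([%s])" % self.etag
```

After a PUT answered with ETag "v1" and a PROPPATCH answered with 
ETag "v2", if_header() yields (["v2"]), so the next conditional request 
is made against the current state regardless of which method caused the 
change.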

> * Consider WebDAV properties to be part of the state of the resource, 
> even though they (except for the properties listed above) do not affect 
> the entity (as defined by HTTP), and hence any change to a property MUST 
> cause a change to the Etag.

I don't think anybody likes that idea.

> I'll note that the third option seems to be the hardest one to defend 
> based purely on the language of the HTTP specification.
> 
> Another solution is to introduce a new state token representing the dead 
> property state. This token could be retrieved from a property, or 
> available in a response header (perhaps for PROPFIND). This would allow 
> WebDAV to discuss the impact of property changes on a state token 
> without having to alter the definition of entity in HTTP.

That's also an interesting idea. I don't think it's relevant to this 
discussion, as servers that currently change ETags upon PROPPATCH do not 
do this because they feel it's a good idea, but because changing 
properties indeed causes a change to their backend that *results* in the 
ETag changing. So introducing a new state token wouldn't change that 
situation at all.

> R4) Require servers to store arbitrary binary content,
> 
> This seems too strong to me. I think the requirement should be for 
> clients to discover when a server does not accept arbitrary binary 
> content, either by an appropriate error code for PUT, or via some other 

<http://greenbytes.de/tech/webdav/rfc2616.html#status.415>

> discovery mechanism (a property that exists that describes acceptable 
> MIME types and/or XML schemas?)

Something like that. Certainly not in scope for RFC2518bis, right?

> R5) Require servers to store dead properties,
> 
> Same as for R4.

Yep.

>> R6) Require servers to persist Content-Type upon PUT.
> 
> I agree this is a SHOULD level requirement.

So do I. Any help in arguing this with RoyF appreciated :-).

>> I strongly object to any attempt to transform WebDAV into something 
>> like NFS over HTTP. WebDAV is an *extension* to HTTP; and HTTP has 
>> been designed for a wide variety of resource types, not only serving 
>> files. The good thing is that I'm almost sure that the IESG wouldn't 
>> let us do that anyway.
>>
> 
> I believe that these issues are more broad than just filesystem-like vs 
> non-filesystem-like, and hence framing the issue as "what those 
> filesystem-like clients need" is not productive.

Point taken; but I still think we really need to make sure other use 
cases will not break. That would take away a lot of WebDAV's (and 
HTTP's) flexibility.  I really think that what we're talking about is a 
specific profile. Should we be able to converge on what the feature set 
is, all that's left to do is to give it a name and make it easily 
discoverable upfront.

Best regards, Julian
Received on Monday, 19 December 2005 21:52:55 UTC