Re: Summary of ETag related issues in RFC2518bis from Jim Whitehead on 2005-12-19 (w3c-dist-auth@w3.org from October to December 2005)

From: Jim Whitehead <ejw@soe.ucsc.edu>
Date: Mon, 19 Dec 2005 13:20:55 -0800
To: Julian Reschke <julian.reschke@gmx.de>
Cc: w3c-dist-auth@w3.org
Message-Id: <942948C5-CCBE-459E-A038-8C966E334840@cs.ucsc.edu>
Julian,

Thank you for posting this issue summary.

From a client perspective, what is needed is for a successful PUT to
return a reliable ETag. I believe it's fine if that ETag is either
weak or strong -- what is damaging for clients is when the ETag
changes from weak to strong at some point after a successful PUT,
since there is no reliable mechanism for the client to discover that
this has happened (yes, you can poll, but there is no guarantee of
how long you need to poll before you know you've received the final
ETag).

The main client use for the ETag returned by PUT is to ensure that
the locally cached state is the same as that held by the server.
This requirement applies across multiple client types, and is not
limited to filesystem-like clients.

As a result, my views on the requirements below are as follows:

> R1) Require servers to support strong entity tags,

Not needed -- weak ETags are as good as strong ETags for most uses.
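
To make the weak/strong distinction concrete, here is a minimal sketch of the two comparison functions described in RFC 2616, section 13.3.3. The function names are illustrative and not taken from any library, and the parsing assumes well-formed entity tags.

```python
def parse_etag(value):
    """Split an entity tag into (is_weak, opaque_tag)."""
    value = value.strip()
    if value.startswith("W/"):
        return True, value[2:]
    return False, value


def strong_match(a, b):
    """Strong comparison: both tags must be strong and identical."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a, b):
    """Weak comparison: opaque tags must be identical; weakness is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

A client that only wants to know whether its cached copy is still semantically current can get by with weak_match; strong_match matters mainly where octet equality is required.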

> R2) Require servers to return entity tags for PUT requests

Required. Or, put another way, what clients need is the ability to
know the final value of the ETag assigned to the server's
representation of the resource created by a successful PUT. It seems
that the best way to do this is to have the server respond with that
ETag in the PUT response. What might also work is a guarantee that
this final ETag will be available within a given time period, so that
clients need to perform only a single follow-on request to get it.
However, protocol requirements involving timing are usually very hard
to get right -- it's not my first choice. That's why I think returning
the ETag in the PUT response is the best way to communicate this final
ETag value.
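
The client-side logic this implies can be sketched as follows. This is an illustration under stated assumptions, not a recommended implementation: final_etag_after_put and fetch_etag are hypothetical names, the headers are modeled as a plain mapping, and the follow-on request is passed in as a callable so the sketch does not depend on any particular HTTP library.

```python
def final_etag_after_put(put_headers, fetch_etag):
    """Return the ETag a client should cache after a successful PUT.

    put_headers: mapping of PUT response header names to values
                 (hypothetical shape; real HTTP libraries differ).
    fetch_etag:  zero-argument callable that performs one follow-on
                 request (for example HEAD) and returns its ETag or None.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in put_headers.items()}
    etag = normalized.get("etag")
    if etag is not None:
        # Preferred case: the server reported the ETag in the PUT response.
        return etag
    # Fallback: a single follow-on request. Without a timing guarantee
    # from the server, nothing ensures this value is the final one.
    return fetch_etag()
```

The fallback branch is exactly where the timing problem shows up: the client cannot tell whether the value it fetched will change again later.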

> R3) Require ETags not to change when only properties or lock state  
> change
>
> This is just broken spec writing, sorry. The guarantee that HTTP  
> gives us that the ETag will change if the content changes, that's  
> it. Of course it's sub-optimal if the etag changes when the content  
> didn't, but a server may have very good reasons to do so. So yes,  
> if the Etag changes the client may have to resync, just to find out  
> that nothing changed, after all.

It is quite clear from reading RFC 2616 and RFC 2518 that ETags were
intended to describe the entity returned in a GET response. The
definitions of entity and entity tags are below (RFC 2616, sections
1.3, 3.11, and 7.1).

entity
       The information transferred as the payload of a request or
       response. An entity consists of metainformation in the form of
       entity-header fields and content in the form of an entity-body,
       as described in section 7.


    A "strong entity tag" MAY be shared by two entities of a resource
    only if they are equivalent by octet equality.

    A "weak entity tag," indicated by the "W/" prefix, MAY be shared by
    two entities of a resource only if the entities are equivalent and
    could be substituted for each other with no significant change in
    semantics. A weak entity tag can only be used for weak comparison.


    Entity-header fields define metainformation about the entity-body
    or, if no body is present, about the resource identified by the
    request. Some of this metainformation is OPTIONAL; some might be
    REQUIRED by portions of this specification.

        entity-header  = Allow                    ; Section 14.7
                       | Content-Encoding         ; Section 14.11
                       | Content-Language         ; Section 14.12
                       | Content-Length           ; Section 14.13
                       | Content-Location         ; Section 14.14
                       | Content-MD5              ; Section 14.15
                       | Content-Range            ; Section 14.16
                       | Content-Type             ; Section 14.17
                       | Expires                  ; Section 14.21
                       | Last-Modified            ; Section 14.29
                       | extension-header

        extension-header = message-header


From these definitions, it's clear that HTTP intended only the
metadata in the entity-header fields to be considered part of the
entity. In particular, there are other kinds of metadata that are
*not* included in the definition of an entity.

It seems to me we have a few choices.

* Stick closely to the HTTP notion of entity, in which case changes
to the DAV:getcontentlanguage, DAV:getcontenttype,
DAV:getcontentlength, and DAV:getlastmodified properties MUST affect
the ETag, and changes to other properties MUST NOT affect the ETag.

* Make a clear distinction between WebDAV properties and HTTP  
entities, stating that changes to WebDAV properties MUST NOT cause  
changes to HTTP entities.

* Consider WebDAV properties to be part of the state of the resource,
even though they (except for the properties listed above) do not
affect the entity (as defined by HTTP), and hence any change to a
property MUST cause a change to the ETag.

I'll note that the third option seems to be the hardest one to defend  
based purely on the language of the HTTP specification.
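
The three choices can be compared side by side with a small sketch. The option numbers follow the order of the bullets above; the function name is hypothetical, and reading the second option as "no property change alters the ETag" is an interpretation of that bullet rather than something it states outright.

```python
# Properties the first option ties to HTTP entity-header fields.
ENTITY_PROPERTIES = {
    "DAV:getcontentlanguage",
    "DAV:getcontenttype",
    "DAV:getcontentlength",
    "DAV:getlastmodified",
}


def etag_changes(option, prop):
    """Return True if a change to `prop` must change the ETag, else False."""
    if option == 1:
        # Only properties mirroring entity-header fields affect the ETag.
        return prop in ENTITY_PROPERTIES
    if option == 2:
        # Properties are kept separate from HTTP entities entirely.
        return False
    if option == 3:
        # Properties are resource state, so any change affects the ETag.
        return True
    raise ValueError("option must be 1, 2, or 3")
```

For example, a change to DAV:displayname leaves the ETag alone under the first two options and changes it under the third.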

Another solution is to introduce a new state token representing the
dead property state. This token could be retrieved from a property,
or made available in a response header (perhaps for PROPFIND). This
would allow WebDAV to discuss the impact of property changes on a
state token without having to alter the definition of entity in HTTP.
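
One way a server might derive such a token is sketched below. The message does not say how the token would be computed, so hashing the dead properties is purely an assumption made for illustration, as are the function name and the quoted, ETag-like output format.

```python
import hashlib


def dead_property_token(dead_properties):
    """Derive an opaque token from a mapping of dead property names to values."""
    digest = hashlib.sha256()
    # Sort so the token does not depend on the order properties were set.
    for name in sorted(dead_properties):
        for part in (name, dead_properties[name]):
            data = part.encode("utf-8")
            # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return '"' + digest.hexdigest() + '"'
```

Because the token covers only dead properties, a change to the entity body would leave it untouched, which is the separation the proposal is after.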


> R4) Require servers to store arbitrary binary content,

This seems too strong to me. I think the requirement should be for
clients to be able to discover when a server does not accept
arbitrary binary content, either through an appropriate error code
for PUT or via some other discovery mechanism (perhaps a property
that describes acceptable MIME types and/or XML schemas?).

> R5) Require servers to store dead properties,

Same as for R4.

> R6) Require servers to persist Content-Type upon PUT.

I agree this is a SHOULD-level requirement.

> I strongly object to any attempt to transform WebDAV into something  
> like NFS over HTTP. WebDAV is an *extension* to HTTP; and HTTP has  
> been designed for a wide variety of resource types, not only  
> serving files. The good thing is that I'm almost sure that the IESG  
> wouldn't let us do that anyway.
>

I believe that these issues are broader than just filesystem-like
vs. non-filesystem-like clients, and hence framing the issue as "what
those filesystem-like clients need" is not productive.

- Jim
Received on Monday, 19 December 2005 21:21:08 UTC