Re: Etag-on-write, 2nd attempt (== IETF draft 01)

Julian Reschke wrote:
> >13.3.3 describes strong and weak validators, and explains that any
> >change to the entity, including a semantically insignificant one,
> >should change a strong Etag but doesn't have to change a weak Etag.
> >
> >So the server is free to make modifications, but if it does then it
> >shouldn't send a strong Etag corresponding to the modified entity in
> >the PUT response.  It's ok to send a weak Etag corresponding to the
> 
> I'm not sure how you come to that conclusion. As pointed out in 
> <http://greenbytes.de/tech/webdav/draft-reschke-http-etag-on-write-01.html#rfc.section.1.3>, 
> "ETag" is a response header, and response headers per definition apply to 
> the entity on the server, not the one in the request:

No.  Read the paragraph again:

> "The response-header fields allow the server to pass additional 
> information about the response which cannot be placed in the 
> Status-Line. These header fields give information about the server and 
> about further access to the resource identified by the Request-URI." 
> (<http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.6.2>).

It says "about further access to the resource identified by the
Request-URI".  It does not say the Etag in a 200 response corresponds
to the entity which will be returned by subsequent GETs.  It says
"information [...] about further access", on which I think we all agree :)

> >modified entity, provided the modification is semantically
> >insignificant - and that _is_ up to the server.  It's also ok to send
> >a strong Etag corresponding to the entity prior to modification.
> >Useless (except in pathological cases), but ok.
> 
> We discussed this last year over here, and the consensus was that if an 
> ETag is returned in PUT (or another write method such as PROPPATCH), it 
> applies to what the server has, not what the client sent.

What do you mean by "has"? :)

That is, do you mean what the server "has" before it modifies the entity, or after?

Conceptually it's fine to say the 200's Etag matches the entity as it
was before the modification, and that the modification happens _as if_
another client had changed the entity afterwards.

More importantly, that's what fits the caching model, and fits the
octet-equivalence that is generally implied by a strong Etag (as
described for byte-range requests).
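To make the octet-equivalence point concrete, here is a minimal Python sketch of the strong and weak comparison functions from RFC 2616 section 13.3.3, plus the If-Range decision that depends on the strong one. It assumes tags are handled as raw header values such as "v1" or W/"v1". The function names are made up for illustration and are not from any library.

```python
def is_weak(tag: str) -> bool:
    # A weak entity tag carries the W/ prefix, e.g. W/"v1".
    return tag.startswith("W/")


def opaque(tag: str) -> str:
    # Drop the weakness indicator, leaving the quoted opaque-tag.
    return tag[2:] if is_weak(tag) else tag


def strong_match(a: str, b: str) -> bool:
    # Strong comparison: both tags must be strong and identical.
    return not is_weak(a) and not is_weak(b) and a == b


def weak_match(a: str, b: str) -> bool:
    # Weak comparison: the opaque-tags must be identical; W/ is ignored.
    return opaque(a) == opaque(b)


def if_range_status(if_range_tag: str, current_tag: str) -> int:
    # If-Range: send just the requested bytes (206) only when the
    # client's copy is octet-identical to the current entity, which
    # only a strong match promises; otherwise send everything (200).
    return 206 if strong_match(if_range_tag, current_tag) else 200


print(if_range_status('"v1"', '"v1"'))      # 206
print(if_range_status('W/"v1"', 'W/"v1"'))  # 200: weak tags never validate ranges
```

Weak tags still work for whole-entity revalidation; they just can't be used to splice byte ranges together.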

Also, you can imagine a server which doesn't actually do the
modification at the time of the PUT.  It might do it at the time of
the next GET, instead.  The difference ought to be transparent; it's a
server implementation detail, and potentially a performance
enhancement to delay the modification and thus the calculation of the Etag.
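A toy model of such a deferred-modification server, as a Python sketch. The "modification" here (stripping trailing whitespace) and the hash-based Etag are assumptions standing in for whatever a real server does. The point is that at PUT time the only entity that exists is the one the client sent, so that is the only entity a strong Etag in the PUT response can describe.

```python
import hashlib


def etag_for(body: bytes) -> str:
    # Hypothetical strong Etag: a quoted hash of the exact octets.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def normalize(body: bytes) -> bytes:
    # Hypothetical server-side modification: strip trailing whitespace
    # from each line (standing in for XML reformatting, $Id$ expansion...).
    return b"\n".join(line.rstrip() for line in body.split(b"\n"))


class LazyResource:
    def __init__(self) -> None:
        self.body = b""
        self.pending = False  # True until the deferred modification runs

    def put(self, body: bytes) -> str:
        # Store the client's octets unchanged and defer the modification.
        self.body = body
        self.pending = True
        # Right now the stored entity is exactly what the client sent.
        return etag_for(self.body)

    def get(self) -> tuple[bytes, str]:
        # Apply the deferred modification on first access.
        if self.pending:
            self.body = normalize(self.body)
            self.pending = False
        return self.body, etag_for(self.body)


r = LazyResource()
before = r.put(b"hello  \nworld")
body, after = r.get()
print(body)             # b'hello\nworld'
print(before == after)  # False
```

From the outside this server is indistinguishable from one that modifies at PUT time but reports the "before" Etag.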

After all this, though, regardless of the wording in existing specs,
practical correctness must take precedence.  These are the reasons why
a 200 response to PUT should not carry a strong Etag corresponding to a
server-modified value:

    1. It'll break existing caches which use byte-range GET requests
       to the origin server to fetch one part of an entity when they
       have already kept another part.  Strong Etags imply octet
       equivalence for validating byte-range GET operations.

    2. Alternatively, the server can simply refuse byte-range requests
       in these cases, but that just means wasting bandwidth.  And
       this is not likely to be implemented uniformly.

    3. It'll also break existing proxy caches which _respond_ to
       byte-range GET requests (this is different to 1).  This
       includes "reverse-proxies" which are used to accelerate server
       clusters.

    4. It'll break clients who, for whatever reason, want to see the
       exact entity that clients using other network paths are seeing.
       (Other network paths meaning via different proxy caches.)  It
       shouldn't be necessary to use "Cache-Control: no-cache" for
       that; that introduces unnecessary overhead when the value could
       be cached safely.

To avoid all of this breakage, generic data clients and proxy caches
will end up being programmed to ignore the Etag in PUT responses.
This would be an unfortunate bandwidth overhead (or an unfortunate
proliferation of "don't cache if the Server header is equal to..."
heuristics).
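Point 1 can be reproduced in a few lines of Python. The sketch assumes a cache that stored the client's PUT body under the Etag from the PUT response, later kept only a prefix of it, and asks the origin for the rest with If-Range. The $Id$ expansion is a hypothetical server-side modification and the tags are placeholders.

```python
def origin_range_get(entity: bytes, etag: str, if_range: str, start: int):
    # Origin side of If-Range: if the strong tag matches, return only
    # the bytes from `start` onwards (206); otherwise the whole entity (200).
    if not if_range.startswith("W/") and if_range == etag:
        return 206, entity[start:]
    return 200, entity


def refresh_tail(prefix: bytes, prefix_tag: str, entity: bytes, etag: str) -> bytes:
    # Cache side: on 206 it trusts octet equivalence and splices the
    # origin's bytes onto the prefix it kept; on 200 it takes the full body.
    status, payload = origin_range_get(entity, etag, prefix_tag, len(prefix))
    return prefix + payload if status == 206 else payload


sent = b"Rev: $Id$\nbody text\n"                  # what the client PUT
stored = b"Rev: $Id: file.txt 42 $\nbody text\n"  # after server modification
prefix = sent[:10]                                # the part the cache kept

# PUT returned the strong "after" tag: the cache splices mismatched octets.
print(refresh_tail(prefix, '"after"', stored, '"after"'))
# b'Rev: $Id$\nfile.txt 42 $\nbody text\n'

# PUT returned the "before" tag: If-Range fails and the cache gets it all.
print(refresh_tail(prefix, '"before"', stored, '"after"') == stored)  # True
```

The spliced result matches neither what the client sent nor what the server stored, and nothing in the protocol tells the cache that anything went wrong.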

And these lesser reasons:

    5. If you send a weak Etag with PUT responses, then all the
       caching and lost-update avoidance you wanted to do in specific
       XCAP / CalDAV clients etc. is still possible.  Those clients
       can be taught that a weak Etag means things like whitespace
       formatting (in XML) and $Id$ keywords (in revision control) may
       have changed but it doesn't matter.  And that's exactly what
       weak Etags are for!

    6. If, for a specific server application, you really want the
       PUTting client to continue using the unmodified entity without
       teaching the client about weak Etags, then you can do this
       instead of sending the "after" Etag in the PUT response:

       Send the "before" Etag in response to PUT.  Subsequent GETs (to
       all clients) will have the "after" Etag.  Program the server to
       match _both_ the "before" and "after" Etags in
       If-Match/If-None-Match.  That gives you the same client cache
       behaviour as sending the "after" Etag in response to PUT, and
       the same updating behaviour for version control etc., but
       doesn't break the caching model for other applications (as
       described in points 1-4).
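A minimal Python sketch of the scheme in point 6, assuming hash-derived strong Etags and a hypothetical $Id$ expansion as the server-side modification. The resource remembers both tags for its current state, returns the "before" tag from PUT and the "after" tag from GET, and accepts either one in If-Match and If-None-Match. The class and method names are invented for illustration.

```python
import hashlib


def etag_for(body: bytes) -> str:
    # Hypothetical strong Etag derived from the exact octets.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def modify(body: bytes) -> bytes:
    # Hypothetical semantically insignificant server-side change.
    return body.replace(b"$Id$", b"$Id: 42 $")


class Resource:
    def __init__(self) -> None:
        self.body = b""
        self.accepted: set[str] = set()  # tags that match the current state

    def put(self, sent: bytes) -> str:
        before = etag_for(sent)
        self.body = modify(sent)
        after = etag_for(self.body)
        # Rebuilt on every PUT, so older tags stop matching.
        self.accepted = {before, after}
        return before  # the PUT response carries the "before" tag

    def get(self) -> tuple[bytes, str]:
        # Every client, including the PUTting one, sees the "after" tag.
        return self.body, etag_for(self.body)

    def if_match_ok(self, tag: str) -> bool:
        # Either tag satisfies If-Match, so a client that only knows
        # the "before" tag can still make a conditional update.
        return tag == "*" or tag in self.accepted

    def if_none_match_hits(self, tag: str) -> bool:
        # Either tag also counts as a match for If-None-Match.
        return tag in self.accepted


r = Resource()
before = r.put(b"Rev: $Id$\n")
body, after = r.get()
print(r.if_match_ok(before), r.if_match_ok(after))  # True True
r.put(b"something else")                            # another client's update
print(r.if_match_ok(before))                        # False: lost update caught
```

Because the "after" tag is what everyone sees on GET, caches and byte-range validation only ever deal with a tag that matches the octets actually served.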

Because of points 5 and 6, there is no advantage to sending the
"after" Etag in PUT responses: you can get the same intended effects
in other ways, with no overhead.

Because of points 1-4, there are several disadvantages.

-- Jamie

Received on Thursday, 14 September 2006 13:57:47 UTC