Re: Etag-on-write, 2nd attempt (== IETF draft 01) from Julian Reschke on 2006-09-14 (ietf-http-wg@w3.org from July to September 2006)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 14 Sep 2006 16:21:05 +0200
To: Jamie Lokier <jamie@shareable.org>
CC: Yves Lafon <ylafon@w3.org>, Helge Hess <helge.hess@opengroupware.org>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <45096551.7020007@gmx.de>
Jamie Lokier wrote:
> No.  Read the paragraph again:
> 
>> "The response-header fields allow the server to pass additional 
>> information about the response which cannot be placed in the 
>> Status-Line. These header fields give information about the server and 
>> about further access to the resource identified by the Request-URI." 
>> (<http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.6.2>).
> 
> It says "about further access to the resource identified by the
> Request-URI".  It does not say the Etag in a 200 response corresponds
> to the entity which will be returned by subsequent GETs.  It says
> "information [...] about further access", on which I think we all agree :)

Well, we had that discussion last year, and back then, consensus seemed 
to be that it's for the entity that will be returned upon GET (see 
thread around 
<http://lists.w3.org/Archives/Public/ietf-http-wg/2005OctDec/0017.html>). 
Note also:

"The ETag response-header field provides the current value of the entity 
tag for the requested variant." 
(<http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.14.19>)

...so the real question here is what "requested variant" means in this 
context (this is why I have a reminder in the spec to get that clarified).

>>> modified entity, provided the modification is semantically
>>> insignificant - and that _is_ up to the server.  It's also ok to send
>>> a strong Etag corresponding to the entity prior to modification.
>>> Useless (except in pathological cases), but ok.
>> We discussed this last year over here, and the consensus was that if an 
>> ETag is returned in PUT (or another write method such as PROPPATCH), it 
>> applies to what the server has, not what the client sent.
> 
> What do you mean by "has"? :)

...what it will return on GET...

> I.e. is that what the server "has" before it modifies the entity, or after?

That's not really meaningful, because the server may be rewriting 
content upon write and read... Let's stick with the language "upon GET".
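
To illustrate what "upon GET" means here, a minimal sketch of a server that rewrites content on write. Everything in it (the ToyServer class, the normalize() rewrite, the hash-based tag) is a hypothetical illustration, not something taken from the draft or from RFC2616:

```python
import hashlib


def normalize(body):
    """Hypothetical server-side rewrite: strip trailing whitespace per line."""
    return b"\n".join(line.rstrip() for line in body.splitlines()) + b"\n"


def strong_etag(representation):
    """Quoted entity tag derived from the exact octets served on GET (assumed scheme)."""
    return '"' + hashlib.sha256(representation).hexdigest()[:16] + '"'


class ToyServer:
    def __init__(self):
        self._store = {}

    def put(self, path, body):
        stored = normalize(body)  # stored octets may differ from what was sent
        self._store[path] = stored
        # The ETag describes what a later GET returns, not the request body.
        return {"ETag": strong_etag(stored)}

    def get(self, path):
        body = self._store[path]
        return {"ETag": strong_etag(body)}, body


server = ToyServer()
sent = b"<a>  \n<b/>\n</a>   "
put_headers = server.put("/doc.xml", sent)
get_headers, received = server.get("/doc.xml")
```

In this sketch the tag returned by PUT equals the tag returned by the following GET, while the octets the client sent are not the octets it gets back. Whether the rewrite happens at PUT time or at GET time would not change that.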

> Conceptually it's fine to say the 200's Etag matches the response
> before the modification, and the modification is _as if_ another
> client changed the entity.

I thought so as well until last year, but was convinced otherwise (see 
above link).

> More importantly, that's what fits the caching model, and fits the
> octet-equivalence that is generally implied by a strong Etag (as
> described for byte-range requests).

You keep mentioning caching, although RFC2616 is very clear about 
caching of PUT responses. Could you please clarify?

> Also, you can imagine a server which doesn't actually do the
> modification at the time of the PUT.  It might do it at the time of
> the next GET, instead.  The difference ought to be transparent; it's a
> server implementation detail, and potentially a performance
> enhancement to delay the modification and thus calculation of Etag.

Correct :-). The time of rewrite is an implementation detail.

> After all this, though, regardless of the wording in existing specs,
> practical correctness must have precedence.  There are these reasons why
> PUT's 200 should not send a strong Etag corresponding to a
> server-modified value:
> 
>     1. It'll break existing caches which use byte-range GET requests
>        to the origin server, to refresh part of an entity when they
>        kept another part.  Strong Etags imply octet equivalence for
>        validating byte-range GET operations.

No, it won't, unless these caches are non-compliant in the first place.

>     2. Alternatively, the server can simply refuse byte-range requests
>        in these cases, but that just means wasting bandwidth.  And
>        this is not likely to be implemented uniformly.

A server could do that, but I don't think it's needed.

>     3. It'll also break existing proxy caches which _respond_ to
>        byte-range GET requests (this is different to 1).  This
>        includes "reverse-proxies" which are used to accelerate server
>        clusters.

I think we won't make any progress with this issue until you can give a 
concrete example of an RFC2616-compliant intermediary that would be 
broken in this case.

>     4. It'll break clients who, for whatever reason, want to see the
>        exact entity that clients using other network paths are seeing.
>        (Other network paths meaning via different proxy caches).  It
>        shouldn't be necessary to use "Cache-control: no-cache" for
>        that; that introduces unnecessary overhead when the value could
>        be cached safely.
> 
>     4. To avoid such breakage, generic data clients and proxy caches
>        will end up being programmed to ignore the Etag in PUT
>        responses.  This would be an unfortunate bandwidth overhead.
>        (Or an unfortunate proliferation of "don't cache if the Server
>        header is equal to..." heuristics).

RFC2616 already says that the response for PUT isn't cacheable. And as 
there never has been a guarantee that there is octet-by-octet storage 
upon PUT, I still don't see the problem.

> And these lesser reasons:
> 
>     5. If you send a weak Etag with PUT responses, then all the
>        caching and lost-update avoidance you wanted to do in specific
>        XCAP / CalDAV clients etc. is still possible.  Those clients
>        can be taught that a weak Etag means things like whitespace
>        formatting (in XML) and $Id$ keywords (in revision control) may
>        have changed but it doesn't matter.  And that's exactly what
>        weak Etags are for!

Again: RFC2616 says that weak etags can't be used for authoring:

"A server MUST use the strong comparison function (see Section 13.3.3) 
to compare the entity tags in If-Match." 
(<http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.14.24>).
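
For reference, the two comparison functions from Section 13.3.3 can be sketched roughly as follows. The function names are my own, and the parsing is deliberately simplistic (it only looks for the W/ prefix):

```python
def parse_etag(tag):
    """Split an entity tag into (is_weak, opaque_tag)."""
    tag = tag.strip()
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_match(a, b):
    """Strong comparison: identical opaque tags, and neither tag is weak."""
    weak_a, opaque_a = parse_etag(a)
    weak_b, opaque_b = parse_etag(b)
    return not weak_a and not weak_b and opaque_a == opaque_b


def weak_match(a, b):
    """Weak comparison: identical opaque tags, weakness is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

Under the strong function a weak tag never matches anything, not even itself, which is why a weak ETag returned from PUT cannot be used in a later If-Match.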

>     6. If you really want the behaviour where the PUTting client will
>        continue to use the unmodified entity, without teaching the
>        client about weak Etag, for a specific server application, then
>        you can do this instead of sending the "after" Etag in the PUT
>        response:
> 
>        Send the "before" Etag in response to PUT.  Subsequent GETs (to
>        all clients) will have the "after" Etag.  Program the server to
>        match _both_ the "before" and "after" Etags in
>        If-Match/If-None-Match.  That gives you the same client cache
>        behaviour as sending the "after" Etag in response to PUT, and
>        the same updating behaviour for version control etc., but
>        doesn't break the caching model for other applications (as
>        described in points 1-4).

That's an interesting idea on its own; I just don't see where the 
caching model is broken at all. As far as I understand, my draft 
currently describes what RFC2616 says (as discussed last year over 
here). The fact that we seem to disagree about what RFC2616 says clearly 
indicates to me that it really needs to be clarified (one way or the other).
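
As a rough sketch of the idea in point 6, a server could keep a set of entity tags it treats as current and evaluate If-Match against that set. The function name and the tag values are hypothetical, and splitting the header on commas is a simplification (a quoted tag could itself contain a comma):

```python
def if_match_ok(if_match_header, accepted_tags):
    """Return True if the If-Match precondition passes.

    accepted_tags is the set of strong entity tags the server currently
    accepts for the resource, e.g. both the "before" and "after" tags.
    """
    value = if_match_header.strip()
    if value == "*":
        # "*" matches any current entity of the resource.
        return bool(accepted_tags)
    candidates = [t.strip() for t in value.split(",")]
    # Strong comparison: weak tags (W/ prefix) never match.
    return any(
        not t.startswith("W/") and t in accepted_tags for t in candidates
    )


before = '"v1-as-sent"'    # tag returned in the PUT response
after = '"v1-as-stored"'   # tag returned by subsequent GETs
accepted = {before, after}
```

With this, a client holding either tag can issue a conditional PUT, while a stale or weak tag is rejected.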

> Because of points 5 and 6, there is no advantage to sending the
> "after" Etag in PUT responses: you can get the same intended effects
> in other ways, with no overhead.
> 
> Because of points 1-4, there are several disadvantages.

Best regards, Julian
Received on Thursday, 14 September 2006 14:21:19 UTC