Re: Etag-on-write, draft -04 from Julian Reschke on 2006-12-02 (ietf-http-wg@w3.org from October to December 2006)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sat, 02 Dec 2006 22:00:36 +0100
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <4571E974.8090208@gmx.de>
Hi,

Before I get to Roy's mail, I'd like to point out how we got here.

During autumn 2005, a requirement to return strong etags made it into 
RFC2518bis. After some discussion over in the WebDAV WG we realized that 
RFC2616 isn't very clear about what an ETag in a PUT response means.

With that question I went to *this* mailing list; see thread starting at 
<http://lists.w3.org/Archives/Public/ietf-http-wg/2005OctDec/0013.html>. 
After some discussion I went away with the impression that there was 
consensus that the entity tag is for the entity that was stored, not for 
what you sent. And of course everybody seems to agree that a server may 
rewrite content sent with PUT, such as for keyword substitution, object 
re-serialization (think XML or ICS), charset translation, whatever.

The IESG decided that this issue should be resolved, but in a separate 
spec, because it affects many other specs, most of which are not based on 
WebDAV.

In the meantime, the first draft that was supposed to do this has long 
expired (<http://tools.ietf.org/html/draft-whitehead-http-etag-00>). 
Also, the IESG has approved (for the standards track!!!!) one 
specification requiring the server to return a strong ETag, although 
content always will be rewritten (XCAP), and another one, requiring 
*not* to return a strong ETag in that case (CALDAV). That essentially 
means that a single HTTP resource never can be both XCAP and CALDAV 
compliant. This may not be a problem in practice, but the thought that 
different Standards Track specifications profile RFC2616 in completely 
opposite directions makes me shudder.

So when I started work on my draft, I did that based on the assumption 
that there *is* consensus on what RFC2616 says, and attempted both a 
proposed clarification, and an extension helping in some of the use 
cases. The key part here is 
<http://greenbytes.de/tech/webdav/draft-reschke-http-etag-on-write-04.html#rfc.section.1.3>, 
which tries to explain how we came to the conclusion about the 
aforementioned ETag behavior, and 
<http://greenbytes.de/tech/webdav/draft-reschke-http-etag-on-write-04.html#rfc.section.3>, 
proposing clarifications for RFC2616.

Roy seems to argue from the point of view of what RFC2616 *should* be 
saying. That's valid as well, but I really don't see how that helps 
unless we come to a consensus that making an actual change to the spec 
is OK with respect to servers that may break because of that. If we do 
come to that consensus, I'll be happy to either let the draft expire, or 
to rewrite it based on the new information.

OK, now for my comments...


Roy T. Fielding wrote:
> 
> On Nov 30, 2006, at 9:41 PM, Wilfredo Sánchez Vega wrote:
>>   I also think that there are clients in the world that would like to 
>> know that if they did an immediate subsequent GET of the same 
>> variant, that the data would look different.  Several client authors 
>> have expressed interest in this information, which is what the 
>> Entity-Transform is meant to address.
> 
> Which, again, is why it is completely unnecessary.  If the entity being
> edited had an etag, and the result from performing a PUT did not include
> an etag (or included only a weak etag), then the client can conclude that
> the data would look different.  Whether or not it does happen to be

...I'd say *may* look different. Many servers never return an ETag upon PUT.

> different when a subsequent GET is made is a distributed systems problem
> that won't disappear, regardless of Entity-Transform.  The client should
> be assuming a transform has occurred if it does not receive a strong etag,
> since it has no alternative available to it other than to perform a
> subsequent GET.  The Entity-Transform provides no useful information
> and may, in fact, be completely wrong if there is an intermediary in
> the middle doing transformations of its own -- the spec already
> anticipated that for etag, but not for future extension headers.

First of all, this argument seems to be based on the assumption that 
content-rewriting servers do not return ETags, which we know is not 
true, and is even required in XCAP. Are these servers non-compliant with 
RFC2616? I don't think so.

Furthermore, I don't get the part about the intermediary. Even if the 
server didn't rewrite the content, and returned a strong ETag, an 
intermediary may have rewritten the content, right?

>>   Example A.1 describes a Subversion server.  Let's say that the 
>> client isn't the Subversion client, but a text editor, and we have 
>> auto-versioning enabled on the server.
>>
>>   My editor saves a document with "$Id$" tags in it.  The server says 
>> OK and gives me an ETag.
>>
>>   You are arguing, I think, that my editor has no reason to care that 
>> the $Id$ tag portion was modified, since subsequent edits will result 
>> in the same entity on the server, regardless of whether it refreshes 
>> the modified $Id$ tag in the document, so a clarified ETag should be 
>> sufficient.
> 
> Or simply not returning the etag at all, which tells the client it needs
> to obtain the new representation that contains the current Id info.

But returning the ETag is very useful, and completely harmless unless 
the client tries a byte range request. The final state of the resource 
will be the same, no matter whether the client does a 
PUT/GET(refresh)/PUT or a PUT/PUT request, so it's pointless to refresh 
the local copy unless somebody is interested in the newly substituted 
keywords.
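
The PUT/PUT versus PUT/GET(refresh)/PUT comparison can be sketched with a toy in-memory resource. This is only an illustration under stated assumptions: the class `RewritingResource`, the `r<N>` revision format and the status codes returned are all hypothetical and not taken from any real server. The entity tag is computed over the stored (rewritten) entity, as discussed above.

```python
import hashlib
import re

# Matches both the collapsed ($Id$) and an expanded ($Id: ... $) keyword.
KEYWORD = re.compile(rb"\$Id(?::[^$]*)?\$")


class RewritingResource:
    """Hypothetical resource that expands $Id$ on every successful PUT."""

    def __init__(self):
        self.body = b""
        self.revision = 0

    def etag(self):
        # Entity tag of the entity as stored, not of the request body.
        return '"%s"' % hashlib.sha256(self.body).hexdigest()

    def put(self, body, if_match=None):
        if if_match is not None and if_match != self.etag():
            return 412, None  # precondition failed
        self.revision += 1
        expanded = b"$Id: r%d $" % self.revision
        self.body = KEYWORD.sub(lambda m: expanded, body)
        return 204, self.etag()

    def get(self):
        return self.body, self.etag()


def put_put():
    """Client keeps authoring its local copy without refreshing."""
    res = RewritingResource()
    _, tag = res.put(b"v1 $Id$\n")
    status, _ = res.put(b"v2 $Id$\n", if_match=tag)
    return status, res.body


def put_get_put():
    """Client refreshes its local copy before the second PUT."""
    res = RewritingResource()
    res.put(b"v1 $Id$\n")
    body, tag = res.get()
    status, _ = res.put(body.replace(b"v1", b"v2"), if_match=tag)
    return status, res.body
```

In this sketch both sequences end with the same stored entity; the refresh only matters to a client that wants to display the substituted keyword.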

>>   This is probably true for a lot of editing situations.  In the case 
>> of normalization of, say, iCalendar data, I think you are right.
>>
>>   However, in the Subversion example, I think the editor does care, 
>> because the point of the $Id$ tag is that the user can see what 
>> version of the document they are looking at when they view the file.  
>> In this case, knowing that the data was modified will let the editor 
>> know that a subsequent GET request would be a good idea.  A Subversion 
>> client wouldn't have to do this, in that it can probably assume that 
>> it knows what the server changed, and make the same edit itself.  But 
>> a generic text editor would not.
> 
> Actually, the Subversion client wouldn't know the contents anyway,
> since only the server knows the variable replacement contents even
> for something as simple as ID.  That is why etag should not be sent
> in that case, and the result works with any client. Even if the

Not returning an ETag surely will always work, because the source of 
confusion is eliminated. But instead, you now have to resolve the lost 
update problem, for which the ETag previously would have been the solution.
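
For readers unfamiliar with the lost update problem, here is a minimal sketch of how If-Match guards against it. The `Resource` class and the `simulate` helper are hypothetical; the only assumption is a server that rejects a PUT with 412 when the supplied entity tag no longer matches.

```python
import hashlib


class Resource:
    """Hypothetical resource supporting conditional PUT via If-Match."""

    def __init__(self, body):
        self.body = body

    def etag(self):
        return '"%s"' % hashlib.sha256(self.body).hexdigest()

    def put(self, body, if_match=None):
        # 412 (Precondition Failed) if the client's tag is stale.
        if if_match is not None and if_match != self.etag():
            return 412
        self.body = body
        return 204


def simulate(use_etags):
    """Two clients start from the same state and both try to save."""
    res = Resource(b"original")
    tag_a = tag_b = res.etag() if use_etags else None
    status_a = res.put(b"edit by A", if_match=tag_a)
    status_b = res.put(b"edit by B", if_match=tag_b)
    return status_a, status_b, res.body
```

Without entity tags the second PUT silently overwrites the first; with them, the second client gets a 412 and knows it has to merge. A client that never receives an ETag from PUT has to fetch one with an extra request before it can get this protection.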

> Subversion client thought it knew what the transformation would be,
> and what the Id value should be, it still wouldn't know if any other
> transformations might have occurred and thus would have to do a GET
> (or receive some other form of notification that would tell it how to
> transform its current representation to the current form).  A far more
> useful extension would be to define a new 2xx response to PUT that
> includes both the new metadata and a patch that would allow the
> client to implement a matching transformation on its own copy.
> 
> OTOH, it would also be more sensible for the Subversion client to strip
> the contents within $Id$ itself, before the PUT, and thus both client
> and server would have the same content and a strong etag could be
> returned.  In other words, the client uses its knowledge of the resource
> definition to remove the unnecessary bits at the application layer
> above the HTTP processing, thus making the HTTP interaction more
> efficient.

Well, all of this isn't necessary if the server does return the strong 
ETag, and the client just keeps authoring that resource.

> I have no idea what the current Subversion client does in that situation,
> so this is all hypothetical.
> 
>>   I do agree that many client authors are all bent out of shape about 
>> the data changing when they needn't be, and it's probably best if they 
>> don't refresh the data unnecessarily, but I do think there are valid 
>> use cases for having that information.
>>
>>> My proposal does solve every single case that has been described so far,
>>> including repeatable server-modifications such as normalization, and
>>> works with existing clients.
>>
>>   Is the above use case covered?  I'm not seeing how.
> 
> Yes.  A strong ETag in a PUT result means that the entity has not been
> transformed in an octet-significant way.  No strong ETag in a PUT result
> means that it may have been transformed and thus a refresh is necessary
> to obtain both the current form and the current ETag.

I don't think this is what RFC2616 says, or what (some) servers do 
today. And even if we could simply ignore that, this approach would make 
authoring of resources on content-rewriting servers hard, because you 
wouldn't be able to use ETags anymore (that is, you would always have to 
refresh the local copy, although it may be completely unneeded).
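
Read from the client side, the rule Roy states above might look like the following sketch. The function name `needs_refresh` and the plain-dict representation of response headers are assumptions made for illustration; the `W/` prefix is the weak-validator syntax from RFC2616.

```python
def needs_refresh(put_response_headers):
    """Return True if, under the quoted rule, the client must GET again.

    A missing ETag or a weak one (W/"...") is taken to mean that the
    entity may have been transformed.
    """
    etag = put_response_headers.get("ETag")
    return etag is None or etag.startswith("W/")
```

Under this rule a client talking to a content-rewriting server would see `needs_refresh` return True after every PUT, which is the authoring cost described in the paragraph above.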

>>   The ETag this has obviously caused a lot of heartburn.  I'd be 
>> interested to know whether you think the rest of the draft is good, 
>> absent the new header.  It's clear that some clarification is needed, 
>> and I think this is a good start.  No?
> 
> 2616 already says it, just not in a way that is readable by anyone
> (including the original authors).  The draft wastes a great deal of
> text justifying various ways that people have misread 2616 and then

OK, it would be really helpful if you could point out what actually was 
misread.

> proposes an extension header that doesn't solve any real problem
> and adds the very significant problem that no current implementation
> is prepared to deal with a new response header that isn't metadata.

Many protocols have defined new response headers that aren't metadata, 
and I'm not sure what problem that causes in practice. This being said, 
the value of Entity-Transform could actually be seen as metadata, and 
also uses a format which makes it impossible that stale copies of the 
header cause harm in the client.

> I'd rather just fix the problematic text with a clarification.
> It explains the intended function of etag without really changing
> anything in the protocol -- it just documents an undocumented assumption.
> 
> The single sentence I supplied is sufficient to solve the problem
> of clients not knowing whether or not they need to do a GET after
> a PUT, and answering it solves the question of interoperability.

As far as I can tell, this has the price of making it incompatible with 
RFC2616, and also defies efficient ETag-based authoring on servers that 
do rewrite the content.

> If we want to go further and actually grapple with the feature that
> clients would really like to have -- namely, the content of the
> transformed bits without an extra round trip -- then half-measures
> like Entity-Transform are not good enough.  That feature requires the
> transformation to be sent in the response, which would mean a new
> response status code and associated semantics.
> 
> ....Roy

Which would be interesting, in particular in combination with PATCH.

Going back to a previous mail:

>> On Nov 28, 2006, at 4:27 PM, Julian Reschke wrote:
>>
>>> The Xythos client always assumes that content isn't rewritten. And if no ETag is returned, it uses the Last-Modified date as cache key. So it's already broken with respect to servers that have to rewrite.
>>>
>>> Roy never made a proposal, and didn't answer when asked for clarification/confirmation.
> 
> I don't like repeating myself over and over again just because you
> want to add an unnecessary feature to HTTP.

Roy, nobody asked you to *repeat* yourself. I was asking because I 
wasn't sure I understood what you said, and it turns out I in fact did not.

> My opinion hasn't changed -- the extension is completely unnecessary,
> RFC 2616 already defines what sending an etag on PUT means,
> and every single relevant case can be handled by a simple clarification.
> 
> My proposal was:
> 
>     require that ETag in a response to PUT means that the client
>     can use that entity tag in future conditional requests (that
>     includes IMS, If-Match, and Range-based conditional requests).

And the ambiguity here was that it's not clear what "can use" actually 
means.

In my server I can return a strong ETag although the content was 
rewritten, and that ETag can be used both for PUT/If-Match (PUT will 
succeed), GET/If-None-Match and in GET/If-Match+Range (server will 
return the full content because it knows the ETag was sent back upon PUT).
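
One possible reading of this behaviour, as a rough sketch only: the class `AuthoringResource`, the whitespace-normalising "rewrite" and the flag `rewritten_on_put` are invented for illustration and do not describe the actual server. The point is that the server remembers that the current entity tag was handed out for rewritten content and answers range requests with the full entity in that case.

```python
import hashlib


class AuthoringResource:
    """Hypothetical server-side resource that may rewrite PUT bodies."""

    def __init__(self):
        self.body = b""
        self.rewritten_on_put = False

    def etag(self):
        return '"%s"' % hashlib.sha256(self.body).hexdigest()

    def put(self, body, if_match=None):
        if if_match is not None and if_match != self.etag():
            return 412, None
        stored = body.strip() + b"\n"  # stand-in for any server-side rewrite
        self.rewritten_on_put = stored != body
        self.body = stored
        return 204, self.etag()

    def get(self, if_none_match=None, if_match=None, byte_range=None):
        if if_none_match == self.etag():
            return 304, b""
        if if_match is not None and if_match != self.etag():
            return 412, b""
        if byte_range is not None and not self.rewritten_on_put:
            start, end = byte_range  # inclusive offsets, as in Range
            return 206, self.body[start:end + 1]
        # Client offsets may refer to the pre-rewrite bytes: send everything.
        return 200, self.body
```

With this design the entity tag stays usable for all three kinds of conditional request, at the price of never serving partial content for an entity the client has not seen in its stored form.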

> There is absolutely no reason for the server to send an ETag on a PUT
> response if the server does not intend to support that functionality.

So if it does support that functionality, can it return the strong ETag 
even though content was rewritten?

> I am quite sure that none of the server vendors will care about the
> addition of that clarification provided that the requirement is
> properly stated as an interface constraint and not an implementation
> constraint.
> 
> OTOH, the mechanism for how the server "stores" the representation, whether
> or not it is stored byte-for-byte, and how a server might otherwise manage
> to accomplish the feat of handling a conditional request without preserving
> byte-level equality is none of our business and should never appear in an
> IETF specification regarding HTTP.

Yes. Did anybody suggest that?

> My proposal does solve every single case that has been described so far,
> including repeatable server-modifications such as normalization, and
> works with existing clients.

At this point (sorry!) I am still not sure what your precise proposal 
is. Does it require the ETag to be usable as expected, or does it 
require that no ETag be returned when the server did rewrite the contents? 
This mail and the mail above seem to be contradictory with respect to this.

> In regard to Apache mod_dav's use of weak entity tags, that is a separate
> issue and can be fixed without changing HTTP at all.  It is merely an
> implementation quirk having to do with the way mod_dav reuses the
> file space as a storage mechanism (and thus a separate handler for GET).
> I can very easily turn it off, as can anyone with access to the config
> files.  The right solution, though, is to use a property-based back-end
> and store the MD5 on write, which would allow the weak designation to be
> removed and actually solve the real problem of clients wanting a strong
> etag returned from PUT.  Patches are welcome.
> 
> ....Roy

Last time I checked, the ETag handling was completely independent of 
Apache/moddav.

Best regards, Julian
Received on Saturday, 2 December 2006 21:00:49 UTC