Re: Etag-on-write, draft -04

On Nov 30, 2006, at 9:41 PM, Wilfredo Sánchez Vega wrote:
>   I also think that there are clients in the world that would like  
> to know that if they did an immediate subsequent GET of the same  
> variant, that the data would look different.  Several client  
> authors have expressed interest in this information, which is what  
> the Entity-Transform is meant to address.

Which, again, is why it is completely unnecessary.  If the entity being
edited had an etag, and the result from performing a PUT did not include
an etag (or included only a weak etag), then the client can conclude that
the data would look different.  Whether or not it does happen to be
different when a subsequent GET is made is a distributed systems problem
that won't disappear, regardless of Entity-Transform.  The client should
be assuming a transform has occurred if it does not receive a strong etag,
since it has no alternative available to it other than to perform a
subsequent GET.  The Entity-Transform provides no useful information
and may, in fact, be completely wrong if there is an intermediary in
the middle doing transformations of its own -- the spec already
anticipated that for etag, but not for future extension headers.

>   Example A.1 describes a Subversion server.  Let's say that the  
> client isn't the Subversion client, but a text editor, and we have  
> auto-versioning enabled on the server.
>
>   My editor saves a document with "$Id$" tags in it.  The server  
> says OK and gives me an ETag.
>
>   You are arguing, I think, that my editor has no reason to care  
> that the $Id$ tag portion was modified, since subsequent edits will  
> result in the same entity on the server, regardless of whether it  
> refreshes the modified $Id$ tag in the document, so a clarified  
> ETag should be sufficient.

Or simply not returning the etag at all, which tells the client it needs
to obtain the new representation that contains the current Id info.

>   This is probably true for a lot of editing situations.  In the  
> case of normalization of, say, iCalendar data, I think you are right.
>
>   However, in the Subversion example, I think the editor does care,  
> because the point of the $Id$ tag is that the user can see what  
> version of the document they are looking at when they view the  
> file.  In this case, knowing that the data was modified will let  
> the editor know that a subsequent GET request would be a good  
> idea.  A Subversion client wouldn't have to do this, in that it can  
> probably assume that it knows what the server changed, and make the  
> same edit itself.  But a generic text editor would not.

Actually, the Subversion client wouldn't know the contents anyway,
since only the server knows the variable replacement contents even
for something as simple as Id.  That is why etag should not be sent
in that case, and the result works with any client.  Even if the
Subversion client thought it knew what the transformation would be,
and what the Id value should be, it still wouldn't know if any other
transformations might have occurred and thus would have to do a GET
(or receive some other form of notification that would tell it how to
transform its current representation to the current form).  A far more
useful extension would be to define a new 2xx response to PUT that
includes both the new metadata and a patch that would allow the
client to implement a matching transformation on its own copy.

OTOH, it would also be more sensible for the Subversion client to strip
the contents within $Id$ itself, before the PUT, and thus both client
and server would have the same content and a strong etag could be
returned.  In other words, the client uses its knowledge of the resource
definition to remove the unnecessary bits at the application layer
above the HTTP processing, thus making the HTTP interaction more
efficient.
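
To make the stripping idea concrete, here is a minimal sketch in Python.
It assumes the expanded keyword has the form "$Id: ... $" on a single
line; the function name and the regular expression are illustrative only
and are not taken from any Subversion client.  Whether the server would
then store the body unchanged, and so be able to return a strong etag, is
likewise an assumption.

```python
import re

# Assumed expanded form: "$Id: <anything without '$' or newline> $".
_EXPANDED_ID = re.compile(r"\$Id:[^$\n]*\$")


def collapse_id_keyword(text: str) -> str:
    """Replace every expanded $Id: ... $ keyword with the bare $Id$ form.

    Text that already contains the bare form is returned unchanged.
    """
    return _EXPANDED_ID.sub("$Id$", text)


if __name__ == "__main__":
    body = "/* $Id: calc.c 148 2006-07-28 21:30:43Z sally $ */\nint main(void);\n"
    # The client would send this collapsed body in the PUT request.
    print(collapse_id_keyword(body))
```

The client does this above the HTTP layer, so nothing in the PUT exchange
itself has to change.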

I have no idea what the current Subversion client does in that situation,
so this is all hypothetical.

>   I do agree that many client authors are all bent out of shape  
> about the data changing when they needn't be, and it's probably  
> best if they don't refresh the data unnecessarily, but I do think  
> there are valid use cases for having that information.
>
>> My proposal does solve every single case that has been described so far,
>> including repeatable server-modifications such as normalization, and
>> works with existing clients.
>
>   Is the above use case covered?  I'm not seeing how.

Yes.  A strong ETag in a PUT result means that the entity has not been
transformed in an octet-significant way.  No strong ETag in a PUT result
means that it may have been transformed and thus a refresh is necessary
to obtain both the current form and the current ETag.
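
As a rough illustration of that rule from the client side, the check
could look like the following Python sketch.  The function name and the
plain mapping used for the response headers are assumptions made for the
example, not part of any particular HTTP library.

```python
from typing import Mapping, Optional


def needs_refresh_after_put(response_headers: Mapping[str, str]) -> bool:
    """Return True when the client should GET the resource after a PUT.

    A strong ETag in the PUT response means the stored entity matches the
    octets that were sent.  A missing or weak ETag means it may have been
    transformed, so a refresh is needed.
    """
    etag: Optional[str] = None
    for name, value in response_headers.items():
        # Header field names are case-insensitive.
        if name.lower() == "etag":
            etag = value.strip()
            break

    if not etag:
        return True   # no ETag at all: assume a transform occurred
    if etag.startswith("W/"):
        return True   # weak ETag: not octet-for-octet equivalent
    return False      # strong ETag: the local copy is current


if __name__ == "__main__":
    print(needs_refresh_after_put({"ETag": '"abc123"'}))    # strong
    print(needs_refresh_after_put({"ETag": 'W/"abc123"'}))  # weak
    print(needs_refresh_after_put({}))                      # absent
```

An existing client that ignores the ETag in a PUT response simply falls
into the "refresh" case, which is the safe default.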

>   The ETag issue has obviously caused a lot of heartburn.  I'd be  
> interested to know whether you think the rest of the draft is good,  
> absent the new header.  It's clear that some clarification is  
> needed, and I think this is a good start.  No?

RFC 2616 already says it, just not in a way that is readable by anyone
(including the original authors).  The draft wastes a great deal of
text justifying various ways that people have misread 2616 and then
proposes an extension header that doesn't solve any real problem
and adds the very significant problem that no current implementation
is prepared to deal with a new response header that isn't metadata.
I'd rather just fix the problematic text with a clarification that
explains the intended function of etag without really changing
anything in the protocol -- it just documents an undocumented assumption.

The single sentence I supplied is sufficient to solve the problem
of clients not knowing whether or not they need to do a GET after
a PUT, and settling that also settles the interoperability question.
If we want to go further and actually grapple with the feature that
clients would really like to have -- namely, the content of the
transformed bits without an extra round trip -- then half-measures
like Entity-Transform are not good enough.  That feature requires the
transformation to be sent in the response, which would mean a new
response status code and associated semantics.

....Roy

Received on Saturday, 2 December 2006 06:01:12 UTC