RE: Summary of ETag related issues in RFC2518bis from Dan Brotsky on 2005-12-21 (w3c-dist-auth@w3.org from October to December 2005)

From: Dan Brotsky <dbrotsky@adobe.com>
Date: Tue, 20 Dec 2005 22:33:37 -0800
To: "Lisa Dusseault" <lisa@osafoundation.org>
Cc: <w3c-dist-auth@w3.org>, "Geoffrey M Clemm" <geoffrey.clemm@us.ibm.com>
Message-ID: <E1F796B37FB8544FA09F6258E7CED3BB4B9358@namail3.corp.adobe.com>
As to your last question: yes, it's OK, and no, the server needs to break
the lock if it does this (because it's indistinguishable from another
client's edit).  Not all clients will work efficiently against servers
that unexpectedly munge data after PUTs are complete, but that's life.

    dan
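
A minimal sketch of the exchange discussed in this thread, assuming a toy
in-memory resource rather than a real WebDAV server. The names Server,
rewrites_after_put and save are hypothetical, as are the "vN" etag values
and the keyword text appended by the server. It shows a client chaining
PUTs with If-Match, and being forced back to a GET when the server changes
the resource (and its etag) after the PUT has returned.

```python
import itertools


class Server:
    """Toy stand-in for one resource on a server (illustrative only)."""

    def __init__(self, rewrites_after_put=False):
        self._counter = itertools.count(1)
        self.body = ""
        self.etag = self._new_etag()
        # When True, the server munges the body after answering the PUT
        # and assigns a fresh etag, as if another client had edited it.
        self.rewrites_after_put = rewrites_after_put

    def _new_etag(self):
        return f'"v{next(self._counter)}"'

    def put(self, body, if_match=None):
        # If-Match: refuse the write when the client's etag is stale.
        if if_match is not None and if_match != self.etag:
            return 412, None
        self.body = body
        self.etag = self._new_etag()
        returned = self.etag  # the etag the client sees in the response
        if self.rewrites_after_put:
            self.body = body + "\n$Id: expanded $"
            self.etag = self._new_etag()  # returned etag is now stale
        return 204, returned

    def get(self):
        return 200, self.body, self.etag


def save(server, body, etag):
    """Conditional PUT; on 412, resynchronize with a GET.

    Returns (saved, body, etag), where body and etag describe what the
    client should now treat as current.
    """
    status, new_etag = server.put(body, if_match=etag)
    if status == 412:
        _, current_body, current_etag = server.get()
        return False, current_body, current_etag
    return True, body, new_etag
```

Against a server that leaves the etag alone, save can be called repeatedly
with the etag from the previous call and no GET in between. Against one
constructed with rewrites_after_put=True, the second save fails with 412
and returns the server's current body and etag for the client to merge.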

> -----Original Message-----
> From: Lisa Dusseault [mailto:lisa@osafoundation.org] 
> Sent: Tuesday, December 20, 2005 10:01
> To: Dan Brotsky
> Cc: w3c-dist-auth@w3.org; Geoffrey M Clemm
> Subject: Re: Summary of ETag related issues in RFC2518bis
> 
> So when is it OK for a client doing multiple edits to do 
> several PUTs in a row without intermediate GET requests?
> 
> I'll take the example of a source code file which is being
> edited, and the source control server expands keywords in
> comments in the file.  The client software could issue a PUT
> and get back an ETag, then when later changes are made, issue
> another PUT with If-Match with the ETag just received.  Each
> time, the server could expand the keywords, without harm done
> from the client "losing" the server's changes.
> 
> In this case, multiple PUT without intermediate GET is OK to 
> do -- the server is prepared to make the same changes on each 
> PUT and doesn't really need the client to re-synchronize 
> their changes.  There are other cases like some CalDAV cases 
> where the server adds an internal event identifier or 
> alternate address to the event.  I'd also bet that there are 
> clients that already do multiple PUT requests without 
> intermediate GETs especially if the client holds a lock.  But 
> if in some cases the server needs the client to do a GET 
> between two subsequent PUTs because the changes are important 
> to preserve, how can the server accomplish that?
> 
> I believe there is a way without any additional mechanisms:
> 
>   - If the server is making changes that can be overwritten 
> without harm, or if the server is making no changes, it can 
> return an ETag in response to PUT and the client doesn't have 
> to do a GET unless it later sees a different ETag.
> 
>   - If the server is making changes that must be preserved, 
> then the server can respond to the initial PUT with a 
> throwaway ETag, then immediately update the ETag of the 
> resource to a new and more permanent value.  Now the client 
> will be forced to recognize that there are new changes to be 
> synched -- just as if another client had made the change in 
> that period of time.  Most clients would already be compliant 
> with this.
> 
> If we decided to make this kind of recommendation, we'd also 
> have to specify whether it's OK to do this while the client 
> is holding a LOCK.
> 
> Lisa
> 
> On Dec 19, 2005, at 9:09 PM, Dan Brotsky wrote:
> 
> >
> > Geoff,
> >
> > I don't follow your reasoning here when you say "the client will
> > incorrectly conclude that the text it sent with the PUT is what would
> > be retrieved by the GET."  It seems like there are three cases:
> >
> > 1. The server modifies the value "on the way up", that is, before
> > returning from the PUT.  (This is typically how a version control
> > system would expand keywords, as part of the checkin.)  In this case
> > the value that would eventually be retrieved by GET is known and thus
> > its etag can be returned, even if that etag is a timestamp.
> >
> > 2. The server returns before modifying the value, but knows that it
> > will do so.  In this case a synthetic value for the etag can be
> > generated and returned, as long as the server takes steps to make sure
> > that etag is returned with the eventual GET and all GETs requested
> > before the modifications are complete are blocked (e.g., with "server
> > busy").  This etag can still be a timestamp, by the way, and can even
> > be a timestamp of the checkin, as long as the server associates that
> > time with the eventual result (which version control systems also
> > typically do).
> >
> > 3. The server returns before modifying the value, and doesn't know
> > that a modification will take place.  (For example, the "type" of the
> > file is later changed so that the file undergoes keyword expansion
> > later.)  In this case, at the time the file is modified by the server,
> > it should assign a new etag, because indeed the etag returned at the
> > time of the PUT should not match what a client would eventually GET.
> > But before that later modification is done, the etag is correct.
> >
> > In no case does a client ever assume that "the text it sent with the
> > PUT is what would be retrieved by the GET."  That's not what the etag
> > is for.  The etag is to reassure the client that the value on the
> > server *has not changed since the PUT completed*.  No guarantees are
> > issued that the value doesn't change as part of the PUT; that would be
> > a part of the PUT semantics for that server and is outside the scope
> > of WebDAV.
> >
> >     dan
> >
> >
> >
> > ________________________________
> >
> > 	From: w3c-dist-auth-request@w3.org
> > [mailto:w3c-dist-auth-request@w3.org] On Behalf Of Geoffrey M Clemm
> > 	Sent: Monday, December 19, 2005 19:47
> > 	To: w3c-dist-auth@w3.org
> > 	Subject: Re: Summary of ETag related issues in RFC2518bis
> > 	
> > 	
> >
> > 	Jim:
> > 	
> > 	What about the point made by an earlier poster, namely that
> > 	a server is allowed to modify the content stored by a PUT,
> > 	so that a GET following the PUT might return different content
> > 	than was PUT (the earlier poster gave the example of a server
> > 	that expands RCS keywords on PUT).
> > 	
> > 	In this case (i.e. the server modifies the content stored by
> > 	the PUT), if the server returns the etag that would be returned
> > 	on a GET, and the client requests a GET with an If-None-Match
> > 	header with the etag returned by the PUT, the client will
> > 	incorrectly conclude that the text it sent with the PUT is
> > 	what would be retrieved by the GET.
> > 	
> > 	So unless we are going to disallow servers from modifying the
> > 	content stored from a PUT (note that our server does not do this,
> > 	so I am speaking as a neutral party here :-), we pretty much
> > 	have to have PUT return the entity tag of the content that was
> > 	PUT, not what would be returned by the GET.
> > 	
> > 	Then a client that wants to continue modifying a resource to
> > 	which it has just done a PUT, would need to do a GET with
> > 	an If-None-Match call following the PUT, to handle servers
> > 	that do this kind of rewriting on PUT.
> > 	
> > 	Note that this is just a single GET, not to be confused with
> > 	the "polling" scenario described in "promotion from weak to
> > 	strong etag" thread.
> > 	
> > 	Cheers,
> > 	Geoff
> > 	
> > 	
> > 	Jim wrote on 12/19/2005 09:11:02 PM:
> > 	>
> > 	> Julian,
> > 	>
> > 	> Thanks for making this more clear -- you're right, there is a
> > 	> significant issue here.
> > 	>
> > 	> > The question here is whether an ETag returned upon PUT is for the
> > 	> > entity the client sent (1), or for the entity the server would send
> > 	> > upon a subsequent GET (2).
> > 	> >
> > 	> > There are cases where both will not be the same, so this needs to
> > 	> > be clarified. In case of (2), a client will need a subsequent GET
> > 	> > if it's planning to use the ETag for subsequent GET/Range requests.
> > 	> >
> > 	>
> > 	> I think option #2 is the best one here (the Etag returned by PUT is
> > 	> the one a subsequent GET would retrieve).
> > 	
> > 	
> >
> >
> 
>
Received on Wednesday, 21 December 2005 06:33:53 UTC