RE: GET, POST, and side-effects from Paul Leach on 1996-01-04 (http-caching-historical@w3.org from January 1996)

From: Paul Leach <paulle@microsoft.com>
Date: Thu, 4 Jan 96 13:08:33 PST
To: http-caching@pa.dec.com, mogul@pa.dec.com
Message-Id: red-16-msg960104210304MTP[01.52.00]000000b0-5229
----------
] From: Jeffrey Mogul  <mogul@pa.dec.com>
] To:  <http-caching@pa.dec.com>
] Subject: GET, POST, and side-effects
] Date: Wednesday, January 03, 1996 7:37PM
]
] As Shel points out,
]     There's an antique version of form submission that uses GET with
]     stuff crammed into the URL, that *can* cause server side effects.
]     It's just a convention that GET doesn't cause side effects.
]
] In Roy's draft, section 14.2 says that GET and HEAD should not
] have side effects.  It also says that the protocol cannot enforce
] this, but (implicitly) that implementations of the protocol may
] assume that GET and HEAD are in fact side-effect-free.  I believe
] that this is a valuable concept, and I don't think we should give
] up our ability to make this assumption.
]
] As for other methods, I suggest that the notion of "side effects" is
] the wrong one.  What we may want to specify is whether certain methods
] are "idempotent".  Distributed systems types use the word "idempotent"
] to mean "repeating the operation N times has the same side effects as
] repeating it 1 time, for N > 1".

This definition is too loose. An operation is idempotent if the result 
of repeating it N times is the same as doing it only the last time. 
I.e., denote the i'th application of operation O as O(i).  Then
	O(1)
	O(2)
should produce the same result as just
	O(2).

This allows side effects of a limited sort -- for example, the 
date-time-modified value associated with a file can be changed by 
writing it, and still have writes be idempotent.
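
A minimal sketch of this definition in Python may help. Everything here 
is an assumption made for illustration: the in-memory File class, the 
integer "now" standing in for the date-time-modified value, and the 
apply_at helper. The point is only that a write at times 1 and 2 leaves 
the same state as a write at time 2 alone, while an append does not.

```python
from dataclasses import dataclass


@dataclass
class File:
    """Hypothetical in-memory file; `modified` stands in for date-time-modified."""
    contents: str = ""
    modified: int = 0

    def write(self, data, now):
        # Overwrite: the resulting state depends only on this application.
        self.contents = data
        self.modified = now

    def append(self, data, now):
        # Append: the resulting state depends on every earlier application.
        self.contents += data
        self.modified = now


def apply_at(op_name, data, times):
    """Apply the named operation to a fresh File once per time in `times`."""
    f = File()
    for t in times:
        getattr(f, op_name)(data, t)
    return f


# O(1); O(2) compared with O(2) alone.
write_both = apply_at("write", "entity", [1, 2])
write_last = apply_at("write", "entity", [2])
append_both = apply_at("append", "entity", [1, 2])
append_last = apply_at("append", "entity", [2])

print(write_both == write_last)    # write behaves idempotently
print(append_both == append_last)  # append does not
```

Note that write_both differs from a single write at time 1 only in the 
modified value, which is the limited kind of side effect the definition 
permits.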
]
] For example, POST is probably not idempotent, but PUT might be
] idempotent if properly serialized (which we may not be able to
] do, of course).

Idempotence is orthogonal to serialization. If you are going to take 
advantage of idempotence in a concurrent environment, you have to deal 
with serialization as well, but the idempotence of a function is 
independent of whether it is used in a concurrent environment.

]
] Other methods with side effects include PATCH, COPY, MOVE, DELETE,
] LINK, and UNLINK.  Some of these are clearly not idempotent; others
] might be.  I won't consider these for now, however.
]
] [Begin: thinking-out-loud portion of message]
]
] How might one make PUT idempotent?  Well, suppose that the client
] starts by doing a GET on an existing resource, and the server
] returns a cache-validator value for the resource.  Then the client
] issues one or more PUTs on the resource, handing back the
] cache-validator value it received from the server.  If the server
] only performs the PUT when the client's validator matches the
] one that it would provide if a GET were done on the resource,
] *and* (very important) the validator is constructed in a way
] that is guaranteed to change when the resource is modified, then
] we have an idempotent method.  No matter how many intervening
] PUTs have been done, this should not result in the same PUT
] being done twice.

I'm confused.  A PUT which overwrites the resource identified in the 
request URI with the entity provided in the PUT request *is* 
idempotent, without any of the extra "validators".

]
] In other words, we have a "conditional PUT".  But unlike a
] conditional GET, which performs the full operation only
] if the validators do not match, the conditional PUT performs
] the full operation only if they *do* match.
]
] If the resource did not exist before the PUT, the client could
] supply a special "null" validator which is guaranteed not to
] match anything.  The server would allow this kind of conditional
] PUT only if the resource doesn't already exist.
]
] Why is this important?  Suppose that the client performs a PUT via
] a proxy, and the server updates the resource and returns a 200 OK
] to the proxy, and closes the connection.  But the TCP connection
] between the proxy and the client fails before the client receives
] the 200 OK.  If the client simply retries the PUT, it may overwrite
] an intervening PUT done by another client.  But if this "conditional
] PUT" approach is taken, then the retry cannot cause this erroneous
] result.

Why shouldn't it overwrite an intervening PUT?  You're assuming a 
concurrency control semantic that isn't required in all cases. *If* 
there is some concurrency control to prevent write-write conflicts, 
then the natural idempotence of PUT will allow retries. If there is no 
such concurrency control, because the semantics of the store don't 
require it (e.g., analogous to the UNIX file system), then overwriting 
an intervening PUT is OK.

Please note: I'm not arguing against a concurrency control scheme to 
allow reliable GET/PUT sequences -- just saying that it's a concurrency 
control scheme, not a way of making PUT idempotent.
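
The lost-reply scenario can be sketched in Python to show the 
difference. This is a rough illustration under stated assumptions, not 
anything from the draft: the Resource class, its integer validator, and 
the put_if_match name are all hypothetical. The plain put is already an 
overwrite; the conditional form only adds a check against write-write 
conflicts.

```python
class Resource:
    """Hypothetical single resource with a validator that changes on every write."""

    def __init__(self, entity):
        self.entity = entity
        self.validator = 0

    def get(self):
        return self.entity, self.validator

    def put(self, entity):
        # Unconditional overwrite of the stored entity.
        self.entity = entity
        self.validator += 1

    def put_if_match(self, entity, validator):
        # Conditional PUT: apply only if the caller's validator is current.
        if validator != self.validator:
            return False
        self.put(entity)
        return True


def write(resource, entity, validator, conditional):
    """Issue either a conditional or a plain PUT; report whether it was applied."""
    if conditional:
        return resource.put_if_match(entity, validator)
    resource.put(entity)
    return True


def lost_reply_scenario(conditional):
    r = Resource("original")
    _, seen_by_a = r.get()
    # Client A's PUT is applied, but the reply never reaches A.
    write(r, "from A", seen_by_a, conditional)
    # Client B reads the new state and writes its own entity.
    _, seen_by_b = r.get()
    write(r, "from B", seen_by_b, conditional)
    # Client A, having seen no reply, retries with its old validator.
    write(r, "from A", seen_by_a, conditional)
    return r.entity


final_plain = lost_reply_scenario(conditional=False)
final_conditional = lost_reply_scenario(conditional=True)
print(final_plain, "|", final_conditional)
```

With plain PUTs the retry overwrites B's entity, which is fine for a 
store with file-system-like semantics. With the conditional form the 
retry is refused and B's entity survives.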

]
] It could return OK or it could return an error status, and it might be
] hard for the client to figure out whether the error status came because
] the first PUT had succeeded or because some other client had updated
] the resource first.  But I suppose the client could simply do another
] GET to see if the right PUT had been done.
]
] But wait: couldn't one do conditional POSTs the same way, using
] the cache-validator of the original (un-posted-to) resource?  This
] would protect against races from other clients, for example.
]
] What does this have to do with caching the results from POSTs and
] PUTs?  I'm not sure.  Perhaps nothing.  But it might be worth
] trying to think through a way for the server to tell a cache that
] the entity-body supplied with a PUT request, possibly taken with
] some headers returned by the server, can be treated as a cached
] copy of the PUTted resource (because PUT replaces the resource,
] rather than doing any partial modification).  This would avoid a
] subsequent reload from the server on a subsequent GET of that
] resource.

Yes. But I don't see what avoiding races has to do with cachability. 
What you need back from a PUT or POST to enable you to cache is some 
indication that a subsequent GET of the URI will fetch the same entity 
value that was PUT or that was returned by the POST *until modified by 
someone else*.  This is no more nor less than what is implicit in GETs.  
There's no guarantee that a resource fetched with GET won't be subject 
to a race -- but the server says, via Expires: and Cache-Control:, 
whether and how long a cache can assume that it won't be -- the exact 
same mechanisms could work for PUT and POST.
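
As a sketch of what that could look like, assume a hypothetical cache 
that treats the expiry time in the server's reply the same way for GET 
and for PUT. The Cache class, its method names, and the integer clock 
are all made up for illustration; this shows the idea in this message, 
not the behaviour of any existing cache.

```python
class Cache:
    """Hypothetical cache keyed by URI, using a server-supplied expiry time."""

    def __init__(self):
        self._entries = {}  # uri -> (entity, expires_at)

    def _store(self, uri, entity, expires_at):
        if expires_at is None:
            # No freshness information from the server: keep nothing.
            self._entries.pop(uri, None)
        else:
            self._entries[uri] = (entity, expires_at)

    def on_get_reply(self, uri, reply_entity, expires_at):
        # Usual case: cache the entity the server returned.
        self._store(uri, reply_entity, expires_at)

    def on_put_reply(self, uri, request_entity, expires_at):
        # Same mechanism: cache the entity that was PUT, for as long
        # as the server's reply says it may be assumed current.
        self._store(uri, request_entity, expires_at)

    def lookup(self, uri, now):
        entry = self._entries.get(uri)
        if entry is None:
            return None
        entity, expires_at = entry
        return entity if now < expires_at else None


cache = Cache()
cache.on_put_reply("/doc", "new body", expires_at=100)
fresh = cache.lookup("/doc", now=50)    # served from the cache
stale = cache.lookup("/doc", now=100)   # expired: must ask the server
cache.on_put_reply("/other", "body", expires_at=None)
uncached = cache.lookup("/other", now=0)
print(fresh, stale, uncached)
```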

Paul
Received on Thursday, 4 January 1996 21:26:02 UTC