GET, POST, and side-effects

As Shel points out:
    There's an antique version of form submission that uses GET with
    stuff crammed into the URL, that *can* cause server side effects.
    It's just a convention that GET doesn't cause side effects.

In Roy's draft, section 14.2 says that GET and HEAD should not
have side effects.  It also says that the protocol cannot enforce
this, but (implicitly) that implementations of the protocol may
assume that GET and HEAD are in fact side-effect-free.  I believe
that this is a valuable concept, and I don't think we should give
up our ability to make this assumption.

As for other methods, I suggest that the notion of "side effects" is
the wrong one.  What we may want to specify is whether certain methods
are "idempotent".  Distributed systems types use the word "idempotent"
to mean "repeating the operation N times has the same side effects as
repeating it 1 time, for N > 1".

For example, POST is probably not idempotent, but PUT might be
idempotent if properly serialized (which we may not be able to
do, of course). 
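
To make the distinction concrete, here is a toy sketch in Python
(the store and the function names are mine, not anything in the
spec): modeling PUT as wholesale replacement and POST as an append
shows why one repeats harmlessly and the other does not.

    store = {}

    def put(uri, body):
        # PUT replaces the resource outright, so doing it N times
        # leaves the store exactly as doing it once would.
        store[uri] = body

    def post(uri, body):
        # Model POST as appending to a collection: each repeat adds
        # another copy, so the side effects accumulate.
        store.setdefault(uri, []).append(body)

    put("/doc", "v1"); put("/doc", "v1")   # store["/doc"] is still "v1"
    post("/log", "x"); post("/log", "x")   # store["/log"] now has two entries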

Other methods with side effects include PATCH, COPY, MOVE, DELETE,
LINK, and UNLINK.  Some of these are clearly not idempotent; others
might be.  I won't consider these for now, however.

[Begin: thinking-out-loud portion of message]

How might one make PUT idempotent?  Well, suppose that the client
starts by doing a GET on an existing resource, and the server
returns a cache-validator value for the resource.  Then the client
issues one or more PUTs on the resource, handing back the
cache-validator value it received from the server.  If the server
only performs the PUT when the client's validator matches the
one that it would provide if a GET were done on the resource,
*and* (very important) the validator is constructed in a way
that is guaranteed to change when the resource is modified, then
we have an idempotent method.  No matter how many times the same
PUT is retransmitted, it cannot take effect twice: the first
success changes the validator, so every later copy fails the
match and leaves the resource alone.
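
Here is a rough server-side sketch of that check, in Python; the
validator scheme (a strong hash of the body) and all the names are
my own invention for illustration, not anything in the draft.

    import hashlib

    resources = {}   # uri -> entity body, as the server stores it

    def validator_for(uri):
        # Guaranteed to change whenever the resource is modified:
        # a strong hash of the body is one way to arrange that.
        return hashlib.md5(resources[uri].encode()).hexdigest()

    def conditional_put(uri, body, client_validator):
        # Perform the PUT only if the client's validator matches what
        # a GET would return right now; otherwise refuse and leave
        # the resource untouched.
        if uri in resources and validator_for(uri) == client_validator:
            resources[uri] = body
            return "200 OK"
        return "error: validator mismatch"   # whatever status we pick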

In other words, we have a "conditional PUT".  But unlike a
conditional GET, which performs the full operation only
if the validators do not match, the conditional PUT performs
the full operation only if they *do* match.

If the resource did not exist before the PUT, the client could
supply a special "null" validator which is guaranteed not to
match anything.  The server would allow this kind of conditional
PUT only if the resource doesn't already exist.
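
Extending the sketch above, the create-only case might look like
this (NULL_VALIDATOR is, again, just an illustrative stand-in):

    NULL_VALIDATOR = None   # guaranteed to match no real validator

    def conditional_put_create(uri, body, client_validator):
        if client_validator is NULL_VALIDATOR:
            # Create-only: succeed iff the resource doesn't exist yet.
            if uri in resources:
                return "error: resource already exists"
            resources[uri] = body
            return "200 OK"
        # Otherwise it's the ordinary update case from before.
        return conditional_put(uri, body, client_validator)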

Why is this important?  Suppose that the client performs a PUT via
a proxy, and the server updates the resource and returns a 200 OK
to the proxy, and closes the connection.  But the TCP connection
between the proxy and the client fails before the client receives
the 200 OK.  If the client simply retries the PUT, it may overwrite
an intervening PUT done by another client.  But if this "conditional
PUT" approach is taken, then the retry cannot cause this erroneous
result.

The retried PUT could return OK or it could return an error status,
and it might be
hard for the client to figure out whether the error status came because
the first PUT had succeeded or because some other client had updated
the resource first.  But I suppose the client could simply do another
GET to see if the right PUT had been done.
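
The client's recovery logic after a lost response might then be
sketched as follows; this reuses the toy conditional_put and
resources from above, with resources.get standing in for a real
GET over the wire.

    def retry_conditional_put(uri, body, validator):
        # Called when the response to the first PUT was lost in
        # transit, so we don't know whether it took effect.
        status = conditional_put(uri, body, validator)
        if status == "200 OK":
            return status                  # the first attempt never landed
        # Mismatch: either our first PUT succeeded (and changed the
        # validator), or some other client updated the resource
        # first.  A follow-up GET distinguishes the two cases.
        if resources.get(uri) == body:     # stands in for a real GET
            return "200 OK"                # our first PUT had succeeded
        return status                      # lost the race to another client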

But wait: couldn't one do conditional POSTs the same way, using
the cache-validator of the original (un-posted-to) resource?  This
would protect against races from other clients, for example.

What does this have to do with caching the results from POSTs and
PUTs?  I'm not sure.  Perhaps nothing.  But it might be worth
trying to think through a way for the server to tell a cache that
the entity-body supplied with a PUT request, possibly taken with
some headers returned by the server, can be treated as a cached
copy of the PUTted resource (because PUT replaces the resource,
rather than doing any partial modification).  This would avoid
a reload from the server on a subsequent GET of that resource.
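
Here is one way a cache might exploit such a hint; the header name
below is entirely hypothetical, and nothing like it exists in the
current draft.

    cache = {}   # uri -> entity body the cache believes is current

    def on_put_response(uri, request_body, response_headers):
        # If the server asserts that the stored entity is
        # byte-for-byte the request body (via a made-up hint
        # header), the cache can keep that body and answer a later
        # GET from it directly.
        if response_headers.get("X-Entity-Equals-Body") == "true":
            cache[uri] = request_body

    def cached_get(uri):
        # Served locally; no reload from the origin server needed.
        return cache[uri]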

-Jeff
