Thoughts on writing standards that real clients can support

So spoke Geoffrey M. Clemm:
> A server is free to refuse a move request it cannot honor for any
> reason, whether because it cannot move locks, or because it cannot
> perform cross-server moves, or whatever.  In the former case, the
> client unlocks and retries the move.  In the latter case, the client
> tries a copy/delete if that is OK.  Easy for server to detect and
> implement, well defined error codes, simple client responses.

For the edification of the working group I am now going to give you a peek
into the mind of someone who creates File dialogs to support WebDAV in
Office: (With apologies to my friends in Office, you know who you are =)

"Well, um... uh... coffee? Yes, have coffee... Sugar? Wait where is the
sugar? Oh.. here it is. O.k. Um.. Move... there is no cross-server protocol
so any MOVE requests across namespaces is probably always going to fail.
O.k. fine, don't allow the UI to support cross-server moves. Um... what
about single namespaces physically split between multiple services? You
know, http://foo/bar/ is on one machine but http://foo/ack is on another.
Ahh screw 'em. They have no business making namespaces like that visible to
the user or at least they should figure out how to connect them. If we get
an error trying to do a MOVE in that case then put up a catastrophic error
dialog telling the user that their server is completely busted and they need
to complain to the server manufacturer. Just DON'T CALL US. 
	The support bill for the protocol for the last release was
unbelievable. Every time the screwy servers did something we got the damn
call! If I can't figure out how to reduce the support costs we are going to
have to kill support for the protocol. Oy... 
	O.k. um... hum.... some servers will let me MOVE a locked file and
um... some won't. Hum... o.k. um.. pick one. Flip coin. Heads, o.k. I will
assume that all servers can support moving a locked file, if one fails then
I will treat it as a catastrophic failure and put up a big warning telling
the user that their server is a piece of garbage."

This is the life of a client. This is why a good standard only uses "MUST"
and why a bad standard uses "SHOULD" and "MAY". The argument that "we give a
good failure indication to the client so we can allow the server to do
whatever it wants" is a 100% reliable sign that you are creating a crappy
standard. You pick one default behavior and anything that isn't that
behavior should generally be a catastrophic failure. In this case this means
that ALL servers MUST either support moving locked files or MUST NOT support
moving locked files.

One may somewhat reasonably but very naively ask "well why can't we just say
that clients SHOULD NOT expect that locked files can be moved but if a
server wants to allow it, so much the better?" Now put yourself in the shoes
of the client maker. What are his choices? He writes his code, it is going
to have to make some assumption as the default case. Does he always assume
he can MOVE the file if it is locked? If he does then he has to be ready for
a "non-catastrophic" failure (meaning - sorry, I don't support that feature)
and write a back up code path which can do the unlock followed by a MOVE
trick. This means he has two code paths he has to develop and test. 
	A common mistake I see in the IETF is people thinking that the hard
part is writing the code, I have lost count of the number of times I have
heard "why are your complaining, it is ten lines of code?" It may very well
be ten lines of code, but it is ten lines of code with two alternate paths
both of which have to be tested. It isn't the code that kills you, it is the
testing. In the end the client maker will just choose one path and 5 lines
of code so that the number of test cases gets cut in half and he has a
chance of actually shipping on time. 

A second problem with the "but the server can always say no" philosophy of
protocol design is that clients depend on failure conditions. That is,
clients will depend on MOVEs failing if a file is locked. For example, a
client maker will allow users to move files in the file dialog depending on
the fact that if other users are using files in that same tree the files
will be locked so that the first user's MOVE commands will fail on those
files.

Put another way, a good standard provides both clear success and clear
failure conditions.

Saying "The server MAY move a file that is locked" fails the "clear success
condition" test because a client won't know, baring catastrophic failure, if
a MOVE on a locked file should work or not. This means that the client has
to write two code paths which have to be tested. It means the client has to
write two UIs, which have to be tested. This generally also ends up screwing
up the client's APIs as well since a MOVE on a locked file could get an
error both from moving the locked file and possibly from the unlocked/move
pair that is used if the server doesn't supporting moving locked files.

Saying "We will just tell clients that servers generally don't move locked
files but we will let servers move locked files so that future more advanced
clients can take advantage of the feature" fails the "clear failure
condition" test. Either a MOVE on a locked file is supposed to succeed and a
failure is an unrecoverable catastrophic event or a MOVE on a locked file is
required to always fail. In either case the client has a single code path/UI
design/API to code and test. But saying "maybe it works, maybe it doesn't"
means that code/UI/API all have to be doubled.

It is hard, very hard, painful in fact, to have to cut a feature which seems
so reasonable. But it is only in making these hard decisions and really
constraining the hell out of the defined functionality that you get
interoperability. Of course the art to good protocol design is to know what
not to define, what functionality to not include in the standard. That way
you leave yourself ways out in the future.

This is all summarized in the two maxims I put up before:

	1) The spec is done when there is nothing left to cut.
	2) There are only two types of features, mandatory and not in the
spec.

			Yaron

P.S. Although Jim Whitehead didn't help write this post he shares a lot of
guilt for its contents. The philosophy explained here was developed by he
and I in a series of arguments over the question of "well the server can
always say no". This p.s. is added as my bit to help ensure that blame is
properly apportioned.

Received on Sunday, 26 September 1999 03:30:41 UTC