RE: Some problems with the WebDAV protocol (part 1: philosophy) from Jim Whitehead on 1999-04-25 (w3c-dist-auth@w3.org from April to June 1999)

From: Jim Whitehead <ejw@ics.uci.edu>
Date: Sun, 25 Apr 1999 14:08:30 -0700
To: Yoram Last <ylast@mindless.com>
Cc: WEBDAV WG <w3c-dist-auth@w3.org>
Message-ID: <001001be8f5f$c4f66620$d115c380@ics.uci.edu>

Yoram,

Thank you for taking the time to write this post.

> It seems to me that some of your bogus (in my mind, at least) arguments
> below are the result of a misinterpretation of what the HTTP protocol
> is and how it works. I would thus like to start with a fairly general
> description.

Your rebuttal (towards the bottom of your post) really didn't need this bit
of HTTP philosophy to make your point.  But, since you did, I've decided to
reply to just the philosophy part of your post before I reply to the less
philosophical parts.  And your discussion of HTTP philosophy does help me
understand where you're coming from.

> But, even though you know all that, the request itself doesn't tell you
> anything about the nature of the action that will be taken in response to
> it, nor about the kind of response that would be returned.

Actually, the definition of GET in HTTP/1.1 does tell you several things
about "the nature of the action that will be taken".  For example, in
Section 9.1 of RFC 2068, it states that GET can always be assumed to be
idempotent, and that GET should never take an action other than retrieval,
that it is a "safe" method.
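
As a minimal sketch of that distinction: a client could record which methods
Section 9.1 of RFC 2068 lists as safe and which as idempotent, and consult
that before repeating a request on its own.  The set and function names below
are hypothetical, chosen only for illustration; only the classification
itself comes from the RFC.

```python
# Method properties as listed in RFC 2068, Section 9.1.
# The names below are illustrative assumptions, not part of any library.
SAFE_METHODS = frozenset({"GET", "HEAD"})
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "PUT", "DELETE"})


def is_safe(method: str) -> bool:
    """True if the method should do nothing other than retrieval."""
    return method.upper() in SAFE_METHODS


def may_repeat(method: str) -> bool:
    """True if N identical requests should have the same effect as one."""
    return method.upper() in IDEMPOTENT_METHODS


if __name__ == "__main__":
    for m in ("GET", "PUT", "DELETE", "POST"):
        print(m, "safe:", is_safe(m), "idempotent:", may_repeat(m))
```

Note that none of this requires knowing what kind of resource sits behind
the Request-URI; the properties attach to the method, not the resource.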

> If you are issuing this request, you should have had some prior
> knowledge concerning the nature of that resource (like the fact that it
> is a CGI program capable of processing certain parameters).

This is absolutely false.  A user-agent is not required to know anything
about the nature of the resource to which it is submitting a GET request.

> Moreover, even those things that you
> did know about the URI, are completely outside the scope of the HTTP
> protocol.

This is because, except for a few characters defined in RFC 2396 ("Uniform
Resource Identifiers (URI): Generic Syntax"), a URI is an opaque string.

> All of these things are completely outside of its scope, because HTTP is
> mostly a communications protocol. It specifies how to *submit* requests
> and how to *send* responses.

HTTP falls under the general class of client-server remote procedure call
protocols.  In order to specify how clients and servers interoperate, these
protocols necessarily have detailed information on how to submit requests
and send responses to those requests.  This is just as true of HTTP as it is
of SMTP, LDAP, IMAP, ACAP, etc.

> Not how requests should be handled or what kind of responses should be
> provided. This nature of HTTP is what makes it such a flexible protocol.

Interpreted literally, these sentences would appear to be easily
contradicted by the definitions of methods in section 9 of RFC 2068.

> It enables many things to be layered on top of it. The actual way requests
> are being handled depends on the server. It may or may not involve other
> standards such as CGI, but in any event, it is mostly outside the scope
> of HTTP. Now the fact that the HTTP protocol doesn't know what collections
> and query strings are, does not mean that clients and servers and users
> don't know what they are. These objects may (even if they don't need to)
> fully exist in the context of HTTP based communication in pretty much the
> same way that URIs exist in the context of TCP/IP based communication.

It seems to me that you're trying to answer the question, "is there some
property of HTTP that has led to its success", and in particular, "is there
something about the looseness of the definitions of HTTP methods that has
contributed to its success."

These are good questions to pose of HTTP.  I like Roy's hypothesis: what
makes HTTP unique, and distinguishes it from other RPC protocols, is that it
uses "representational state transfer".  Representational state transfer is
a long term for the notion that a GET on an HTTP resource doesn't return the
persistent state of the resource, but rather it returns a representation of
the persistent state of the resource. This representation does not have to
have a 1:1 correspondence with the persistent state of the resource; in fact,
it can vary dramatically, as in the case of CGI, or other dynamic content.
This concept of representational state transfer underlies the language in
GET that states, "the GET method means return whatever information (in the
form of an entity) is identified by the Request-URI."

This definition could have been phrased, "return the exact sequence of
octets associated with the Request-URI", but this would have ruled out the
CGIs and other dynamic content which have contributed to the use of the Web
to provide services as well as content.  So in this case, loosening the
definition of GET just a little bit to accommodate representational state
transfer provided significant benefits: the ability to have intelligent
services and dynamic content on the Web.
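
To make the idea concrete, here is a toy sketch (not any real server's
implementation).  The Resource class, its fields, and the two media types
are assumptions made up for illustration.  The point is only that the
stored state and what GET hands back are different things, and that one
state can yield more than one representation.

```python
import json
from typing import Tuple


class Resource:
    """A hypothetical resource whose persistent state is a small record."""

    def __init__(self, title: str, hits: int) -> None:
        # Persistent state: held by the server, never sent as-is.
        self._state = {"title": title, "hits": hits}

    def get(self, accept: str = "text/plain") -> Tuple[str, str]:
        """Return (media type, body): one representation of the state."""
        if accept == "application/json":
            return accept, json.dumps(self._state, sort_keys=True)
        # Default: a human-readable rendering of the same state.
        text = f"{self._state['title']} (viewed {self._state['hits']} times)"
        return "text/plain", text


if __name__ == "__main__":
    page = Resource("Home", 3)
    print(page.get())
    print(page.get("application/json"))
```

Both calls are answered from the same underlying record, yet neither
response is "the exact sequence of octets" stored on the server.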

> Now lets look at PUT and DELETE. These methods are not as flexible as
> GET and POST, but the general principles apply here just as well.

Well, since you can tunnel an entire new protocol through POST (e.g.,
Internet Content Exchange and XML-RPC), I would hope that PUT and DELETE
are a *little* less flexible. :-)

> HTTP
> specifies what it does about PUT, and beyond that it is up to the server
> implementation to determine what it will or will not do. A server may
> implement some crazy CGI-based mechanism to enable entities to be PUT
> into URIs that contain query strings, and it may, just as well, create
> new collection resources in the process of creating a new resource URI.
> Similarly for DELETE: HTTP describes this method as requesting to
> "delete the resource identified by the Request-URI". Since HTTP doesn't
> say what a resource is, nor does it distinguish between different types
> of resources, nor does it say if other resources should or should not
> be deleted (or maybe even created) in the process of deleting a given
> resource, all of these things (and more) are left to be decided by the
> server implementation. Furthermore, HTTP even allows for the server
> implementations to have extra control mechanisms, so that the actual
> deletion may be postponed or even canceled at a later time.

Unlike the definition of GET, which is sufficiently open to allow dynamic
content to be returned, yielding a net positive effect on the Web, the loose
definitions of PUT and DELETE trade off interoperability for no net positive
effect.  It's all loss -- having loose definitions of these methods doesn't
gain you new, compelling use environments or new, compelling services; it
just results in interoperability problems. You extol the virtues of clients
needing to have out-of-band information about a server in order to know
exactly what the effect of a method will be. This is not a good thing, because
it assumes a human is always in control of the client, and humans can have
mistaken assumptions.

> Now this flexibility in HTTP has its consequences, and one of the main
> consequences concerning DELETE, in particular, is that, in order to use
> it safely and effectively, a client would need to know how it is
> implemented by the server. The fact that a given HTTP server supports
> the DELETE method, does not provide enough information to determine
> the behavior of that method on this server. This is quite similar to
> the CGI example above. You need to have additional information (other
> than the fact that HTTP is being used as the communication protocol)
> in order to be able to predict the possible outcomes of essentially
> any kind of HTTP request. Now the issue concerning the supposed allowance
> of a failed DELETE to be reported as success is fully addressed within
> this framework, because it simply means that the success of a DELETE
> should be interpreted in the context of any particular server
> implementation. On some servers, a successful DELETE means "the file
> was deleted", on others it might mean "the file will be deleted
> tomorrow morning if my boss approves it", and yet on others it might
> mean "the file was marked for later deletion; while it is still
> available for GET requests, no further editing of it can be done",
> but in each of these cases success means success in the context of
> that implementation, and failure means that the request failed. In
> most practical cases, clients (or more precisely users) should be
> familiar with the servers they work with, and they would know the
> real (or practical) meaning of a successful DELETE, and it is
> different from failure.

Do you need to know how your FTP server is implemented in order to use it?
How about your POP server?  What you are reporting are the symptoms of a
poorly specified method, not some glowing example which should be held up
for all to gaze at and admire.
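
A toy sketch of the symptom, modeled loosely on the three server behaviors
described in the quoted text.  Everything here is hypothetical: the in-memory
store, the function names, and the choice of 200 as the success status are
assumptions for illustration, not any real server's behavior.

```python
from typing import Callable, Dict, Tuple

Store = Dict[str, dict]
DeleteImpl = Callable[[Store, str], int]


def delete_now(store: Store, uri: str) -> int:
    """The resource is removed immediately."""
    store.pop(uri, None)
    return 200


def delete_pending_approval(store: Store, uri: str) -> int:
    """Nothing changes yet; removal waits on an out-of-band decision."""
    return 200


def delete_mark_only(store: Store, uri: str) -> int:
    """The resource stays retrievable; only further editing is blocked."""
    if uri in store:
        store[uri]["frozen"] = True
    return 200


def observe(delete: DeleteImpl) -> Tuple[int, bool]:
    """Return (status, whether the resource is still retrievable)."""
    store: Store = {"/report.txt": {"body": "draft", "frozen": False}}
    status = delete(store, "/report.txt")
    return status, "/report.txt" in store


if __name__ == "__main__":
    for impl in (delete_now, delete_pending_approval, delete_mark_only):
        print(impl.__name__, observe(impl))
```

All three report the same success status, so an automated client looking
only at the response cannot tell which of the three outcomes it got.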

> Now you may or may not like how HTTP works, and I would fully agree
> with you if you say that it is not optimally tuned for content
> management. But that's what it is, and it has its own advantages and
> disadvantages.

The current definition of DELETE is bad, and has no advantages.  In my
personal opinion, it should have been left out of the HTTP spec., along with
PUT.

> WebDAV, on the other hand, is a totally different protocol with totally
> different goals and totally different design philosophy.

WebDAV is an extension of HTTP, thus it is not a *totally* different
protocol from HTTP. Since HTTP has PUT and DELETE, presumably remote
authoring was a goal of HTTP, and hence WebDAV does not have *totally*
different goals from HTTP.  And from my perspective, I don't feel that I
helped develop a protocol which had a totally different design philosophy
from HTTP.  So I don't agree with your assertion :-)

> It deals with
> defining a great deal of server-side structure, and specifying a great
> deal about how requests should be handled and responded to, and
> it leaves very few things open to interpretation or handling by other
> protocols.

As stated before, I view the concreteness of the language in the WebDAV
spec. to generally be a plus.  A client can execute a DAV method against a
DAV server and by examining the response, have a reasonably clear
understanding of what just happened on the server.  This is a good thing.

> It is certainly not a communication protocol.

I really don't understand what you mean when you say "communication
protocol" -- the term really can have lots of possible meanings. Did you
make up this term, or are you using some standard reference's definition of
the term?  If so, perhaps a citation would help me understand where you're
coming from.

> This is nice
> and legitimate (except for those aspects of it that are flawed), but
> you can't interpret HTTP as being the same thing as WebDAV. WebDAV
> could have been designed as being purely layered on top of HTTP as it
> is. The fact that you *choose* to design it through HTTP extensions does
> not make these two different protocols the same. (which is all the more
> reason why it was a bad design error to put them in conflict.)

I'm afraid I don't understand this paragraph.

- Jim
Received on Sunday, 25 April 1999 17:10:20 UTC