Re: Some problems with the WebDAV protocol

It seems to me that some of your bogus (in my mind, at least) arguments
below are the result of a misinterpretation of what the HTTP protocol
is and how it works. I would thus like to start with a fairly general
description.

Consider the following HTTP request:
GET /dir1/dir2/fname.ext?id=123&com=ls HTTP/1.1
Host: ...
Now, ignoring some additional information that may be contained in the
headers, this is a request that consists of two main ingredients: a method
and a URI. Surely you recognize part of the URI as referring to a resource
(most probably, the name of a file) that is a member of a collection, which
in itself is the member of another collection, and then another portion which
is a query string. And since the request method is GET, you know that it
"should not have the significance of taking an action other than retrieval."
But, even though you know all that, the request itself doesn't tell you
anything about the nature of the action that will be taken in response to
it, nor about the kind of response that would be returned. If you are
issuing this request, you should have had some prior knowledge concerning
the nature of that resource (like the fact that it is a CGI program capable
of processing certain parameters). Moreover, even the things that you
did know about the URI are completely outside the scope of the HTTP
protocol. HTTP itself does not specify what a collection is, nor what
a query string is, nor how a request of this kind should be handled
by the server. All of these things are
completely outside of its scope, because HTTP is mostly a communications
protocol. It specifies how to *submit* requests and how to *send* responses.
Not how requests should be handled or what kind of responses should be
provided. This nature of HTTP is what makes it such a flexible protocol.
It enables many things to be layered on top of it. The actual way requests
are handled depends on the server. It may or may not involve other
standards such as CGI, but in any event, it is mostly outside the scope
of HTTP. Now the fact that the HTTP protocol doesn't know what collections
and query strings are does not mean that clients, servers, and users
don't know what they are. These objects may (even though they need not)
fully exist in the context of HTTP-based communication in pretty much the
same way that URIs exist in the context of TCP/IP-based communication.
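To make the point concrete, here is a small sketch (using Python's standard urllib.parse, with the URI from the request above) of how the request line splits into path segments and a query string. The split is purely mechanical; HTTP itself assigns no meaning to the "collection" structure or to the parameters, which is exactly the point being made:

```python
# The URI splits mechanically into path segments and a query string,
# but HTTP assigns no semantics to either; "collections" and CGI-style
# parameters are server-side conventions layered on top.
from urllib.parse import urlsplit, parse_qs

uri = "/dir1/dir2/fname.ext?id=123&com=ls"
parts = urlsplit(uri)

# Looks hierarchical, but HTTP does not define collections.
segments = [s for s in parts.path.split("/") if s]
# Looks like CGI parameters, but HTTP does not define query strings.
params = parse_qs(parts.query)

print(segments)  # ['dir1', 'dir2', 'fname.ext']
print(params)    # {'id': ['123'], 'com': ['ls']}
```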

Now let's look at PUT and DELETE. These methods are not as flexible as
GET and POST, but the general principles apply here just as well. HTTP
says what it has to say about PUT, and beyond that it is up to the server
implementation to determine what it will or will not do. A server may
implement some crazy CGI-based mechanism to enable entities to be PUT
into URIs that contain query strings, and it may, just as well, create
new collection resources in the process of creating a new resource URI.
Similarly for DELETE: HTTP describes this method as requesting to
"delete the resource identified by the Request-URI". Since HTTP doesn't
say what a resource is, nor does it distinguish between different types
of resources, nor does it say if other resources should or should not
be deleted (or maybe even created) in the process of deleting a given
resource, all of these things (and more) are left to be decided by the
server implementation. Furthermore, HTTP even allows for the server
implementations to have extra control mechanisms, so that the actual
deletion may be postponed or even canceled at a later time.

Now this flexibility in HTTP has its consequences, and one of the main
consequences concerning DELETE, in particular, is that, in order to use
it safely and effectively, a client would need to know how it is
implemented by the server. The fact that a given HTTP server supports
the DELETE method, does not provide enough information to determine
the behavior of that method on this server. This is quite similar to
the CGI example above. You need to have additional information (other
than the fact that HTTP is being used as the communication protocol)
in order to be able to predict the possible outcomes of essentially
any kind of HTTP request. Now the issue concerning the supposed allowance
of a failed DELETE to be reported as success is fully addressed within
this framework, because it simply means that the success of a DELETE
should be interpreted in the context of any particular server
implementation. On some servers, a successful DELETE means "the file
was deleted", on others it might mean "the file will be deleted
tomorrow morning if my boss approves it", and yet on others it might
mean "the file was marked for later deletion; while it is still
available for GET requests, no further editing of it can be done",
but in each of these cases success means success in the context of
that implementation, and failure means that the request failed. In
most practical cases, clients (or more precisely users) should be
familiar with the servers they work with, and they would know the
real (or practical) meaning of a successful DELETE, which is
different from failure. Furthermore, in practice, "off the shelf"
servers that support DELETE, and all common Apache/CERN/NCSA
implementations that support DELETE (except mod_dav, I suppose),
try to delete the specified target and return success or failure
according to what really took place.

Now you may or may not like how HTTP works, and I would fully agree
with you if you say that it is not optimally tuned for content
management. But that's what it is, and it has its own advantages and
disadvantages.

WebDAV, on the other hand, is a totally different protocol with totally
different goals and totally different design philosophy. It deals with
defining a great deal of server-side structure, and specifying a great
deal about how requests should be handled and responded to, and
it leaves very few things open to interpretation or handling by other
protocols. It is certainly not a communication protocol. This is nice
and legitimate (except for those aspects of it that are flawed), but
you can't interpret HTTP as being the same thing as WebDAV. WebDAV
could have been designed to be purely layered on top of HTTP as it
is. The fact that you *chose* to design it through HTTP extensions does
not make these two different protocols the same. (Which is all the more
reason why it was a bad design error to put them in conflict.)

> Yoram Last, on April 17, 1999 wrote:
> > 2) The main problem with DELETE doesn't so much effect functionality as
> > it effects compliance with HTTP/1.1 and has the potential of confusing
> > HTTP/1.1 compliant clients. In connection with DELETE for collections,
> > RFC 2518 says: "If an error occurs with a resource other than the resource
> > identified in the Request-URI then the response MUST be a 207
> > (Multi-Status)." Since 207 is not a valid HTTP/1.1 response, HTTP/1.1
> clients
> > are not supposed to be able to understand it. They are likely to consider
> it
> > as a success code (it's a 2xx) even though in this particular case it
> actually
> > indicates failure.
> 
> Yoram Last, on April 20, 1999 wrote:
> > You don't say anything about the DELETE issue. Do you think that it's a
> > kosher thing to send 207 responses that are really error responses to
> > (non-WebDAV) HTTP/1.1 clients? Some people previously suggested that since
> > HTTP/1.1 says that a 2xx response to a DELETE is not an absolute
> commitment
> > to having the resource deleted, then you are allowed to always send a 2xx
> > regardless of the outcome. This is a distorted interpretation of HTTP/1.1.
> > It says that a 2xx response to a DELETE indicates the acceptance of the
> > request as such, and that there might still be a chance that this request
> > will be rejected in the future due to the possible existence of further
> > control mechanisms such as human intervention. This is not the same as
> > responding with a 2xx in cases where the server fully and clearly rejects
> > the request, and nobody in his right mind will design a purely HTTP/1.1
> > server to behave in this way.
> 
> Just to set some groundwork here, if a WebDAV server executes a DELETE on a
> collection, and the delete is completely successful, then the response code
> should be a 204 (No Content), although 202 (Accepted) and 207 (Multi-Status)
> are also acceptable.  I think we agree there are no interoperability
> problems in this case.

In principle, there shouldn't be interoperability problems, but in practice,
a poorly designed HTTP/1.1 client may very well encounter problems if it gets
a 207.
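The hazard is easy to state in code. A plain HTTP/1.1 client commonly applies the generic "any 2xx is success" rule, which is a reasonable reading of the spec for a client that knows nothing of WebDAV. This sketch (illustrative only) shows how such a client misreads a 207 that actually reports failure:

```python
# A plain HTTP/1.1 client has no definition for 207, so a common and
# defensible fallback is to treat the whole 2xx class as success.
def naive_http11_success(status: int) -> bool:
    """Generic success test an HTTP/1.1-only client might use."""
    return 200 <= status < 300

# 204: genuine success -- correctly reported.
# 207: Multi-Status, possibly reporting that every member delete
#      failed -- but the naive client still sees "success".
# 403: genuine failure -- correctly reported.
print(naive_http11_success(204))  # True
print(naive_http11_success(207))  # True (the interoperability problem)
print(naive_http11_success(403))  # False
```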

> On the other side, if a WebDAV server executes a DELETE on a collection, and
> the delete completely fails, then the response code should be a 404 (Not
> Found) or a 403 (Forbidden), depending on what caused the problem (404 -
> nothing was there to delete, while the 403 could handle access control
> problems).  For a complete failure, the 207 (Multi-Status) should not be
> returned (although I will admit that, upon reading RFC 2518, this latter
> point is probably not clear, and should be clarified in future drafts).

I agree with the 404 in the event that the destination does not exist.
However, RFC 2518 clearly states: "If an error occurs with a resource other
than the resource identified in the Request-URI then the response MUST be a
207 (Multi-Status)." Now clearly, if there are any member resources (let alone
all of them) that could not be deleted, then an error occurred with a "resource
other than the resource identified in the Request-URI", and so I should return
a 207. The only case I can return a 403 is if everything was deleted except
for "the resource identified in the Request-URI." Now I don't see how you can
say that something else is supposedly written here, and in particular,
that the spec provides for returning a 403 in case there were any
internal members that were not deleted.

> Assuming a DAV server does not return a 207 for this case, then complete
> failure should also not generate any interoperability problems with HTTP/1.1
> delete.
> 
> So, any potential problems would be with a partial completion of the delete
> operation.  

Does the 'rm' command on any UNIX system have a switch that would enable
it to run in a mode where it would issue a warning if and only if *all*
target files failed to get deleted, but would remain totally silent if
*some* failed to get deleted? How about its DOS counterpart? Or have you
ever seen a single file manager that would even have an option to respond
to directory deletion in this way? Or in fact a file manager that would
not give an error message in case that the deletion of even a single file
in a directory failed? Or maybe it is valid for a web server to respond to
a PUT request with a 204 in case it only got 60% of the
'Content-Length' of the file? (It mostly succeeded, didn't it?)
If some of the request failed, then it is an error that should be noted
as such. It is a fundamental principle of virtually any program in
existence that provides similar functionality. Accordingly, getting
appropriately corresponding status codes is also a design assumption of
any existing HTTP/1.1 application. The philosophical question of whether
"partial success" is "success" or "failure" is irrelevant. The fact is
that your decision to use a single multistatus code was made under the
explicit assumption that this code would not be sent to clients that are
not supposed to understand it, and the fact that your protocol now
specifies otherwise, means that it has an interoperability problem with
HTTP/1.1 applications.

> For this case, let me note:
> 1) The behavior of a DELETE on a collection in HTTP/1.1 is problematic for
> file-based servers.  If a DELETE is issued to a resource which has a URL
> which ends in a slash "/" (e.g., "testdir/"), and there are other resources
> which have URLs which add a path onto this slash (e.g, "testdir/one.html",
> "testdir/two.html"), HTTP/1.1 doesn't give any guidance as to what should be
> done with the resources which have URLs which come after the slash.  It
> seems to me that, for the same reason that filesystem-based servers create
> intermediate paths, these same servers would want to delete the resources
> which have "slash plus path" URLs. This leaves servers with the choice of
> either a) deleting the collection, plus the "slash plus path" resources, b)
> doing nothing (reporting an error), or c) internally marking the collection
> as removed, and not affecting the "slash plus path" resources.  My
> interpretation of the HTTP spec. is that either (b) or (c) is what was
> intended by the spec., but I wouldn't be surprised if a filesystem-based
> server has implemented (a).

As I explained in the long discussion above, it is outside the scope of
HTTP/1.1 to specify any given behavior here. Saying that "The behavior of
a DELETE on a collection in HTTP/1.1 is problematic" is a misunderstanding
of HTTP and how this particular HTTP method is used (and should be used)
in existing (or future) HTTP applications. I know of implementations that
do either (a) or (b) (and (a) seems to be the more popular). I have never seen (c).
But anything that would make sense in the context of any given implementation
is legitimate. Obviously, a user (or client) would need to know in advance
how a particular server implements this method in order to make safe and
effective use of it. This basic freedom does not contradict the possible
existence of common practices (or even written standards) that would limit
the actual types of implementations that one encounters to some finite set of
options (or even a single common option), but it is important to understand
that this would be inherently outside the scope of HTTP itself.

> For reference, mod_put's implementation of DELETE does (b), since it will
> attempt to perform an unlink() on the directory, which will fail, causing it
> to report a 403 (Forbidden). It's hard to tell whether this is intentional,
> or if the implementor didn't consider that a directory could be removed.  At
> the very least it is suggestive that HTTP/1.1 delete on a collection is rare
> enough not to be worth implementing.

This is a minimal (maybe just trying to be as safe as possible) implementation
of DELETE. Netscape servers and AOLserver do your (a) (namely, the same behavior
as specified by WebDAV). To the extent that a common practice exists here, I
believe it is your (a).

> 2) Though you've brought it up already, I do think it is worth stating again
> that HTTP/1.1 clients, if designed correctly, should not depend on *any*
> state change occurring on the server as the result of a delete.  As HTTP/1.1
> states, "The client cannot be guaranteed that the operation has been carried
> out, even if the status code returned from the origin server indicates that
> the action has been completed successfully."

This is a patently absurd interpretation of both HTTP/1.1 and the practical
implications of returning faulty status codes. I don't even know what you
mean by "should not depend on *any* state change occurring on the server"
(do you?). Those clients that I'm familiar with that use DELETE (like
AOLpress and Netscape's Web Publisher) would remain silent if they get
a success code (indicating success to the user in the most commonly used
way), and they will pop up a window with an error message if they get an
error response. They will also adjust their display of the site's "file
system" to reflect the change that they think (based on the server's
response) took place. So they will simply convey to the *user* the wrong
information regarding the outcome of his actions, and they will also keep
displaying a faulty map of existing (or available) resources. Having a
server that returns bogus codes is exactly the same as having your
operating system return bogus codes for system calls like unlink().
Your file manager (or rm command) would indicate to you that the command
was successful in cases where it was not. Do you "depend" in any way on
getting the correct indication regarding the success of such commands?
The theoretical possibility that on some HTTP/1.1 compliant servers
success need not equal confirmed deletion of the resource is about as
relevant to this point as asking how final an "unlink()" really is.
Depending on your OS and file system it may have different meanings in
different situations, and it is usually *not* a true deletion of the
resource. But, regardless of that, there is a clear notion of when it
succeeds or fails, and if you return the wrong code, the program conveys
*false information* to the user. Besides, it will be particularly
irrelevant on WebDAV servers since they are not allowed to have
mechanisms to delay or override deletion. A WebDAV server knows at the
time of responding the exact status of things, and it will be informing
HTTP/1.1 clients (which in practical terms means the users of these
clients) that their request succeeded in cases where it failed.

> So, since an HTTP/1.1 client cannot depend on the response code to a DELETE,
> and since the existing definition of DELETE is ambiguous for collections,
> and since existing implementation practise suggests that delete on a
> collection is an infrequent (perhaps even never executed) operation for
> HTTP/1.1 clients:

You are so very very wrong. AOLpress might not be a purely HTTP/1.1
application, but it uses HTTP/1.1 semantics of DELETE. Now if AOL would
want to add WebDAV support to their servers (like to their PrimeHost
hosting service) they would clearly be facing a conflict. Exactly the
same thing holds for Netscape's Enterprise server. It provides WebDAV-like
functionality by implementing a whole zoo of its own HTTP methods,
and is designed to provide authoring capabilities using its own Web
Publisher client. It so happens that even though it has all those other
methods, both file and directory removal on this server depend on the
HTTP/1.1 DELETE method. Of course, the fact that there are only about
300,000 of these servers on the net does not indicate much. Or does it?
To sum this up in a somewhat less sarcastic tone: while I might not know
the extent to which PUT-based creation of directories is used, it is largely
because the main commercial HTTP authoring servers have their own MKDIR
command that is equivalent to the WebDAV MKCOL. But, it so happens, that
they do use DELETE to delete directory trees, and your convenient *decision*
that HTTP/1.1 DELETE isn't being used is about as wrong as it can get.

> a) this problem does not warrant a re-issue of RFC 2518
> b) it is not clear that this problem warrants any changes to the
> specification at all, since at worst it would cause user confusion for an
> error case on an infrequently (if ever) used option of an infrequently
> executed method.

Most of your assertions in (b) above are simply plain false.

> While this might work, it's a bad design to use the Depth header to signal
> this information.

Bad design? Maybe. But the design you have without it is flawed at the core.

As I tried to explain in a previous posting, it isn't even within the
legitimate scope of WebDAV to redefine the semantics of HTTP/1.1 methods,
and there is clearly no real technical need for it to do it either. You
took to yourself a liberty that was never yours. This would have been a
core design flaw even if it didn't generate the slightest real-world
interoperability problem, because those HTTP/1.1 methods have flexibility
that WebDAV does not provide, and thus have the potential of being
used in future applications in ways that you are not likely to be able
to consider right now. It is really very hard to predict the full
consequences of something like this. That's why you shouldn't be doing
it to start with.

As a side remark, you should note that WebDAV would not have suffered one
bit from having two notions of PUT/DELETE within its context (which could
be distinguished either by a header or by different method names). The
WebDAV methods can be defined as you find fit, while the existing HTTP/1.1
methods maintain their flexibility to be used as people find it appropriate
for their applications.

Now whether you like it or not, you created a situation where your
protocol is in conflict with HTTP/1.1. While some aspects of this
conflict are of yet unknown and hard to determine magnitude, there
are other aspects of it that are clearly significant. The only way
to comply with both HTTP/1.1 and WebDAV is to forbid DELETE altogether.
There is no way of building a fully functional fully compliant WebDAV
server that is also fully compliant with HTTP/1.1, and since HTTP/1.1
compliance is required by WebDAV itself, it is even a self-conflict.
The DELETE method, within its HTTP/1.1 semantics, is currently used
on hundreds of thousands of HTTP authoring servers, and your protocol
specifies a behavior that breaks the proper functionality of virtually
any existing client that uses this method. If that isn't an
interoperability problem of your protocol, then I don't know what is.

Now the implications of this basic protocol error on real-world application
design are at least the following (I only relate to the DELETE issue here,
and I further *assume* that there is really some clear notion of WebDAV
semantics for DELETE, even though you say that it should behave *differently*
from what RFC 2518 specifies):
1) Server implementors would need to choose between:
 a) Keep their HTTP/1.1 semantics for DELETE.
 b) Implement the WebDAV semantics for DELETE.
 c) Implement both behaviors along with some (ugly and inherently unreliable)
   mechanism for distinguishing different clients (such as using a database
   of known clients).
 For those implementors that would want to add WebDAV support to a current
 product framework that already relies on HTTP/1.1 DELETE (like AOL/Netscape),
 (b) isn't even a viable option, and they would need to choose between (a)
 and (c).
2) Client implementors that would like to maximize the interoperability of
  their program MUST NOT assume any specific DELETE semantics, and MUST be
  able to deal with both possible behaviors.

Is it the end of the world? No, because implementing those workarounds isn't
that much of an issue and they should work quite well in most cases. So it
would only cost some extra effort/money and somewhat decrease overall
reliability, interoperability, and performance. The biggest problem would
be for naive implementors that would not realize this state of affairs from
the beginning and would need to discover it the hard way. But, from the
point of view of proper protocol design, creating this situation is a clear
failure.

So I'm really very sorry this is the case, and I tried to explain it as
well as I could, and I'm sorry if I offended anyone in the process, and
I'm tired myself of the whole thing, but I think that your protocol is
broken and that you should fix it. It is your protocol and you can
obviously do what you want, but trying to dismiss a significant design
error by using faulty arguments will not solve the problem.


Yoram

Received on Friday, 23 April 1999 23:09:22 UTC