- From: Yaron Goland <yarong@microsoft.com>
- Date: Sun, 27 Dec 1998 14:27:01 -0800
- To: WEBDAV WG <w3c-dist-auth@w3.org>
Collections, Resourcetype and Hierarchy in WebDAV
An IAQ (Infrequently Asked Questions)

1 Why Did WebDAV Decide that the HTTP URL Namespace is a Hierarchy?
2 Why Did WebDAV Create Resource Types like the Collection Resource?
3 Why Did WebDAV Create the Resourcetype Property?
4 Why Did WebDAV Create MKCOL?
5 Why Did WebDAV Allow for Mixed WebDAV and Non-WebDAV Compliant Namespaces?
6 Why Does WebDAV Allow for non-WebDAV Compliant Collections?
7 Why Does WebDAV Require Hierarchy in WebDAV Only Namespaces?

1 Why Did WebDAV Decide that the HTTP URL Namespace is a Hierarchy?

A typical HTTP URL looks like http://server.com/name1/name2/name3. The HTTP/1.1 specification never defined what the "/" really meant. Did the "/"s have any meaning, or were they just decoration to help people remember where they put their resources? This was one of the very first problems the WebDAV Working Group (WG) had to face.

Most of the WG had the very definite idea that WebDAV should provide at least file system level functionality. Even though most of the WG were document management types who didn't really use file systems, they understood that file systems were the single most common form of storage on the planet. Matching file system functionality meant providing at least the possibility of supporting a hierarchical namespace. So the WG decided that the "/"s could represent a hierarchical namespace, and that it was WebDAV's job to provide the tools to create and maintain that hierarchy if clients and servers chose to make it hierarchical.

2 Why Did WebDAV Create Resource Types like the Collection Resource?

File systems contain two types of resources: files and directories. The PUT/GET methods already provided sufficient support to simulate a file, which left the WebDAV WG with the job of creating directories. The term directory was hopelessly overloaded, so an alternative word, collection, was selected in its stead. But what was a collection? The basic object in HTTP is the resource.
HTTP does not give a very tight definition of what a resource is. Essentially all HTTP implies is that a resource is an object that is addressed by one or more URLs and that accepts methods. This lack of definition was actually a good thing, because it allowed HTTP resources to be extremely flexible. This was one of the important lessons from HTTP: don't define anything you don't absolutely have to define. That way you don't paint yourself into a corner. The key is to define only the parts that are needed for interoperability.

Keeping this in mind, the WG realized that for the sake of interoperability it needed to "remove" a little of HTTP's magical vagueness. Specifically, it needed to create a "type" of resource. The idea behind typing was to create a profile for an HTTP resource. The profile specified which methods a resource of that type had to support, how it had to support them, and which methods the resource wasn't allowed to support. This was all new ground.

3 Why Did WebDAV Create the Resourcetype Property?

Now that we had created a new resource type, we needed a way for clients to determine the type of resource they were talking to. Unfortunately the WG had not yet learned the folly of live properties (http://lists.w3.org/Archives/Public/w3c-dist-auth/1998OctDec/0302.html), so the obvious solution was to create a new property. The idea of creating a collection property was kicked around but eventually rejected, because the WG realized that where there was one new resource type, there would be many more. Thus the resourcetype property was created. It would be the central repository for declarative information regarding the nature/type/profile of the resource.

4 Why Did WebDAV Create MKCOL?

The WG faced the problem of how to create a collection resource. The only way HTTP provides to create a new resource is the PUT method. So the idea was tossed around that we should add a header to the PUT method which specified the type of resource to be created.
However, there was strong objection to adding this functionality to PUT. The core of the argument was that PUT was one of the very few extremely well defined methods. It was used to record a byte stream that could later be retrieved with a GET. It was considered unwise to add any "magic" to PUT, which in this case meant allowing PUT to do anything other than blindly record a byte stream. So a new method was needed.

The choice was either to create a generic method to create any resource type or to create a method specifically for creating collections. The question ended up being phrased as "Was PUT a mistake?" Should the HTTP WG have created a generic method to create new resources? Had the HTTP WG simply screwed up? The conclusion of the WebDAV WG was that the HTTP WG had not made a mistake. By creating a PUT method it was possible to carefully define exactly what it meant to record a byte stream. That a resource could be created as a side effect seemed reasonable enough.

But still, wouldn't the world be a better place if we created a generic method to create any resource type? This began a long and fairly boring debate about what the body of this magic method should be. The center of the issue was whether the magic method should be allowed to create the initial value for the resource. In the case of a collection, this meant initially populating the members of the collection as well as specifying the values of those members, which in turn meant gluing together a bunch of HTTP methods and shoving them inside this new method. This debate had a lot to do with the same issues touched upon in http://lists.w3.org/Archives/Public/w3c-dist-auth/1998OctDec/0303.html. The WG recognized an impossible rat hole when it saw one. As such, the WG decided it wouldn't try to create the universal "make any resource type" method and would instead take inspiration from PUT and design a method specifically for creating a collection.
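The following minimal Python sketch makes the split between PUT and a collection-specific method concrete. It plans the requests a client might send to upload a small directory tree: one MKCOL per collection, created parent-first, and one PUT per file. The function name `plan_upload` and its input format are hypothetical. The sketch only builds (method, path) pairs and sends nothing over the network.

```python
from typing import Dict, List, Tuple


def plan_upload(files: Dict[str, bytes]) -> List[Tuple[str, str]]:
    """Plan (method, path) requests that would create `files` on a server.

    Hypothetical helper: `files` maps absolute paths such as "/docs/a.txt"
    to their contents. Every ancestor collection gets one MKCOL, ordered
    parent-first so each MKCOL targets an already existing parent (the
    server root is assumed to exist). Every file gets one PUT that records
    its bytes.
    """
    collections = set()
    for path in files:
        parts = path.strip("/").split("/")[:-1]  # drop the file name
        for depth in range(1, len(parts) + 1):
            collections.add("/" + "/".join(parts[:depth]) + "/")

    # Sorting by number of "/" (then by name) puts parents before children.
    ordered = sorted(collections, key=lambda c: (c.count("/"), c))
    plan = [("MKCOL", c) for c in ordered]
    plan += [("PUT", p) for p in sorted(files)]
    return plan


if __name__ == "__main__":
    for method, path in plan_upload({"/docs/a.txt": b"A", "/docs/img/b.png": b"B"}):
        print(method, path)
```

Issuing all MKCOLs before any PUT is just one convenient ordering. Interleaving them would work equally well, as long as each parent collection exists before anything is created inside it.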
More than this, the WG also agreed that it would not define a body for this method. One could be added later, but it would not be specified in the WebDAV specification. Thus, using base DAV, the only way to create a collection is with MKCOL, and the only way to populate it is with PUTs and more MKCOLs.

5 Why Did WebDAV Allow for Mixed WebDAV and Non-WebDAV Compliant Namespaces?

The WG realized that a mechanism was needed to determine whether a server was WebDAV compliant. After all, since WebDAV was using HTTP on port 80, how could you tell a WebDAV compliant HTTP server from a normal HTTP server? The obvious solution was to use the OPTIONS method. This HTTP/1.1 method was specifically designed to provide protocol information.

At first the WG thought it could require performing an OPTIONS method on the special request-URI "*". This request-URI had been introduced by the HTTP WG as a means of discovering information about an entire server. The server community reacted very badly to this proposal. "*" did not just cover an entire server, it covered an entire HTTP namespace. When a "*" request-URI is sent, scoping is provided only by the Host header. This means that "*" applies to all resources in that domain. In other words, every resource in http://uci.edu would have to be WebDAV compliant if any of them wanted to support WebDAV.

The HTTP WG originally introduced "*" because in the old days it was common for a single server to handle all the URIs in an entire domain. However, a number of developments invalidated the assumption underlying "*". The first was the introduction of server extension mechanisms such as CGI/ISAPI/NSAPI/Modules. These mechanisms allowed one to add a program to a server so that the program controlled a part of that server's namespace. Thus one could take an existing server and, by adding an extension that supported WebDAV, make part of the server's namespace WebDAV compliant.
If discovery could only be performed on "*", then one could only make a server WebDAV compliant by making the entire server compliant. This loss of flexibility was considered unacceptable. Second, different servers often controlled parts of an HTTP URL namespace. A redirector would be used to route requests based on the URL. Thus it was very likely that one server might want to support WebDAV, thereby WebDAV enabling part of the HTTP URL namespace, while the rest wouldn't be interested.

So the WG came to the conclusion that it couldn't use "*". This meant that discovery would have to be performed by executing the OPTIONS method on each resource individually.

At this point the WebDAV client community complained. Imagine someone hands a WebDAV client the URL http://foo/bar/blah. The WebDAV client then performs an OPTIONS request on http://foo/bar/blah and discovers that the resource is WebDAV compliant. Now the WebDAV client wants to display a picture of the resource's namespace to the user. Something like:

foo
 |-bar
    |-blah

But how does the client know that http://foo and http://foo/bar are WebDAV compliant? If they aren't and the user tries to click on them, who knows what would happen. This was known as the mixed namespace problem: WebDAV compliant and non-compliant resources could end up in the same HTTP URL namespace. Many in the client community were very concerned that they would be required to perform discovery on every resource they worked with. Not an appetizing thought.

The first suggestion was to ban mixed namespaces. However, this suggestion was quickly rejected for the reasons given previously. A second suggestion was to allow the root of a WebDAV namespace to be anywhere in the HTTP namespace but to require that all the children of that root be WebDAV compliant. Thus http://uci.edu/ might not be WebDAV compliant, but http://uci.edu/users/jwhitehead could be WebDAV compliant so long as all its children were compliant.
However, this suggestion was rejected for the same reason as the suggestion that the entire HTTP namespace be WebDAV compliant. Imagine the HTTP namespace rooted at http://foo/ was owned by server A but the namespace rooted at http://foo/bar/blah was owned by server B. Now imagine that server A wants to be WebDAV compliant but server B does not. The second suggestion would mean that neither http://foo/ nor http://foo/bar/ could ever be WebDAV compliant, because they were parents of http://foo/bar/blah, which is not WebDAV compliant.

There were then suggestions that the WebDAV WG provide a mechanism to map a namespace. The idea was that the client could make a single request to the server and find out not only whether a particular URL was WebDAV compliant, but also which other resources on the server were compliant. The server community quickly rejected the idea as both too expensive and not implementable. If the namespace is cut up between different machines, there may not be any way for the different machines to discover each other, much less figure out which resources exist and whether they support WebDAV.

Eventually the client community accepted that mixed namespaces were here to stay and that clients were going to have to pay the cost of resource by resource detection. The overall cost proved not to be too bad because of the hierarchy manipulation mechanisms WebDAV provided.

6 Why Does WebDAV Allow for non-WebDAV Compliant Collections?

At one point or another the authors of the WebDAV spec realized that a resource could meet all of WebDAV's requirements for a collection resource without necessarily supporting all the WebDAV methods. Recognizing that some people might want to implement just collections without supporting all of WebDAV, the authors decided to throw in language that allowed a resource to be a collection without necessarily being WebDAV compliant. It was one of those "never forbid without a damn good reason" type decisions.
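The following minimal Python sketch shows what resource by resource detection (section 5) combined with the resourcetype property (section 3) might look like on the client side. It assumes the client has already sent an OPTIONS request and a Depth: 0 PROPFIND for resourcetype to a single URL, and it only inspects the replies. The function names and sample bodies are hypothetical. The DAV response header and the DAV:resourcetype and DAV:collection elements are the ones WebDAV defines. The two checks are kept separate because, as section 6 explains, being a collection and being WebDAV compliant do not necessarily go together.

```python
import xml.etree.ElementTree as ET
from typing import Mapping, Set

DAV_NS = "{DAV:}"  # WebDAV XML elements live in the "DAV:" namespace


def dav_compliance_classes(headers: Mapping[str, str]) -> Set[str]:
    """Return the compliance classes a resource advertised in its OPTIONS reply.

    `headers` maps response header names to values (names are matched
    case-insensitively). A WebDAV compliant resource sends a header such
    as "DAV: 1, 2"; an empty result means no WebDAV support was claimed.
    """
    for name, value in headers.items():
        if name.lower() == "dav":
            return {token.strip() for token in value.split(",") if token.strip()}
    return set()


def is_collection(multistatus_body: str) -> bool:
    """Report whether a PROPFIND reply marks the resource as a collection.

    Assumes a Depth: 0 PROPFIND for the resourcetype property, so the
    multistatus body describes exactly one resource.
    """
    root = ET.fromstring(multistatus_body)
    for resourcetype in root.iter(DAV_NS + "resourcetype"):
        if resourcetype.find(DAV_NS + "collection") is not None:
            return True
    return False


# Illustrative (hypothetical) replies for a collection and an ordinary document.
EXAMPLE_COLLECTION = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>http://foo/bar/</D:href>
    <D:propstat>
      <D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""

EXAMPLE_DOCUMENT = EXAMPLE_COLLECTION.replace(
    "<D:resourcetype><D:collection/></D:resourcetype>", "<D:resourcetype/>"
).replace("http://foo/bar/", "http://foo/bar/blah")
```

A client drawing the foo/bar/blah tree from section 5 would have to repeat both checks for http://foo/ and http://foo/bar/ as well. That per-resource cost is exactly what the client community objected to and eventually accepted.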
7 Why Does WebDAV Require Hierarchy in WebDAV Only Namespaces?

Section 5.2 of the WebDAV standard states that:

    For all WebDAV compliant resources A and B, identified by URIs U and V, for which U is immediately relative to V, B MUST be a collection that has U as an internal member URI.

This requirement is even stronger than consistency. Consistency only requires that if http://a/b/c exists then http://a/b/ exist. Section 5.2 requires hierarchy. Not only must http://a/b/ exist if http://a/b/c exists, but http://a/b/ must be a collection and must contain http://a/b/c as a member.

This means that if http://a/ and http://a/b/c are both WebDAV compliant but can't communicate with each other (for example, because http://a/b/c is a virtual root or lives on a different server), then there must be a non-WebDAV buffer between them. Because http://a/ has no way to communicate depth requests to http://a/b/c, or even to be sure that http://a/b/c currently exists, the non-WebDAV buffer prevents the resources from being required to communicate with each other.

In addition to section 5.2, there are requirements throughout WebDAV which prevent most of its methods from creating inconsistent namespaces as a result of their execution. When these consistency requirements are combined with section 5.2, the result is that these methods are essentially required never to create a non-hierarchical WebDAV namespace.

On the face of it, these requirements look a bit unwieldy and, quite possibly, unnecessary. Below I present the arguments that led to these requirements. However, the presentation is a complete fiction. Most of these arguments were never explicitly made. The decision to include these requirements was made over a period of years as part of arguments buried deep in different contexts. What I have tried to do below is to distill those arguments to just the parts relevant to WebDAV's hierarchy requirements.
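The section 5.2 rule can be expressed as a check over a snapshot of a namespace. The following minimal Python sketch assumes a hypothetical model in which each WebDAV compliant resource's URI maps to its set of internal member URIs (if it is a collection) or to None (if it is not). Non-compliant resources are simply absent from the model, and URIs are assumed to be spelled consistently, with collections ending in "/". The names `Namespace`, `parent_uri` and `section_5_2_violations` are illustrative only.

```python
from typing import Dict, List, Optional, Set
from urllib.parse import urlsplit, urlunsplit

# Hypothetical namespace model: URI of each WebDAV compliant resource ->
# set of internal member URIs if it is a collection, None otherwise.
# Resources that are not WebDAV compliant do not appear at all.
Namespace = Dict[str, Optional[Set[str]]]


def parent_uri(uri: str) -> Optional[str]:
    """Return the URI that `uri` is immediately relative to, or None at the root."""
    scheme, netloc, path, _, _ = urlsplit(uri)
    if path in ("", "/"):
        return None
    parent_path = path.rstrip("/").rsplit("/", 1)[0] + "/"
    return urlunsplit((scheme, netloc, parent_path, "", ""))


def section_5_2_violations(namespace: Namespace) -> List[str]:
    """List violations of the section 5.2 hierarchy rule.

    The rule only binds pairs where BOTH the child and its parent are
    WebDAV compliant, so a parent missing from the model (for example a
    non-WebDAV buffer) never produces a violation.
    """
    problems = []
    for uri in sorted(namespace):
        parent = parent_uri(uri)
        if parent is None or parent not in namespace:
            continue  # root, or parent is not a WebDAV compliant resource here
        members = namespace[parent]
        if members is None:
            problems.append(f"{parent} is not a collection but has child {uri}")
        elif uri not in members:
            problems.append(f"{parent} does not list {uri} as a member")
    return problems
```

With this model, {"http://a/": set(), "http://a/b/c": None} reports no violations because no compliant http://a/b/ sits between the two resources. This is the non-WebDAV buffer case described above.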
7.1 Client Hierarchy Requirements

When a company sells client software, it usually has to give away a certain amount of free support. If a user picks up the phone to use that free support, all the profit on the sale is generally wiped out by the cost of providing help services. Thus it is very galling to client makers that when servers screw up (resource not available, protocol not supported, etc.), it is the client software maker, not the server maker, who gets the phone call. This is why one sees weird messages in network-enabled client programs like "Operation Successful - Connection Failed". The program is trying to tell the user that the client didn't do anything wrong. It is the network or the server that screwed up, so please make them pay for the support call.

Thus a large segment of the client community had a very serious concern regarding hierarchy. Specifically, the following scenario:

1. The user saves a file to a server.
2. The user subsequently views the contents of the collection the file was saved in.
3. The file is not listed because the server isn't enforcing a consistent namespace, even among WebDAV compliant resources.
4. The user picks up the phone and calls client support demanding to know why the client lost their file.

Of course, the previous scenario could still occur even if the namespace was hierarchical. For example, someone may save a file and, before they have a chance to list the contents of the parent collection, someone else may delete the file. To the user, it appears as if the file just disappeared. A possible solution to this problem is to require transactioning/versioning to let the client know that the file was saved but was subsequently deleted. However, this scenario is sufficiently rare, and the cost of dealing with it in the protocol sufficiently high, that the client community accepted that it would have to bear the cost of dealing with this scenario itself.
The client community wasn't particularly concerned about mixed namespaces. They expected that in the average case there would be a WebDAV root and all the resources underneath it would also be WebDAV compliant. Thus the client community was primarily concerned with optimizing for behavior in WebDAV namespaces, rather than worrying about all the possible combinations in mixed namespaces. In short, its concern about hierarchy was strictly about hierarchy in WebDAV namespaces.

Even so, they were not asking that there be a requirement that WebDAV namespaces be hierarchical. Rather, they were asking that two features be added to the protocol:

1) A way to guarantee that a method would not result in the creation of a non-hierarchical namespace.
2) A way to determine if an existing namespace is hierarchical.

Requirement 1 would prevent the previous scenario. Requirement 2 would keep the client from getting blamed if someone else created the previous scenario.

Finally, a large segment of the client community made it absolutely clear that they would refuse to work with any server that did not maintain a hierarchical namespace, at least amongst WebDAV compliant resources. Their reasoning was that their UI was hierarchy based, so they had no way to display non-hierarchical namespaces. Furthermore, they had no interest in being able to display non-hierarchical namespaces, because they felt such namespaces would only confuse their users and thus increase support costs. In addition, their commands (such as copy/move/delete) were all based on hierarchy manipulation and thus required a hierarchical namespace.

7.2 Server Hierarchy Requirements

The hierarchical server community uses file systems and other hierarchical stores to record data. This segment of the server community was not very thrilled about non-hierarchical namespaces. They could implement them, but at a high cost.
How would a hierarchical store record the fact that http://a/b, http://a/b/c and http://a/b/c/d exist but are all non-collection resources? How would a hierarchical store record that http://e/ and http://e/f/g exist but http://e/f does not? Both problems are solvable with enough record keeping and redirection on the part of the server.

An even worse problem, from the hierarchical server community's point of view, was that requiring servers to support non-hierarchical namespaces would effectively limit access to the underlying store to HTTP alone. What happens if a client using the file system directly tries to access the store? It would be completely confused, not recognizing that a/b was not really meant to be a directory or that e/f/g was not meant to be seen as a child of e/f. The hierarchical server community had a strong requirement that their underlying stores be accessible and understandable through multiple access mechanisms. This meant they needed to be able to maintain their hierarchy.

The message from the hierarchical server community was that supporting non-hierarchical namespaces in the WebDAV protocol was fine, just as long as their servers were allowed to require that all resources be WebDAV compliant and hierarchical.

7.3 The Working Group Analysis

Hearing from these two groups of implementers, the WG grew worried that a serious interoperability issue was being created. Hierarchical clients refused to work with non-hierarchical servers, and hierarchical servers refused to work with clients requiring a non-hierarchical WebDAV namespace.

7.3.1 Sizing up the Market

The WG decided to determine just how bad the interoperability issue was likely to be by examining what clients and servers were doing today. The WG's investigations turned up two types of client/servers:

Pure Hierarchical - This class accounted for the overwhelming majority of deployed authoring clients and servers.
Typical examples include all known file systems as well as the file manipulation dialogs of all major clients (WordPerfect, Word, File Explorer, Finder, etc.). Note that this doesn't include linking, which the group had already agreed would be dealt with in a separate draft (which has since been published).

Property Based - The majority of the remaining client/servers belong to high-end document and data management systems. These systems rarely presented their users with a hierarchy. Instead, they allowed the user to provide meta-data that was then used to perform a search that would present a result space. For example, a user could ask for a particular configuration or a version. Neither request identifies a resource name. Each instead identifies values of properties associated with resources. The system would search on those values and present the result. The underlying stores were often flat namespaces. Those that weren't flat were invariably hierarchical.

Interestingly enough, the WG never turned up an authoring client or server that required a non-hierarchical namespace, only ones that could operate in the absence of hierarchy because they used a flat namespace. Thus the WG came to the conclusion that the majority of WebDAV clients would be unwilling even to try to talk to a server that supported non-hierarchical WebDAV namespaces. The WG also concluded that the majority of WebDAV servers would demand that the namespace always be hierarchical. Thus an interoperability issue did exist. Clients which, for whatever reason, required the ability to create non-hierarchical namespaces would have almost no one to talk to, and servers that didn't allow clients to require a hierarchical WebDAV namespace would find very few clients willing to talk to them.
7.3.2 The Working Group's Conclusion

The WG had two solutions:

The first solution was to allow non-hierarchical WebDAV namespaces and augment the protocol to require that clients specify whether a method is allowed to create a non-hierarchical namespace. The protocol would also have to be augmented to indicate to the client whether the existing WebDAV namespace enforced hierarchy. The upside of this solution was that it allowed for flexibility. The downside was that it complicated the protocol by adding headers and properties to allow for discovery and enforcement of hierarchy.

The second solution was to mandate that WebDAV namespaces must be hierarchical. The upside of this solution was that it was extremely simple, requiring no protocol changes. The downside was that it shut the door on supporting non-hierarchical WebDAV namespaces.

What finally swayed matters was the realization that allowing non-hierarchical WebDAV namespaces essentially meant creating two separate protocols, HWebDAV and NHWebDAV. Hierarchical client/servers would use HWebDAV, and non-hierarchical client/servers would use NHWebDAV. Adding support for non-hierarchical WebDAV namespaces would therefore not serve the cause of interoperability, but it would certainly make the protocol more complex. So the WG decided to mandate that WebDAV namespaces had to be hierarchical.
Received on Sunday, 27 December 1998 17:27:05 UTC