- From: Geoffrey M. Clemm <geoffrey.clemm@rational.com>
- Date: Tue, 7 Dec 1999 17:26:16 -0500
- To: w3c-dist-auth@w3.org
From: "Eric Sedlar" <esedlar@us.oracle.com> I've implemented this type of system before, and the costs of mark and sweep are enormous when you are talking about large sets of documents living on disk (like in a relational database). People generally expect unlink/delete on a file to work quickly, and applications depend on that not eating up a tremendous amount of cycles or doing a lot of I/O. Mark and sweep is also much more expensive than forbidding the insertion of cycles. This is a classic garbage collection tradeoff. On one side, you can let the client define the relationships that it wants, and force the server to deal with issues like detecting islands of garbage. On the other side, you can require that the client only define relationships that are easy/efficient for the server to garbage collect. The problem with the latter approach is that there are different equally valid choices for how to constrain the relationships (the weak/strong dichotomy you suggest is certainly a very reasonable one). Reaching agreement on the constraints is very hard, since if your system was not implemented with those constraints in mind, they do you no good (and in fact, are probably even harder to implement than the general form). For example, the Unix file system also has weak and strong bindings (symbolic and hard links). It provides for efficient deletion, but it made some different choices from the ones you picked. In particular, it chose to leave the weak bindings (symbolic links) dangling when the last strong binding was deleted, and to only have the strong bindings (hard links) track resource movement. Trying to simulate in the Unix file system the weak/strong behavior you describe would be prohibitively expensive. So although it may be painful to only support one type of bindings, I believe it has the advantage that it is simpler to define and "fairer" across the various existing implementations. If you wait to do garbage collection asynchronously, searches that do not traverse the hierarchical namespace but that use other indexes (like keyword searches, for example) will find things that the user has deleted. WebDAV only supports searches that traverse the hierarchical namespace, so although I understand your point, it does not apply to WebDAV as it is currently defined. Once the "V" (versioning) is added to WebDAV, queries across the entire resource space will probably be rare in any case. Also, deferring the garbage collection associated with the unlink breaks the transactional guarantee when people COMMIT their work. There is no COMMIT functionality in WebDAV, so this would not be an issue. I can give you a perfectly good set of answers about how to deal with weak links. Oracle built a system that implements weak links and forbids cycles as part of the Internet Filesystem product. My responses to your issues are included below, marked with <es> </es>. The set of answers you provide for the Internet Filesystem product appear both consistent and very sensible, especially in the context of an underlying relational database implementation for the WebDAV store. Just one comment on the locking issue: <es> locking a URL locks the namespace, not the resource. I assume this only applies to weak bindings (i.e. there must be some way to lock the resource itself, so I assume that is done through a strong binding?). While you have locked the URL containing a weak binding, nobody else can use that set of names in their respective collections. This sounds very different from WebDAV locks, which prevents unauthorized clients from modifying a resource, not from using it. Or by "use", did you mean "modify"? By "set of names", did you mean for example the set of names: {"/a/b/c", "/a/b", "/a"} that are relevant for a binding named "/a/b/c"? The resource the locked URL points to may actually be deleted while the URL to it is locked. In this case the resource is deleted when the lock is released. (Think of the lock like an open file descriptor, or in database terms, a snapshot in a read-only transaction that was started when the lock was acquired). This has the same effect as if someone deleted that resource the instant the lock was released. </es> This could be hard for a server implement if it didn't have underlying transaction support. Cheers, Geoff
Received on Tuesday, 7 December 1999 17:26:18 UTC