Re: BINDing using a weak reference from Geoffrey M. Clemm on 1999-12-07 (w3c-dist-auth@w3.org from October to December 1999)

From: Geoffrey M. Clemm <geoffrey.clemm@rational.com>
Date: Tue, 7 Dec 1999 17:26:16 -0500
To: w3c-dist-auth@w3.org
Message-Id: <9912072226.AA01406@tantalum>
   From: "Eric Sedlar" <esedlar@us.oracle.com>

   I've implemented this type of system before, and the costs of mark and sweep
   are enormous when you are talking about large sets of documents living on disk
   (like in a relational database).  People generally expect unlink/delete on a
   file to work quickly, and applications depend on that not eating up a
   tremendous amount of cycles or doing a lot of I/O.  Mark and sweep is also much
   more expensive than forbidding the insertion of cycles.

This is a classic garbage collection tradeoff.  On one side, you can
let the client define the relationships that it wants, and force the
server to deal with issues like detecting islands of garbage.  On the
other side, you can require that the client only define relationships
that are easy/efficient for the server to garbage collect.

The problem with the latter approach is that there are different
equally valid choices for how to constrain the relationships (the
weak/strong dichotomy you suggest is certainly a very reasonable one).
Reaching agreement on the constraints is very hard, since if your system
was not implemented with those constraints in mind, they do you no good
(and in fact, are probably even harder to implement than the general
form).

For example, the Unix file system also has weak and strong bindings
(symbolic and hard links).  It provides for efficient deletion, but it
made some different choices from the ones you picked.  In particular,
it chose to leave the weak bindings (symbolic links) dangling when the
last strong binding was deleted, and to only have the strong bindings
(hard links) track resource movement.

Trying to simulate in the Unix file system the weak/strong behavior
you describe would be prohibitively expensive.  So although it may be
painful to only support one type of bindings, I believe it has the
advantage that it is simpler to define and "fairer" across the various
existing implementations.

   If you wait to do garbage collection asynchronously, searches that do not
   traverse the hierarchical namespace but that use other indexes (like keyword
   searches, for example) will find things that the user has deleted.

WebDAV only supports searches that traverse the hierarchical namespace,
so although I understand your point, it does not apply to WebDAV as it
is currently defined.  Once the "V" (versioning) is added to WebDAV,
queries across the entire resource space will probably be rare in any case.

   Also,
   deferring the garbage collection associated with the unlink breaks the
   transactional guarantee when people COMMIT their work.

There is no COMMIT functionality in WebDAV, so this would not be an issue.

   I can give you a perfectly good set of answers about how to deal with weak
   links.  Oracle built a system that implements weak links and forbids cycles as
   part of the Internet Filesystem product.  My responses to your issues are
   included below, marked with <es> </es>.

The set of answers you provide for the Internet Filesystem product
appear both consistent and very sensible, especially in the context of
an underlying relational database implementation for the WebDAV store.

Just one comment on the locking issue:

   <es> locking a URL locks the namespace, not the resource.

I assume this only applies to weak bindings (i.e. there must be some
way to lock the resource itself, so I assume that is done through a
strong binding?).

   While you have
   locked the URL containing a weak binding, nobody else can use that set of names
   in their respective collections.

This sounds very different from WebDAV locks, which prevents
unauthorized clients from modifying a resource, not from using it.
Or by "use", did you mean "modify"?

By "set of names", did you mean for example the set of names:
{"/a/b/c", "/a/b", "/a"} that are relevant for a binding named "/a/b/c"?

   The resource the locked URL points to may actually be deleted while
   the URL to it is locked.   In this case the resource
   is deleted when the lock is released.  (Think of the lock like an open file
   descriptor, or in database terms, a snapshot in a read-only transaction that
   was started when the lock was acquired).  This has the same effect as if
   someone deleted that resource the instant the lock was released.
   </es>

This could be hard for a server implement if it didn't have underlying
transaction support.

Cheers,
Geoff
Received on Tuesday, 7 December 1999 17:26:18 UTC