Re: CHECKIN/CHECKOUT - URNs and Destroying Immutable Resources

Geoffrey M. Clemm (gclemm@tantalum.atria.com)
Wed, 20 Jan 1999 01:22:27 -0500


Date: Wed, 20 Jan 1999 01:22:27 -0500
Message-Id: <9901200622.AA14143@tantalum>
From: "Geoffrey M. Clemm" <gclemm@tantalum.atria.com>
To: yarong@microsoft.com
Cc: ietf-dav-versioning@w3.org
In-Reply-To: <3FF8121C9B6DD111812100805F31FC0D08792D4D@RED-MSG-59> (message
Subject: Re: CHECKIN/CHECKOUT - URNs and Destroying Immutable Resources

   From: Yaron Goland <yarong@microsoft.com>

   Then again, speaking from long painful experience, the WebDAV WG's normal
   review process involves taking the authors out back and beating them
   senseless.

Better some bruises than a bad spec ... so bring on the beatings!  (:-)

And to emphasize a note in my original posting, this represents a
proposal to the design team, *NOT* a consensus document from the design team!

   > Putting a Resource under Version Control
   > 
   > When a resource is put under version control, it becomes unwriteable.
   > In order to modify a resource, it must first be checked out, then can
   > be modified one or more times, and then checked back in to indicate
   > you are done modifying it.  If your CHECKOUT fails, it means someone
   > else is currently modifying the document, so you should only do a GET
   > with the understanding that the results are only temporarily valid.
   > 

   1) First you say that a resource under version control is unwriteable and
   then you explain how to modify a resource. I'm confused. I suspect you need
   to discuss your model a bit. One can infer a lot about the model by reading
   the rest of the paper but I dislike having to infer, because I tend to infer
   incorrectly.

I have tried to apply the terms "readonly/unwriteable" and "writeable"
consistently to refer to a temporary state of a resource, and use the
terms "mutable" and "immutable" to refer to a permanent state of a
resource.  So when you put a resource under version control, it
becomes unwriteable, but you can apply a method to it (CHECKOUT) to make
it writeable.  In contrast, a versioned resource is mutable.  In this
proposal, it is only certain kinds of revisions that are immutable
(namely, immutable-revisions).
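
The temporary writeable/unwriteable state can be sketched roughly as
follows.  This is only an illustration under stated assumptions, not
part of the proposal: the names ControlledResource, checkout, checkin,
and write are hypothetical stand-ins for the CHECKOUT and CHECKIN
methods and an ordinary write request.

```python
class NotWriteableError(Exception):
    """Raised when a write is attempted while the resource is unwriteable."""


class CheckoutError(Exception):
    """Raised when a checkout cannot be granted."""


class ControlledResource:
    """Hypothetical model of a resource that has been put under version control."""

    def __init__(self, content):
        self.content = content
        # Temporary state: unwriteable as soon as it is under version control.
        self.writeable = False

    def checkout(self):
        # Assumption: a second checkout of the same resource is refused.
        if self.writeable:
            raise CheckoutError("resource is already checked out")
        self.writeable = True

    def write(self, content):
        if not self.writeable:
            raise NotWriteableError("CHECKOUT is required before writing")
        self.content = content

    def checkin(self):
        if not self.writeable:
            raise CheckoutError("resource is not checked out")
        # Back to the unwriteable state until the next checkout.
        self.writeable = False
```

The resource itself stays mutable throughout: its state merely toggles
between unwriteable and writeable across checkout and checkin.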

   2) CHECKOUTs can fail for many reasons wholly unrelated to current use. But
   the statement does lead one to infer that the proposed versioning system can
   not support multiple simultaneous checkouts. Is this true?

Yes: there can be only one checkout of a given mutable-revision (branch)
of a versioned resource.  If you want to check out an already checked-out
mutable-revision (branch), you need to use the CHECKOUT-NEW method, which
checks out a new mutable-revision (branch) of the versioned resource.

Note: some systems (like ClearCase) support an "unreserved checkout".
If we want to support this concept (and I'm not advocating that we do),
then we would just adjust the definition to say that there can be at most
one "reserved" checkout of a mutable-revision (branch).
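
As a rough sketch of the one-checkout-per-branch rule and of
CHECKOUT-NEW, assuming (purely for illustration) that branches are
tracked by name in a dictionary; BranchedResource, checkout,
checkout_new, and the generated branch names are hypothetical:

```python
class CheckoutConflict(Exception):
    """Raised when a branch is already checked out."""


class BranchedResource:
    """Hypothetical versioned resource with named mutable-revisions (branches)."""

    def __init__(self):
        # Maps branch name -> current checkout holder, or None if checked in.
        self.holders = {"main": None}
        self._counter = 0

    def checkout(self, branch, user):
        # At most one checkout of a given branch at a time.
        if self.holders[branch] is not None:
            raise CheckoutConflict(f"{branch} is checked out by {self.holders[branch]}")
        self.holders[branch] = user

    def checkout_new(self, user):
        # CHECKOUT-NEW: create a fresh branch and check it out immediately.
        self._counter += 1
        branch = f"branch-{self._counter}"
        self.holders[branch] = user
        return branch

    def checkin(self, branch):
        self.holders[branch] = None
```

A client whose checkout of "main" raises CheckoutConflict can fall back
to checkout_new and work on its own branch instead of waiting.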

   > Checkout vs. Lock
   > 
   > Note the distinction between a (write) LOCK and CHECKOUT.  The LOCK
   > takes a resource that is writeable by everyone and temporarily makes
   > it unwriteable by everyone except the lock holder (until it is
   > UNLOCK'ed).  A CHECKOUT takes a resource that is unwriteable by
   > everyone, and temporarily makes it writeable (until it is CHECKIN'd).
   > It is reasonable to apply a LOCK to a checked-out resource, but is
   > not required.  In particular, many systems will decide the LOCK
   > is irrelevant, since a "friendly" client will delay writing until
   > it can perform a CHECKOUT, and an "unfriendly" client can just wait
   > until the UNLOCK and then trash the resource contents at will.
   > 

   The distinction between shared and exclusive locks should be pointed out.

I don't believe this distinction is relevant here.  The point is that in
any model which assumes a LOCK/UNLOCK paradigm (whether it is a shared or
an exclusive lock), an unfriendly client can trash the resource as soon as
it is unlocked.  If, on the other hand, you support just a LOCK paradigm
(where you do not assume the resource is unlocked when you are done with it),
then you are "limiting the scope of damage" (which is a good thing), but
that is very different from synchronizing updates between multiple (possibly
concurrent) authors, which is what CHECKOUT/CHECKIN is all about.
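
The difference in default state can be illustrated with a small sketch.
The classes LockedResource and CheckedResource and their methods are
hypothetical, and the model assumes a single exclusive write lock only
to keep the example short:

```python
class LockedResource:
    """Writeable by everyone by default; LOCK restricts writes to the holder."""

    def __init__(self, content):
        self.content = content
        self.lock_holder = None

    def lock(self, user):
        if self.lock_holder is not None:
            raise PermissionError("already locked")
        self.lock_holder = user

    def unlock(self, user):
        if self.lock_holder != user:
            raise PermissionError("only the lock holder may unlock")
        self.lock_holder = None

    def write(self, user, content):
        # Anyone may write when no lock is held.
        if self.lock_holder not in (None, user):
            raise PermissionError("locked by another user")
        self.content = content


class CheckedResource:
    """Unwriteable by everyone by default; CHECKOUT temporarily allows writes."""

    def __init__(self, content):
        self.content = content
        self.checked_out = False

    def checkout(self):
        if self.checked_out:
            raise PermissionError("already checked out")
        self.checked_out = True

    def checkin(self):
        self.checked_out = False

    def write(self, user, content):
        # Nobody may write unless the resource is checked out.
        if not self.checked_out:
            raise PermissionError("not checked out")
        self.content = content
```

After unlock, a write from any user succeeds on LockedResource; after
checkin, a write from any user fails on CheckedResource.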

   > Immutable-Revisions
   > 
   > An immutable-revision is a revision whose contents (and immutable
   > properties) cannot be changed.  More precisely, an attempt to retrieve
   > the contents or immutable properties of an immutable-revision will
   > always return the same contents or will fail.  Therefore a server can
   > delete the contents or properties of an immutable-revision (resulting
   > in a failure when an attempt is made to retrieve those contents or
   > properties), but can never delete the immutable-revision itself.
   > 

   If I understand your meaning in saying "never delete the immutable-revision
   itself" you are implying that a server could nuke all the state associated
   with the immutable-revision but not a note specifying that once upon a time
   such a revision did exist and did hold a certain position in the version
   tree.

Yes.
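
A minimal sketch of that behavior, assuming a simple in-memory model;
ImmutableRevision, ContentPurged, get, and purge are hypothetical
names, not protocol methods:

```python
class ContentPurged(Exception):
    """Raised when the contents of a revision have been deleted by the server."""


class ImmutableRevision:
    """Hypothetical immutable-revision: contents never change, but may be purged."""

    def __init__(self, revision_id, content, predecessor=None):
        self.revision_id = revision_id
        self.predecessor = predecessor  # position in the version tree
        self._content = content
        self._purged = False

    def get(self):
        # Always returns the same contents, or fails once they are purged.
        if self._purged:
            raise ContentPurged(self.revision_id)
        return self._content

    def purge(self):
        # Frees the stored contents; the revision itself remains in the history.
        self._content = None
        self._purged = True
```

There is deliberately no method for replacing the contents: a retrieval
either returns the original contents or fails, and a purged revision
still holds its place in the version tree.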

   However the reality is that people will want to destroy even notices
   of the existence of a revision for any number of reasons, some more
   nefarious than others. I suspect it is unrealistic of us to expect the
   protocol to be able to prevent this. 

What non-nefarious people want to do is to make old revisions no
longer visible, and to conserve space.  The protocol should/must
support both of these operations.  Giving a server (with its own
unique-id generator) control over some part of the URI space (which is
feasible and common in CM systems) takes care of the rest.  The
protocol can't prevent the server from doing something different,
but to the extent that the server deviates from the protocol, it will
not act as a client expects/intends.

   There is the additional problem of what to do if the resource is destroyed
   and its HTTP URL gets re-used. Who will return the "this resource has been
   nuked" notice? The way the language is currently written it would seem that
   once you assign an HTTP URL to a version of a resource, even if you destroy
   the resource, you are still required to reserve the HTTP URL so it can
   return the "this resource doesn't exist anymore" error. I suspect we will
   find significant opposition to this idea. People tend to get touchy about
   their HTTP URL namespaces.

People also get touchy if their CM system lies about the previous state
of their resources.  To support both perspectives, we support both
mutable-revisions and immutable-revisions, where the protocol is designed
so that these two models can interoperate.  So if you don't care about
reliable resource history, you can just go with a WebDAV server that only
supports mutable-revisions.  If you want reliable history, you go with
one that also supports immutable-revisions (in this proposal, the immutable
revision protocol is a proper super-set of the mutable revision protocol).

   One alternative is to require that a note be dropped into the version
   history specifying that there did once exist a version with a set of
   particular characteristics but its resource has since been destroyed. I
   don't think this is a good idea because it means that we need to refer to a
   resource (even one which doesn't currently exist) without the use of a URI.
   This is likely to muck up the protocol in all sorts of unhappy ways.

The server might well internally implement it along these lines, but this
implementation choice certainly shouldn't be exposed in the protocol.

   What we
   need is a URI which refers to a resource independently of the HTTP URL used
   to actually retrieve the resource.

Other than giving control of part of the URL namespace to the versioning system
(which is what every CM system does), why can't it be a URL?

   Which brings us to URNs. I don't propose we actually use URNs, I don't like
   them very much. But the underlying concept is sound. We should require that
   all resources have a URI associated with them that meets the same uniqueness
   requirements we place on lock token URIs. The URIs DO NOT HAVE TO BE
   RESOLVABLE. If they are, bonus points, but it is not necessary for the
   protocol to work properly.

Not good enough.  I need to be able to find this resource, not just detect
whether two different URLs refer to it.  It's the resource that tells me
the URLs of the other revisions associated with this resource.

   When a resource is created it must be assigned one of these universally
   unique URIs. The URI can then be used with the IF header on any requests to
   an HTTP URL so as to ensure that the request will only succeed if the
   resource is the same resource as the one specified by the URI.

See above.

   The history graph is then free to refer to both the URI and any known URLs
   that the resource is available under. If the HTTP URL is changed or the
   resource is destroyed then the graph will only refer to the universally
   unique URI. This allows the version to still be referred to in various
   operations (such as creating a child) even though it doesn't exist.

This doesn't give me the reliable history that I require for effective
configuration management.  Over the lifetime of a resource, there will
be hundreds of revisions of the resource, most of which are no longer
"active".  But periodically, I need to refer to some of them as
full-fledged resources, with their properties and contents accessible
and intact.  Having them stamped with some URI that gives me no
ability to locate them is of little, if any, value.

There's a good chance that email doesn't provide the bandwidth we need
here, so voice or even visual communication may be required.  The
survivor (if any) then could report back to the mailing list (:-).

Cheers,
Geoff