Date: Wed, 20 Jan 1999 01:22:27 -0500
Message-Id: <9901200622.AA14143@tantalum>
From: "Geoffrey M. Clemm" <gclemm@tantalum.atria.com>
To: yarong@microsoft.com
Cc: ietf-dav-versioning@w3.org
In-Reply-To: <3FF8121C9B6DD111812100805F31FC0D08792D4D@RED-MSG-59> (message
Subject: Re: CHECKIN/CHECKOUT - URNs and Destroying Immutable Resources

   From: Yaron Goland <yarong@microsoft.com>

   Then again, speaking from long painful experience, the WebDAV WG's
   normal review process involves taking the authors out back and
   beating them senseless.

Better some bruises than a bad spec ... so bring on the beatings! (:-)

And to emphasize a note in my original posting, this represents a
proposal to the design team, *NOT* a consensus document from the
design team!

   > Putting a Resource under Version Control
   >
   > When a resource is put under version control, it becomes unwriteable.
   > In order to modify a resource, it must first be checked out, then can
   > be modified one or more times, and then checked back in to indicate
   > you are done modifying it. If your CHECKOUT fails, it means someone
   > else is currently modifying the document, so you should only do a GET
   > with the understanding that the results are only temporarily valid.
   >

   1) First you say that a resource under version control is
   unwriteable and then you explain how to modify a resource. I'm
   confused. I suspect you need to discuss your model a bit. One can
   infer a lot about the model by reading the rest of the paper but I
   dislike having to infer, because I tend to infer incorrectly.

I have tried to apply the terms "readonly/unwriteable" and "writeable"
consistently to refer to a temporary state of a resource, and use the
term "mutable" and "immutable" to refer to a permanent state of a
resource. So when you put a resource under version control, it
becomes unwriteable, but you can apply a method to it (CHECKOUT) to
make it writeable. In contrast, a versioned resource is mutable.
In this proposal, it is only certain kinds of revisions that are
immutable (namely, immutable-revisions).

   2) CHECKOUTs can fail for many reasons wholly unrelated to current
   use. But the statement does lead one to infer that the proposed
   versioning system can not support multiple simultaneous checkouts.
   Is this true?

Only one checkout of a given mutable-revision (branch) of a versioned
resource, yes. If you want to check-out an already checked-out
mutable-revision (branch), you need to use the CHECKOUT-NEW method,
which checks out a new mutable-revision (branch) of the versioned
resource.

Note: some systems (like ClearCase) support an "unreserved checkout".
If we want to support this concept (and I'm not advocating that we do),
then we would just adjust the definition to say that there can be at
most one "reserved" checkout of a mutable-revision (branch).

   > Checkout vs. Lock
   >
   > Note the distinction between a (write) LOCK and CHECKOUT. The LOCK
   > takes a resource that is writeable by everyone and temporarily makes
   > it unwriteable by everyone except the lock holder (until it is
   > UNLOCK'ed). A CHECKOUT takes a resource that is unwriteable by
   > everyone, and temporarily makes it writeable (until it is CHECKIN'd).
   > It is reasonable to apply a LOCK to a checked-out resource, but is
   > not required. In particular, many systems will decide the LOCK
   > is irrelevant, since a "friendly" client will delay writing until
   > it can perform a CHECKOUT, and an "unfriendly" client can just wait
   > until the UNLOCK and then trash the resource contents at will.
   >

   The distinction between shared and exclusive locks should be
   pointed out.

I don't believe this distinction is relevant here. The point is that
in any model which assumes a LOCK/UNLOCK paradigm (whether it is a
shared or an exclusive lock), an unfriendly client can trash the
resource as soon as it is unlocked.
If on the other hand, you support just a LOCK paradigm (where you do
not assume the resource is unlocked when you are done with it), then
you are "limiting the scope of damage" (which is a good thing), but
that is very different from synchronizing updates between multiple
(possibly concurrent) authors, which is what CHECKOUT/CHECKIN is all
about.

   > Immutable-Revisions
   >
   > An immutable-revision is a revision whose contents (and immutable
   > properties) cannot be changed. More precisely, an attempt to retrieve
   > the contents or immutable properties of an immutable-revision will
   > always return the same contents or will fail. Therefore a server can
   > delete the contents or properties of an immutable-revision (resulting
   > in a failure when an attempt is made to retrieve those contents or
   > properties), but can never delete the immutable-revision itself.
   >

   If I understand your meaning in saying "never delete the
   immutable-revision itself" you are implying that a server could
   nuke all the state associated with the immutable-revision but not a
   note specifying that once upon a time such a revision did exist and
   did hold a certain position in the version tree.

Yes.

   However the reality is that people will want to destroy even
   notices of the existence of a revision for any number of reasons,
   some more nefarious than others. I suspect it is unrealistic of us
   to expect the protocol to be able to prevent this.

What non-nefarious people want to do is to make old revisions no
longer visible, and to conserve space. The protocol should/must
support both of these operations. Giving a server (with its own
unique-id generator) control over some part of the URI space (which is
feasible and common in CM systems) takes care of the rest.

The protocol can't prevent the server from doing something different,
but to the extent that the server deviates from the protocol, it will
not act as a client expects/intends.
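
[Illustrative sketch, not part of the proposal.] The immutable-revision
rule quoted above (retrieval either returns the original contents or
fails, and the revision itself is never deleted) can be modelled in a
few lines of Python. This is only a toy under stated assumptions: the
names ImmutableRevision, History, ContentsPurged and purge_contents
are hypothetical and do not come from the proposal or from any WebDAV
implementation.

```python
class ContentsPurged(Exception):
    """Raised when the contents of an immutable-revision were deleted."""


class ImmutableRevision:
    """Toy model of an immutable-revision (hypothetical names)."""

    def __init__(self, revision_id, contents):
        self.revision_id = revision_id
        self._contents = contents
        self._purged = False

    def get(self):
        # Retrieval returns the original contents or fails;
        # it never returns different contents.
        if self._purged:
            raise ContentsPurged(self.revision_id)
        return self._contents

    def purge_contents(self):
        # The server may reclaim space, but the revision record stays.
        self._contents = None
        self._purged = True


class History:
    """Keeps every revision record; offers no way to remove one."""

    def __init__(self):
        self._revisions = {}

    def add(self, revision_id, contents):
        # Assumption: the server's unique-id generator never reuses an id.
        if revision_id in self._revisions:
            raise ValueError("revision ids are never reused")
        revision = ImmutableRevision(revision_id, contents)
        self._revisions[revision_id] = revision
        return revision

    def lookup(self, revision_id):
        return self._revisions[revision_id]
```

After purge_contents() a get() fails rather than returning altered
data, while lookup() still finds the revision, which matches the
"yes" given above to the question about keeping a record of the
revision's place in the version tree.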
   There is the additional problem of what to do if the resource is
   destroyed and its HTTP URL gets re-used. Who will return the "this
   resource has been nuked" notice? The way the language is currently
   written it would seem that once you assign an HTTP URL to a version
   of a resource, even if you destroy the resource, you are still
   required to reserve the HTTP URL so it can return the "this
   resource doesn't exist anymore" error. I suspect we will find
   significant opposition to this idea. People tend to get touchy
   about their HTTP URL namespaces.

People also get touchy if their CM system lies about the previous
state of their resources. To support both perspectives, we support
both mutable-revisions and immutable-revisions, where the protocol is
designed so that these two models can interoperate. So if you don't
care about reliable resource history, you can just go with a WebDAV
server that only supports mutable-revisions. If you want reliable
history, you go with one that also supports immutable-revisions (in
this proposal, the immutable revision protocol is a proper super-set
of the mutable revision protocol).

   One alternative is to require that a note be dropped into the
   version history specifying that there did once exist a version with
   a set of particular characteristics but its resource has since been
   destroyed. I don't think this is a good idea because it means that
   we need to refer to a resource (even one which doesn't currently
   exist) without the use of a URI. This is likely to muck up the
   protocol in all sorts of unhappy ways.

The server might well internally implement it along these lines, but
this implementation choice certainly shouldn't be exposed in the
protocol.

   What we need is a URI which refers to a resource independently of
   the HTTP URL used to actually retrieve the resource.

Other than giving control of part of the URL namespace to the
versioning system (which is what every CM system does), why can't it
be a URL?

   Which brings us to URNs.
   I don't propose we actually use URNs, I don't like them very much.
   But the underlying concept is sound. We should require that all
   resources have a URI associated with them that meets the same
   uniqueness requirements we place on lock token URIs. The URIs DO
   NOT HAVE TO BE RESOLVABLE. If they are, bonus points, but it is not
   necessary for the protocol to work properly.

Not good enough. I need to be able to find this resource, not just
detect whether two different URLs refer to it. It's the resource that
tells me the URLs of the other revisions associated with this
resource.

   When a resource is created it must be assigned one of these
   universally unique URIs. The URI can then be used with the IF
   header on any requests to an HTTP URL so as to ensure that the
   request will only succeed if the resource is the same resource as
   the one specified by the URI.

See above.

   The history graph is then free to refer to both the URI and any
   known URLs that the resource is available under. If the HTTP URL is
   changed or the resource is destroyed then the graph will only refer
   to the universally unique URI. This allows the version to still be
   referred to in various operations (such as creating a child) even
   though it doesn't exist.

This doesn't give me the reliable history that I require for effective
configuration management. Over the lifetime of a resource, there will
be hundreds of revisions of the resource, most of which are no longer
"active". But periodically, I need to refer to some of them as
full-fledged resources, with their properties and contents accessible
and intact. Having them stamped with some URI that gives me no
ability to locate them is of little, if any, value.

There's a good chance that email doesn't provide the bandwidth we need
here, so voice or even visual communication may be required. The
survivor (if any) then could report back to the mailing list (:-).

Cheers,
Geoff