Re: CHECKIN/CHECKOUT - URNs and Destroying Immutable Resources

David G. Durand (dgd@cs.bu.edu)
Wed, 20 Jan 1999 09:53:28 -0500


Message-Id: <v04011701b2cb967b388b@[24.0.249.126]>
In-Reply-To: <3FF8121C9B6DD111812100805F31FC0D08792D4D@RED-MSG-59>
Date: Wed, 20 Jan 1999 09:53:28 -0500
To: ietf-dav-versioning@w3.org
From: "David G. Durand" <dgd@cs.bu.edu>
Subject: RE: CHECKIN/CHECKOUT - URNs and Destroying Immutable Resources

At 10:46 PM -0500 1/19/99, Yaron Goland wrote:
>Geoffrey Clemm wrote:
>> When a resource is put under version control, it becomes unwriteable.
>> In order to modify a resource, it must first be checked out, then can
>> be modified one or more times, and then checked back in to indicate
>> you are done modifying it.  If your CHECKOUT fails, it means someone
>> else is currently modifying the document, so you should only do a GET
>> with the understanding that the results are only temporarily valid.
>>
>
>1) First you say that a resource under version control is unwriteable and
>then you explain how to modify a resource. I'm confused. I suspect you need
>to discuss your model a bit. One can infer a lot about the model by reading
>the rest of the paper but I dislike having to infer, because I tend to infer
>incorrectly.

There are two kinds of version control: one where a version is immutable,
and one where it isn't. Geoff is proposing that these be unified. There's
also an ambiguity with respect to "writeable": in a system that preserves
immutable _revisions_, a resource is still writeable, but only by creating a
new revision, not by modifying an existing revision. For a system that
doesn't preserve immutable revisions, this distinction is masked by the
fact that there are a number of distinct objects (any of which might be
mutable, depending on whether or not they are frozen).

By having the same protocol element be used for named branches and "mutable
versions," we can hide the distinctions between these different kinds of
systems so that they can interoperate. Clients that don't care about
immutability will simply never even notice that a server may be maintaining
immutable revisions in the background.

>2) CHECKOUTs can fail for many reasons wholly unrelated to current use. But
>the statement does lead one to infer that the proposed versioning system can
>not support multiple simultaneous checkouts. Is this true?

I hope not.

>> Checkout vs. Lock
>>
>> Note the distinction between a (write) LOCK and CHECKOUT.  The LOCK
>> takes a resource that is writeable by everyone and temporarily makes
>> it unwriteable by everyone except the lock holder (until it is
>> UNLOCK'ed).  A CHECKOUT takes a resource that is unwriteable by
>> everyone, and temporarily makes it writeable (until it is CHECKIN'd).
>> It is reasonable to apply a LOCK to a checked-out resource, but is
>> not required.  In particular, many systems will decide the LOCK
>> is irrelevant, since a "friendly" client will delay writing until
>> it can perform a CHECKOUT, and an "unfriendly" client can just wait
>> until the UNLOCK and then trash the resource contents at will.

>The distinction between shared and exclusive locks should be pointed out.

It seems that Yaron is right, and checkout now implies the taking of a
lock. This seems to kill the notion that we had been calling "auto forking"
(where a server creates a revision tree fork when a conflict occurs, and
never blocks any update attempt). I still think we should separate
declaration of an editing session (checkout) from blocking of competing
accesses (locking). We may need to request a lock in the same message that
we request a checkout (and fail if we can't get it), but that's not the
same thing.
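
To make the separation concrete, here is a minimal sketch in Python of a
server-side model in which checkout only declares an editing session, a
lock is an optional extra that can be requested in the same call, and a
conflicting checkin forks the revision tree instead of being blocked. The
class and method names (VersionedResource, checkout, checkin, want_lock)
are hypothetical illustrations, not anything defined in the drafts under
discussion.

```python
class VersionedResource:
    """Hypothetical in-memory model; not part of any WebDAV draft."""

    def __init__(self, content):
        self.revisions = {1: content}  # revision id -> content
        self.parent = {1: None}        # revision id -> predecessor id
        self.tips = {1}                # revisions that have no successor yet
        self.lock_holder = None        # at most one exclusive lock holder
        self._next_id = 2

    def lock(self, user):
        # Locking blocks competing writers; it is independent of checkout.
        if self.lock_holder not in (None, user):
            raise PermissionError("locked by " + self.lock_holder)
        self.lock_holder = user

    def unlock(self, user):
        if self.lock_holder == user:
            self.lock_holder = None

    def checkout(self, user, base, want_lock=False):
        # Checkout only declares an editing session based on a revision.
        if base not in self.revisions:
            raise KeyError(base)
        if want_lock:
            # Combined request: the whole checkout fails if the lock does.
            self.lock(user)
        return {"user": user, "base": base}

    def checkin(self, session, content):
        # A lock held by someone else is the only thing that blocks a checkin.
        if self.lock_holder not in (None, session["user"]):
            raise PermissionError("locked by " + self.lock_holder)
        rev = self._next_id
        self._next_id += 1
        self.revisions[rev] = content
        self.parent[rev] = session["base"]
        # "Auto forking": if the base already has a successor, the new
        # revision becomes a second branch rather than an error.
        forked = session["base"] not in self.tips
        self.tips.discard(session["base"])
        self.tips.add(rev)
        return rev, forked
```

With no locks in play, two users can both check out revision 1; the first
checkin extends the line and the second one creates a fork. A client that
asks for want_lock=True gets the blocking behaviour instead.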


>I will defer my points regarding mutable resources to another post.
>> Immutable-Revisions

>>snip

>
>If I understand your meaning in saying "never delete the immutable-revision
>itself" you are implying that a server could nuke all the state associated
>with the immutable-revision but not a note specifying that once upon a time
>such a revision did exist and did hold a certain position in the version
>tree. However the reality is that people will want to destroy even notices
>of the existence of a revision for any number of reasons, some more
>nefarious than others. I suspect it is unrealistic of us to expect the
>protocol to be able to prevent this.

This is an issue that was raised in the requirements: configuration
management does not work if revision IDs are ever re-used. _As long as_ the
resource exists, its revision IDs must not be re-used. Hiding the past
existence of a revision is pretty irrelevant: if the server never re-uses
the revision ID, the client really can't know _why_ that is the case. If
the server has a requirement to perform such hiding, then it's free to hand
out opaque version names (like MD5 digests, say) -- this leaves a client no
way to deduce what revisions may or may not have existed in the past, only
what revisions are currently part of the resource's history.
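
As a rough illustration of that point, the following Python sketch hands
out opaque revision IDs that are never re-used while the resource exists,
and keeps only an internal note of removed revisions. Everything here is
an assumption made for the example: the RevisionHistory class, the
per-resource secret, and the choice to derive IDs as an MD5 digest of that
secret plus a counter (uniqueness then holds barring digest collisions).

```python
import hashlib


class RevisionHistory:
    """Hypothetical per-resource revision store with opaque IDs."""

    def __init__(self, secret):
        self._secret = secret    # bytes known only to the server
        self._counter = 0        # only ever increases, so IDs are not re-used
        self._live = {}          # revision id -> content
        self._retired = set()    # internal note; never shown to clients

    def _new_id(self):
        self._counter += 1
        data = self._secret + str(self._counter).encode("ascii")
        # The digest hides the counter, so a client cannot infer gaps.
        return hashlib.md5(data).hexdigest()

    def add(self, content):
        rev_id = self._new_id()
        self._live[rev_id] = content
        return rev_id

    def remove(self, rev_id):
        # Removing a revision does not free its ID for later use.
        del self._live[rev_id]
        self._retired.add(rev_id)

    def visible_ids(self):
        # Clients only see what is currently part of the history.
        return set(self._live)
```

A client that lists visible_ids() after a removal sees only the surviving
revisions and has no way to tell whether any others ever existed.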

>There is the additional problem of what to do if the resource is destroyed
>and its HTTP URL gets re-used. Who will return the "this resource has been
>nuked" notice? The way the language is currently written it would seem that
>once you assign an HTTP URL to a version of a resource, even if you destroy
>the resource, you are still required to reserve the HTTP URL so it can
>return the "this resource doesn't exist anymore" error. I suspect we will
>find significant opposition to this idea. People tend to get touchy about
>their HTTP URL namespaces.

In the requirements, we stated that this is true _only as long as the URL
represents the same resource_. Geoff's proposal meets that requirement.
When a resource is destroyed its revisions _and the constraint_ vanish.

>One alternative is to require that a note be dropped into the version
>history specifying that there did once exist a version with a set of
>particular characteristics but its resource has since been destroyed.

No need.

If the _server_ allows revisions to be removed, then it must keep an
internal note to that effect. The client doesn't need to know about this.
Or are you making a feature request? I thought an earlier point was that
you want to hide the previous existence of a version.

> I
>don't think this is a good idea because it means that we need to refer to a
>resource (even one which doesn't currently exist) without the use of a URI.
>This is likely to muck up the protocol in all sorts of unhappy ways. What we
>need is a URI which refers to a resource independently of the HTTP URL used
>to actually retrieve the resource.

This sounds like the URI for the "versioned resource" as opposed to the URI
for the resource, or the URI for the revisions. There are a lot of
definitions in the versioning goals draft that may make this discussion
clearer. For instance, the word "version" is undefined, as having too many
implications. It gets used as a typo for revision, and sometimes as a way
of indicating a reference to the "informal" concept.

>Which brings us to URNs. I don't propose we actually use URNs, I don't like
>them very much.

I don't think they matter much here, and URIs automatically include them
anyway.

> But the underlying concept is sound.

So what's not to like: a URN is a name for an underlying concept that must be
enforced by a policy commitment in system implementation and deployment,
supported by administrative commitments to ensure uniqueness.

>We should require that
>all resources have a URI associated with them that meets the same uniqueness
>requirements we place on lock token URIs. The URIs DO NOT HAVE TO BE
>RESOLVABLE. If they are, bonus points, but it is not necessary for the
>protocol to work properly.

The goals document defines terms for a whole host of logical entities. We
all expect that these will all have URIs... You may be complaining about
something that's not yet written up because the protocol isn't mature
enough, and

>When a resource is created it must be assigned one of these universally
>unique URIs. The URI can then be used with the IF header on any requests to
>an HTTP URL so as to ensure that the request will only succeed if the
>resource is the same resource as the one specified by the URI.

This solution imposes a needless burden. It would be very useful to have
universal unique names for all versions of all resources _ever_ -- but it's
not required for many useful systems. Where it is required, everything
proposed so far will work just as well with URNs instead of URLs, except
that the use of URNs would solve the uniqueness problem.

>The history graph is then free to refer to both the URI and any known URLs
>that the resource is available under. If the HTTP URL is changed or the
>resource is destroyed then the graph will only refer to the universally
>unique URI. This allows the version to still be referred to in various
>operations (such as creating a child) even though it doesn't exist.

This is useful, indeed, but not an issue that we need to solve. As long as we
use the term URI, people can use URNs when they require a globally unique
persistent identifier. URN is just the IETF standard way to register and
create such identifiers.

One thing that you should bear in mind about URNs: resolution is not a
requirement on URNs, only their registration with IANA, their syntax, and
the creation of an assignment method that ensures global uniqueness and
persistence.

But as I said, this is an issue that need not be addressed by this group.

  -- David
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________