Re: Notes from 5/3/99 Versioning TeleConf

Geoffrey M. Clemm (gclemm@tantalum.atria.com)
Tue, 4 May 1999 23:00:42 -0400


Date: Tue, 4 May 1999 23:00:42 -0400
Message-Id: <9905050300.AA07478@tantalum>
From: "Geoffrey M. Clemm" <gclemm@tantalum.atria.com>
To: jamsden@us.ibm.com
Cc: ietf-dav-versioning@w3.org
In-Reply-To: <85256767.004F8217.00@d54mta03.raleigh.ibm.com>
Subject: Re: Notes from 5/3/99 Versioning TeleConf


   From: jamsden@us.ibm.com

   > "Geoffrey M. Clemm" <gclemm@tantalum.atria.com> on 05/03/99 04:39:58 PM
   > Jim Amsden suggested that we replace the DAV:scope property with a
   > collection that is bound to the versioned-resources that would have
   > been listed in the DAV:scope property.

   <jra>
   What I was suggesting is that the collection members are what was
   listed in the DAV:scope property.

That sounds like you agree with what the notes say here?

   > Jim further suggested that the configuration revision actually *be*
   > this collection.  I'm a little more hesitant about that, since I'm
   > concerned that there will be other "collections" that we will want to
   > associate with a configuration revision.

   <jra>
   I think we should be free to define whatever specializations of
   collection are needed to provide the functions we want. Having a
   configuration be a kind of collection isn't really a restriction, it
   just reuses collection semantics to manage configuration members.

Certainly defining specializations of collection is a sensible approach.
My point was just that if there is more than one collection that defines
the state of a snapshot, then you need to make these be *subcollections*
of a snapshot.  In particular, there is the collection of versioned
resources that define the "scope" of the snapshot (let's call that the
"scope collection"), and then the collection of revisions that is
created when you "checkin" the snapshot (let's call that the
"selection collection").

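As a rough illustration of the two subcollections, here is a minimal
Python sketch.  The class, field, and method names are hypothetical and
are not defined anywhere in the protocol; the only thing taken from the
discussion is that a snapshot has a scope (versioned-resources) and a
selection (one revision per versioned-resource, fixed at checkin).

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Snapshot:
    """Hypothetical model of a snapshot with two subcollections."""

    # "scope collection": the versioned-resources the snapshot covers.
    scope: Set[str] = field(default_factory=set)
    # "selection collection": versioned-resource -> selected revision,
    # filled in when the snapshot is checked in.
    selection: Dict[str, str] = field(default_factory=dict)

    def checkin(self, selected: Dict[str, str]) -> None:
        """Record one revision for each versioned-resource in the scope."""
        missing = self.scope - set(selected)
        if missing:
            raise ValueError(f"no revision selected for: {sorted(missing)}")
        # Selections for resources outside the scope are ignored.
        self.selection = {vr: selected[vr] for vr in self.scope}
```

Keeping the two as separate members is what makes "member" ambiguous
unless you say which subcollection you mean.
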
   > Jim Amsden suggested that we just allow a list of revisions to be
   > specified when creating the configuration revision.  My concern with
   > this approach is that the server should confirm that the list of
   > revisions is a "legal" configuration (e.g. that it specifies at most
   > one revision of each versioned-resource, and that it selects a
   > revision of each internal member of a collection revision), and that
   > many servers will need a workspace to perform this computation efficiently.

   <jra>
   Yes, a server must support collection and configuration semantics when
   a member is added to a configuration. These are:
     - can't have the same member twice

I assume you mean "can't have two revisions of the same versioned-resource"?
The term "member" is ambiguous, since a versioned-resource is a member of the
"scope collection" of a snapshot, while a revision is a member of the 
"selection collection" of a snapshot.

     - member must exist and specify an existing revision, either
     directly through a label or indirectly through a workspace.

If we simply say "the revisions are specified in a workspace", all that a
client that wants to use labels needs to do is to allocate a workspace
at the beginning of a session, specify that label as the RSR, and then
CHECKOUT/CHECKIN the configuration every time a snapshot is desired.

A low end server will just read the label from the RSR, and make the snapshot.
A high end server will use the cached information in the workspace to make
the snapshot more efficiently.  Is this a problem?
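
A minimal Python sketch of the two server strategies, under stated
assumptions: the RSR is just a single label name, the label table and
the Workspace class are hypothetical, and resources that do not carry
the label are simply left out of the snapshot.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical label table: versioned-resource URL -> {label: revision id}.
LabelTable = Dict[str, Dict[str, str]]


@dataclass
class Workspace:
    url: str
    rsr: str  # assumed here to be a single label name
    # A high-end server might keep the current selection cached here.
    cache: Optional[Dict[str, str]] = None


def make_snapshot(ws: Workspace, labels: LabelTable) -> Dict[str, str]:
    """Return versioned-resource -> revision for the workspace's RSR."""
    if ws.cache is not None:
        # High-end path: reuse what the workspace already knows.
        return dict(ws.cache)
    # Low-end path: read the label from the RSR and scan every resource.
    # Resources that do not carry the label are left out (an assumption).
    return {vr: revs[ws.rsr] for vr, revs in labels.items() if ws.rsr in revs}
```

Either way the client does the same thing: it names the workspace and
lets the server decide how much work the RSR lookup takes.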

     - adding a collection recursively adds all its members (subject
     to the same rules)

Adding revisions of all 10,000 resources of a website to a snapshot is
the reason why I need the caching opportunity provided by a workspace.

     - member must be an immutable revision. There is some question
     on this one.

Perhaps you could explain the benefit of this restriction?

   These rules don't seem that hard to check incrementally on each edit to a
   configuration. There is really nothing else a server can do in terms of
   validating the consistency of the selected revisions. Only the client can know
   through testing if the right revisions have been selected.

They are hard to check efficiently, once you get large snapshots.
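
For concreteness, the two "legal configuration" rules from the notes (at
most one revision per versioned-resource, and a selected revision for
each internal member of a selected collection revision) can be sketched
in Python as below.  The function name and the shape of the inputs are
assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

Revision = Tuple[str, str]  # (versioned-resource id, revision id)


def check_configuration(
    revisions: Iterable[Revision],
    members: Dict[Revision, List[str]],
) -> List[str]:
    """Return a list of problems; an empty list means the set is legal.

    `members` maps a collection revision to the versioned-resource ids
    of its internal members (a hypothetical representation).
    """
    selected: Dict[str, str] = {}
    problems: List[str] = []
    # Rule 1: at most one revision of each versioned-resource.
    for vr, rev in revisions:
        if vr in selected and selected[vr] != rev:
            problems.append(f"{vr}: both {selected[vr]} and {rev} selected")
        else:
            selected[vr] = rev
    # Rule 2: every internal member of a selected collection revision
    # must itself have a selected revision.
    for (vr, rev), children in members.items():
        if selected.get(vr) != rev:
            continue  # this collection revision is not in the configuration
        for child in children:
            if child not in selected:
                problems.append(
                    f"{vr}@{rev}: no revision selected for member {child}"
                )
    return problems
```

Even in this form the server has to visit every revision and every
collection member, which is the cost that grows with snapshot size.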

   > An alternative approach is to let the client specify a label to be used
   > to pick revisions that should go into the configuration revision, but
   > this requires the client to actually put a label on all revisions to
   > go in the configuration, and requires the server to scan every
   > versioned-resource in the configuration to see what revision the
   > label would select.

   There's no choice if workspaces aren't available.

What do you mean by "aren't available"?  If you ever want to create a working
resource, some form of a workspace will be available.  This is just another
case where they could be required by the protocol.  Remember, a workspace
is just a URL with an RSR property.  This is *not* a heavyweight concept.

If a level-1 client restricts the RSR of a workspace to be just those
revision selection constructs it supports (e.g. labels and configurations),
then using a workspace to indirectly name a label or a configuration is
very minimal overhead since the same workspace can be used for the
entire session.

   Clearly configurations are a more advanced concept in DAV
   versioning. But as has been pointed out a number of times on the
   mailing list, simple versioning may be simple for the server, but it
   won't be so simple for clients.

I'm not sure how that relates to the issue at hand.

   Configurations may be very useful even
   in servers that don't support workspaces or activities.

All versioning servers are required to support workspaces, minimally
for working resources and for the default workspace.  I'd like to
see an explanation of why requiring them for the "make-snapshot"
operation is any more burdensome than requiring them to be returned
on the CHECKOUT operation.

   They aren't
   just a revision selector in a workspace revision selection rule, but
   may also be used:

     - to query their members to see which revisions made up a consistent set
     - for deployment of a web application
     - web publishing
     - archiving
     - site organization

Most of these operations require access to the human meaningful names,
and those are only available when you load the snapshot into a
workspace.  In any case, how hard is it for a client to set the RSR of
a workspace to be that configuration, and then just indirect through
that workspace to get the desired functionality?  For a low-end client
that is willing to maintain a table of "checkout-tokens" and possibly
"lock-tokens", I can't see that the allocation of a single workspace
for configuration creation and selection is much to ask.  After all,
the server is free to just implement the workspace as a no-op, and
just read the RSR for the arguments to the "make-snapshot" command.

   I think of configurations as providing a "single version" view of a
   multi-versioned space. This is what most users will want most of the time, even
   if the server only supports simple versioning for creating the various
   revisions. Those servers will likely require a publish step that copies specific
   revisions to some other namespace for production access. Configurations
   eliminate the need for this publish step.

And how does the existence of a snapshot avoid the need to perform a
"copy" or "publish" to get the contents of a configuration to another
namespace?

   > A repository is a collection which contains at least four standard
   > subcollections, named "activity", "versioned-resource",
   > "configuration", and "workspace".  A member of the versioned-resource
   > or configuration collections would have a server-generated
   > versioned-resource-id as its name, while an activity or workspace
   > would have a client specified name.

   Configurations should not have server-generated names. These are user
   created and controlled resources whose meaning is completely user
   defined. I don't see the need for a repository to have a special
   mechanism for querying configurations. A DASL search should be
   adequate.

Good catch!  I intended to say that configurations should have human
specified names, just as activities and workspaces do.  That's the
reason they aren't just part of the "versioned-resources" subcollection.
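
A small Python sketch of that naming rule, assuming a hypothetical
helper `member_url` and using a random UUID as a stand-in for whatever
id a server would actually generate:

```python
import uuid
from typing import Optional

# The four standard subcollections of a repository, per the notes.
SUBCOLLECTIONS = ("activity", "versioned-resource", "configuration", "workspace")
# Subcollections whose members get client-specified (human) names.
CLIENT_NAMED = {"activity", "configuration", "workspace"}


def member_url(repo_url: str, kind: str, name: Optional[str] = None) -> str:
    """Build the URL of a new member of one of the subcollections."""
    if kind not in SUBCOLLECTIONS:
        raise ValueError(f"unknown subcollection: {kind}")
    if kind in CLIENT_NAMED:
        if not name:
            raise ValueError(f"{kind} members need a client-specified name")
        member = name
    else:
        if name is not None:
            raise ValueError("versioned-resource names are server-generated")
        # Stand-in for a server-generated versioned-resource-id.
        member = uuid.uuid4().hex
    return f"{repo_url.rstrip('/')}/{kind}/{member}"
```

For example, `member_url("http://example.com/repo", "configuration",
"release-1")` yields `http://example.com/repo/configuration/release-1`.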

Cheers,
Geoff