Re: configurations and all that...

Geoffrey M. Clemm (gclemm@tantalum.atria.com)
Wed, 7 Apr 1999 10:59:26 -0400


Date: Wed, 7 Apr 1999 10:59:26 -0400
Message-Id: <9904071459.AA28488@tantalum>
From: "Geoffrey M. Clemm" <gclemm@tantalum.atria.com>
To: Jeff_McAffer@oti.com
Cc: ietf-dav-versioning@w3.org
In-Reply-To: <1999Apr06.200300.1250.1136829@otismtp.ott.oti.com>
Subject: Re: configurations and all that...


   From: Jeff_McAffer@oti.com (Jeff McAffer OTT)

   I would like to present what might be a different perspective on   
   configurations.  Basically our cut on this (uhh, what we want) is that   
   configurations are "deep revisions of collections".

I agree.  There have been proposals that configurations be independent
of (or orthogonal to) collections, but I believe that aligning the
notion of configuration with the notion of a collection in the
way Jeff proposes (i.e. a configuration is a deep revision of a collection)
produces a significantly simpler protocol.

   Revisions of non-collection resources give users a recoverable immutable   
   state for that resource.  Revisioning a collection gets you a shallow   
   (one level) fixing of content (i.e., the immediate member set is fixed   
   but not their revisions or contents).  Configurations were introduced (I   
   believe) to recognize the need for generating recoverable states   
   including collections.

Yes.

   It appears that their origin was as a freezing of the state of a   
   workspace such that whatever revision would have been selected at that   
   moment was captured by the configuration.  Recognizing that people might   
   not want to capture their entire workspace, there has been talk about   
   operating on workspaces and scoping the operation to narrow the group of   
   resources captured.

Actually, configurations were originally introduced as a way of saying
"here's an immutable set of revisions".  Saying that this set is based
on the state of the workspace (as opposed to just a set that is
directly manipulated by the client) came later, and is a current topic
of discussion.

   I get the feeling that while we acknowledge that   
   configurations are not necessarily "the entire world" of immutable   
   resources, people think of them as workspace things and assume they are   
   relatively rare.  Certainly it is assumed there are few of them in the   
   RSRs.

Those are two very separate statements.  Some of us believe that configuration
revisions will be created frequently (i.e. possibly several per day per
client).  So it becomes critical that a configuration revision be amenable
to very efficient implementation.  But you only need one of these "historical"
configuration revisions in your RSR at any one time (just as your workspace
selects a single revision of a versioned-resource at any one time).  So you
can have lots of configuration-revisions while only having a few of them in
your RSR at any one time.

Another reason for needing lots of configuration-revisions in an RSR would
be if there was no way of creating a configuration-revision that is
composed of a set of child configuration-revisions.  I believe it is
essential that we provide such a composition mechanism, and that with
such a composition mechanism, we reduce the need for a large number of
*explicit* configuration-revisions in an RSR (although composite configuration
revisions *implicitly* add an arbitrarily large number of configuration
revisions to the RSR).

   Let's turn the table a little and focus on the user view.  Users have   
   (potentially numerous and deep) collections of resource revisions   
   identified by workspace RSRs and they want to capture them (perhaps   
   independently) for later reuse.  They might have all manner of stuff in   
   their workspaces.  Some of it ready to go to production, some just   
   starting prototyping.  The workspace is not the focus, the collections of   
   resources are.

Depends what you mean by "focus".  The workspace is certainly the lens through
which the huge mass of historical revisions is filtered to produce the set that
you want to see and manipulate.  In addition, the workspace is the mechanism
with which you can create and name "working resources".

But I certainly agree that the artifacts *in* the workspace (i.e. the
resources, collection and non-collection) are what the user is focusing on.

   The workspace is the view onto, or context for, the   
   resources (i.e., specs revisions via RSRs) but that's it.

Although that is a pretty important "it" (:-).  In particular, although
a workspace with no resources displayed in it is pretty useless, a
mass of versioned resource revisions is also pretty useless, unless you
select an interesting slice through them (which is what a workspace does).

   Looking at it this way, it's natural then to talk about revisioning these
   collections with depth infinity (i.e., "deep revisioning").

I agree.

   This is, I   
   believe, the operation everyone has been talking about and calling   
   "snapshot the workspace within some scope"?

Not quite.  There is the contingent that wants to just talk about a
set of revisions, independent of their membership in any collection.
All contingents agree that there must be some mechanism that selects
which revision of a versioned-resource goes into the configuration,
and that "what is currently in the workspace" is one such mechanism.
The disagreement occurs on whether it is the only mechanism.

   The distinction may appear   
   subtle but it simplifies the explanation of the semantics.

I agree.

   People   
   understand collections.  They understand deep and shallow.   Other WebDAV   
   people have been working hard on collection semantics.  I suspect that   
   versioning will have many of the same issues.  It would be great if we   
   could derive our semantics from theirs so we appear as a simple variation   
   (if at all).

I agree.  Although it's actually a two-way street, since we're using
the needs of versioning to help motivate design choices in the advanced
collection work (both Jim Whitehead and I are on the advanced collection
design team).

   Thought:  Is "snapshot" = "checkin a collection with depth infinity"?
       The collection you check-in is the scope (references the roots
       of the collections of interest).  The deep check-in updates the
       collection to contain all the mappings...

That's how I look at it, although I would have said "The deep check-in
creates a configuration-revision that contains all the mappings".

   In the latest conference call there was discussion about scoping and   
   building a collection of scope patterns or starting points to use when   
   snapshotting.  Geoff has introduced DAV:versioned-collection.  I'm not   
   sure but this mixed with the scope ideas looks to me like the collection   
   we are deep versioning!

Yes, that was exactly my intent.

   Check it in (or pass it to the snapshot method)   
   and have it updated to record the correct revisions of the referenced
   resources and their children.

   Think of the snapshot operation as "updating a deep revision of a   
   collection (ie., a configuration)".
     - pass in one indicating just the "roots" or "scope" and you get a full
       snapshot (within that scope).
     - pass in one from a previous snapshot and you get an incremental
       snapshot (i.e., it is updated).  This goes part way to addressing the
       issue of incremental updates of configurations etc.  I would guess
       that many servers could do interesting optimizations given the
       previous state and the current state.
     - pass in one with just / (the global root) and you get a snapshot of
       the whole workspace.

Yes to everything above.  Requiring that configurations be revisions
(which means all but the first have a "predecessor") and requiring that
they be a snapshot of an RSR, gives the server the opportunity to do
exactly those kinds of optimizations that will make configurations a
scalable effective mechanism.  If either of these constraints is removed
(i.e. if we allow clients to arbitrarily put things in and out of 
configurations, or if configurations don't have a predecessor), these
critical optimizations become infeasible (at least in an interoperable way).
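
To illustrate the kind of optimization I mean, here is a minimal Python
sketch.  It is only an illustration under stated assumptions, not a proposed
protocol or server design: the workspace is modeled as a plain dict from
resource path to the selected revision id, and the names ConfigRevision,
snapshot, and in_scope are hypothetical.  Because each configuration revision
has a predecessor and is a snapshot of what the workspace selects, it only
needs to store what changed since that predecessor.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

_DELETED = None  # marker: the path is no longer selected within the scope


@dataclass(frozen=True)
class ConfigRevision:
    """Hypothetical immutable deep revision of a collection."""
    predecessor: Optional["ConfigRevision"]
    delta: Tuple[Tuple[str, Optional[str]], ...]  # (path, revision id or _DELETED)

    def mappings(self) -> Dict[str, str]:
        """Rebuild the full path -> revision mapping by replaying deltas."""
        base = self.predecessor.mappings() if self.predecessor else {}
        for path, rev in self.delta:
            if rev is _DELETED:
                base.pop(path, None)
            else:
                base[path] = rev
        return base


def in_scope(path: str, roots: Iterable[str]) -> bool:
    """True if path is one of the scope roots or lies beneath one of them."""
    for root in roots:
        prefix = root.rstrip("/")  # "/" becomes "", which matches every absolute path
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False


def snapshot(workspace: Dict[str, str], roots: Iterable[str],
             predecessor: Optional[ConfigRevision] = None) -> ConfigRevision:
    """Deep check-in: record what the workspace selects within the scope.

    With a predecessor, only the differences are stored (incremental snapshot).
    """
    roots = list(roots)
    selected = {p: r for p, r in workspace.items() if in_scope(p, roots)}
    previous = predecessor.mappings() if predecessor else {}
    delta = [(p, r) for p, r in sorted(selected.items()) if previous.get(p) != r]
    delta += [(p, _DELETED) for p in sorted(previous) if p not in selected]
    return ConfigRevision(predecessor, tuple(delta))
```

Passing roots of ["/src"] gives a full snapshot within that scope, passing the
previous ConfigRevision as predecessor gives an incremental one, and passing
["/"] snapshots everything the workspace selects.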

   Question: How do I ship a component to someone and retain revision info   
   etc.?

I assume by "shipping a component", you mean "shipping them all the
revisions in a particular configuration revision of that component".  In this
case, it is sufficient to ship the URL of that configuration revision.

   On the issue of lots of configs in RSRs, let's assume that an individual
   user decided to try creating a small set of configs for the purposes of   
   limiting the RSR count.  To do this, he creates a bunch of "super   
   configurations" which specify "needed configurations" (we call them   
   "prerequisites" or "required-maps").  This creates some prereq DAG rooted   
   with a few super-configs.  Those configs are put in the RSRs.  BTW,   
   everything had to be checked in (i.e., revisioned), as I understand it,   
   for the RSRs to work properly?

Depends what you mean by "work properly".  A configuration revision can only
contain revisions, so resources would have to be checked in for them to
be part of a configuration revision.  Your RSR would probably be something
like: "my-activity ELSE my-super-config-revision".  This means that you
see things checked out to your workspace (always true for a workspace),
else revisions that are the products of "my-activity", else the revision
selected by my-super-config-revision.
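
The selection order in that RSR can be sketched in a few lines of Python.
This is a simplification under assumptions of my own: each RSR element has
already been reduced to a dict from resource path to a single revision id, and
the function name select_revision is hypothetical.

```python
from typing import Dict, List, Optional


def select_revision(resource: str,
                    checked_out: Dict[str, str],
                    rules: List[Dict[str, str]]) -> Optional[str]:
    """Return what the workspace shows for a resource, or None if nothing matches.

    checked_out: working resources in the workspace (these always win).
    rules:       RSR elements in ELSE order, e.g. [my_activity, my_super_config].
    """
    if resource in checked_out:
        return checked_out[resource]
    for rule in rules:  # first rule that selects a revision wins
        revision = rule.get(resource)
        if revision is not None:
            return revision
    return None
```

For "my-activity ELSE my-super-config-revision", rules would be the activity's
mapping followed by the configuration revision's mapping.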

   Assume I have an RSR which refers to a revision of config C.  Consider   
   what happens when I create a new revision of some resource A which is   
   included in some config X.  If I want to avoid RSR hacking, I have to   
   update (i.e., create a new revision of) X.  X is "needed" by Y.  So I   
   crack open Y and update and revision it.  Y is "needed" by Z, ...  is   
   needed by C.  I sure hope configurations have a lightweight   
   implementation (in both speed and space).

You were neglecting the "activity" concept.  Activities are there to capture
a logical change.  Activities can have "needed-activities" to allow you
to specify a group of logical changes that combine to form a larger logical
change.

Configurations are there to snapshot the state of the world after some
number of activities have resulted in a state of the world worth
snapshotting.  So although I believe configurations must be lightweight,
they don't have to be as lightweight as an activity (which must be
*really* lightweight, since it captures the smallest increments of change).

   Note that this is roughly what we do now.  Manually!  It sucks.  It is so   
   hard to manage direct revision dependencies like this that we had to   
   remove some of the prerequisite identification requirements as a matter   
   of practicality.  Our saving grace is that currently the environment   
   allows us to spec and share unrevisioned configurations in our equivalent   
   of RSRs so we don't have to do the revisioning very often (typically once   
   a release cycle).

Look hard at the activity mechanism.  It should address the problems you
raise.

   An automated mechanism would not be much better due to the revision   
   bloat.  Revisions should be interesting user checkpoints.  A change to a   
   dependee is only interesting when the dependent says it is.  If you force   
   revisioning up the dependent chain, you end up with an explosion of   
   useless revisions.  Manually/independently revisioning 5 prerequisites of   
   Z should not create 5 different revisions of Z.  Who is going to name all   
   these revisions?  How are users going to manage/understand/find out which   
   revisions are which?  Why is a small change at a low level so immediately   
   and obviously forced on people at the high level?

Again, look at activities.  An activity allows you to go in and create a
new revision of anything, up and down your collection hierarchy, without
requiring a change to anything else.  Only when you are done with a set
of activities, do you need to become involved in configuration creation.

   A reasonable alternative is to have the root super-config (i.e., C) in   
   the RSRs and then add the new revision of X to the RSRs such that it   
   overrides C.  This should work well but it leads us back to the beginning   
   in that we may well have lots of configs spec'd in workspace RSRs.   
    Every time I revision something in a different component (or whatever I
   define as my finest grained, deeply versioned collection) I add to the   
   RSRs.

That root super-config would be an activity.  Unlike a config, an activity
just contains the revisions created to perform that activity, and in fact
can contain multiple revisions of the same versioned-resource, in case
you didn't get it right the first time.
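
To make the contrast concrete, here is a small Python sketch.  The class and
function names (Activity, make_config) are hypothetical, and treating "latest
recorded" as the revision an activity selects is an assumption of the sketch,
not something the current proposals specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


class Activity:
    """Holds every revision created for one logical change."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._revisions: Dict[str, List[str]] = defaultdict(list)

    def record(self, resource: str, revision: str) -> None:
        # Several revisions of the same versioned-resource are allowed.
        self._revisions[resource].append(revision)

    def revisions(self, resource: str) -> List[str]:
        return list(self._revisions.get(resource, []))

    def latest(self, resource: str) -> Optional[str]:
        revs = self._revisions.get(resource)
        return revs[-1] if revs else None


def make_config(revisions: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build a configuration: at most one revision per versioned-resource."""
    config: Dict[str, str] = {}
    for resource, revision in revisions:
        if resource in config and config[resource] != revision:
            raise ValueError(f"two revisions of {resource} in one configuration")
        config[resource] = revision
    return config
```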

   ***NOTE:  Users are going to define this granularity.  For some, the   
   collections they want to deep revision contain whole websites and they   
   will have only one collection, for others they contain one part of one   
   component and they have thousands.  It is whatever makes sense for the   
   user's domain.  We would do well to not make too many assumptions about   
   this.

Yup.  There are no constraints on where the revisions in an activity
can occur.

   I don't see having lots of configs in the RSRs as a problem.  One can see   
   a number of ways for servers to optimize the config searching to make   
   this a non-issue.

Most of the more powerful config-spec elements (activities, configurations)
require some kind of lookup against the contents of that element, and that
will always be somewhat costly, especially for an element that is susceptible
to change (e.g. an activity) or can be very large (e.g. a configuration).

   Prerequisites (i.e., needed configs) are useful ways   
   for users to group/reuse logically coherent resource sets but BEWARE!   
    Maintaining these dependencies is a NON-TRIVIAL amount of work for the   
   user.  Further, users should not be creating these to satisfy the system   
   (i.e., webdav) but to help them solve their problems.  This structure may   
   or may not fit into some nice hierarchy.  I have no problem with having   
   this capability in WebDAV but we will not use it (much) and any problem   
   that is solved only by using "needed configs" is not solved for us.

The "needed" configurations are not general traceability links.  They
are a very simple RSR "composition" relationship.  Their only semantics is
that if you explicitly specify a configuration revision in an RSR,
you implicitly select all its "needed" configurations.  If you explicitly
specify an activity in an RSR, you implicitly select all its "needed"
activities.
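
As a rough Python sketch of that implicit selection: the "needed" relation is
modeled here as a dict from an element's name to the list of names it needs,
and the function name expand_needed is hypothetical.  Expanding the explicit
RSR entries is then just a transitive closure over that relation.

```python
from typing import Dict, Iterable, List


def expand_needed(explicit: Iterable[str],
                  needed: Dict[str, List[str]]) -> List[str]:
    """Return the explicit elements plus everything they transitively need.

    Elements are listed depth-first in the order they are first reached, and
    each appears once even if several elements need it.
    """
    selected: List[str] = []
    seen = set()
    stack = list(reversed(list(explicit)))
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already selected via another path (or a cycle)
        seen.add(name)
        selected.append(name)
        stack.extend(reversed(needed.get(name, [])))
    return selected
```

So an RSR that names only one super-config explicitly still selects every
configuration beneath it, which is what lets you deal with "dozens" at each
level rather than thousands.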

A composition relation is essential if you are going to scale, i.e. if
you have thousands of objects, you need some intermediate "clumps" so that
at any level of abstraction, you can just deal with "dozens".

   Anyway, this has gone on long enough.  The summary is that by changing   
   the focus a bit to see the problem as "how do I deep revision
   collections of resources", assume that there can be many many of these   
   deep revisions, and phrase RSRs in terms of these deep revisions, we can   
   leverage the user's understanding of collections as well as the work done   
   by the collection people, solve more problems and end up with a model   
   that is powerful and flexible without too much strain.

Amen.

Cheers,
Geoff