Date: Tue, 4 Jan 2000 11:06:57 -0500
Message-Id: <10001041606.AA17944@tantalum>
From: "Geoffrey M. Clemm" <geoffrey.clemm@rational.com>
To: ietf-dav-versioning@w3.org
Subject: Re: Baselines vs. labels
From: "Eric Sedlar" <esedlar@us.oracle.com>
Can someone give a bit of rationale for when you use baselines and when
you use a label applied recursively to all the elements of a
collection? Is a baseline just a specialized case of a label?
One key difference between a baseline and a label is that you can
ask a revision "what labels are on you" but you cannot ask a revision
"what baselines include you". This can have significant performance
impact on certain implementation choices.
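To make the asymmetry concrete, here is a minimal sketch. It assumes a hypothetical data model in which labels are stored on the revision and a baseline holds its members with no back-pointer from the revisions. The names `Revision`, `BaselineRecord`, `labels_on`, and `baselines_including` are illustrative only and do not come from the spec.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    url: str
    # Assumption: labels are stored in a property of the revision itself.
    labels: set = field(default_factory=set)


@dataclass(frozen=True)
class BaselineRecord:
    name: str
    # Assumption: the baseline knows its members; revisions keep no back-pointer.
    members: frozenset


def labels_on(revision):
    """Cheap: the answer is read directly off the revision."""
    return set(revision.labels)


def baselines_including(revision, all_baselines):
    """Not a query the protocol offers; without a reverse index it is a full scan."""
    return [b.name for b in all_baselines if revision.url in b.members]
```

Because the second query is never asked, a server under this model would not have to maintain a reverse index from revisions to baselines.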
Another key difference is a baseline is created by taking a "snapshot"
of the state of a workspace, which can provide a very cheap implementation
mechanism (i.e. based on the current value of the revision-selection-rule).
A label is moved arbitrarily, so there is no revision-selection-rule
optimization available.
Another key difference is that a baseline captures the "history" of a
set of revisions, i.e. you can find predecessor baselines and successor
baselines. In contrast, a label just captures the current state of
a collection of revisions. Where the label was in the past is lost.
So a label and a baseline have very different properties (neither is
a specialization of the other). Which one you use depends on which
of these properties are more important.
When do you recommend using recursive labels vs. a baseline? For
typical configuration management needs, if I want to create release
1.0.1.1 of my product, it seems like I might want the ability to slip
in a new revision of a file at the last minute, which would mean
moving a label from version to version.
The fact that you have a 1.0.1.1 means you probably have a history
captured by naming conventions on your labels. With baselines, you
have your history explicitly modeled via predecessor and successor
properties (and therefore, unlike naming conventions, interoperable
between different clients and servers). In a baseline system, you
use "activities" to capture dynamic change (i.e. slipping in a new
version), and only create new baselines when you have a set of revisions
that you want to capture in the history. You don't modify an
existing baseline for the same reasons you don't modify an existing
revision.
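The difference in what gets remembered can be sketched as follows. This is only an illustration under assumed names (`BaselineNode`, `history`, `LabelTable` are hypothetical): an immutable baseline keeps a link to its predecessor, while moving a label overwrites where it used to be.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: a baseline is never modified after creation
class BaselineNode:
    name: str
    predecessor: Optional["BaselineNode"] = None


def history(baseline):
    """Walk predecessor links from the given baseline back to the first one."""
    names = []
    while baseline is not None:
        names.append(baseline.name)
        baseline = baseline.predecessor
    return names


class LabelTable:
    """A label records only its current target; earlier targets are lost."""

    def __init__(self):
        self._target = {}

    def move(self, label, revision):
        self._target[label] = revision  # overwrites the previous target

    def current(self, label):
        return self._target[label]
```

A new state of the release is captured by creating a new `BaselineNode` whose predecessor is the old one, rather than by editing the old one.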
I couldn't do that with a
baseline, since I would need a new baseline, which would have a
different URI, so this wouldn't be transparent to people who were
already using release 1.0.1.1 of my product. It might be useful to
include some scenarios in the spec as to when to use either.
You would have a "release-1.0.1.1" activity to share between folks
that want to see the release-1.0.1.1 work in progress. You would
create a baseline only when there is a state of release-1.0.1.1 that
you want to capture for history.
Since version management systems like CVS use tags (i.e. labels) in
this way, I think some clarifications in the spec in this area would
be helpful.
CVS uses labels to model both activities and baselines. A label that
is being moved forward along some line of descent is modeled as an
activity. A label that never moves is modeled as a baseline.
From: "Eric Sedlar" <esedlar@us.oracle.com>
It seems excessively complex to have three different ways to identify
a set of revisions (for use in revision selection rules). I don't see
much utility for baselines if you can never change the revision of a
particular file in a baseline.
Baselines and activities are designed to be used together.
Baselines capture the history of a set of related resources.
Activities capture the changes that lead from one state of the
history to the next. If you want to see a particular state of
the history, you place a baseline in your revision-selection-rule.
If you want to see changes from that baseline, you add the appropriate
activity (activities) to your revision-selection-rule.
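A rough sketch of how the two combine in a revision-selection-rule. It assumes, purely for illustration, that a rule is an ordered list in which earlier entries take precedence and that both baselines and activities can be viewed as a mapping from resource to revision. The resource paths, revision names, and `select_revision` are hypothetical.

```python
def select_revision(resource, rule):
    """Return the revision chosen by the first rule entry that covers `resource`."""
    for entry in rule:
        if resource in entry:
            return entry[resource]
    return None  # no entry in the rule selects this resource


# Hypothetical state captured by a baseline.
baseline_1_0 = {"/src/a.c": "a.c@3", "/src/b.c": "b.c@7"}

# Hypothetical activity holding a change made after that baseline.
fix_activity = {"/src/b.c": "b.c@8"}

see_release = [baseline_1_0]                # exactly the captured state
see_fix = [fix_activity, baseline_1_0]      # the captured state plus the change
```

With `see_release`, `/src/b.c` resolves to `b.c@7`; with `see_fix` it resolves to `b.c@8`, while `/src/a.c` resolves to `a.c@3` under both rules.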
It seems to me that the performance benefits of baselines are based on
the fact that you have a contiguous subtree of revisions, where there
is no need to check the revision selection rules when traversing a
link (this often involves searching through the list of revisions in a
versioned resource to find the latest one or a particular label,
etc.). Each collection revision in a baseline can point directly to
the associated revision the next layer down.
The key optimization is based on the fact that it is a snapshot of the
state of a workspace. This means that you can just capture the state
of the revision-selection-rule property, and not scan the resources at
all.
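As a sketch of why this is cheap: creating the baseline copies one property, while labeling the same state touches every selected revision. The names `Workspace`, `SnapshotBaseline`, and `label_all` are hypothetical, and the sketch assumes the entries in the rule themselves denote fixed states at snapshot time; how a server guarantees that is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotBaseline:
    # The baseline is just a frozen copy of the rule, not a list of revisions.
    rule_snapshot: tuple


class Workspace:
    def __init__(self, rule):
        self.revision_selection_rule = list(rule)

    def create_baseline(self):
        """Cost depends on the length of the rule, not on the number of resources."""
        return SnapshotBaseline(tuple(self.revision_selection_rule))


def label_all(selected, label, labels_by_revision):
    """Label-based alternative: one update per selected revision."""
    for revision in selected.values():
        labels_by_revision.setdefault(revision, set()).add(label)
```

Later edits to the workspace rule do not affect a `SnapshotBaseline` that was already created, since it holds its own immutable copy.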
What if you introduced a new concept like "baseline configuration"? A
baseline configuration would be rooted at a particular versioned
collection recursively, just like a baseline. However, you would be
allowed to change the revisions in a baseline configuration after
creating the configuration. Then you can get rid of baselines &
configurations, and simplify the spec.
I'd be happy to get rid of "configurations", and just keep baselines
and labels. But I would not be willing to get rid of baselines, since
then the only way to capture the state of a set of versioned resources
would be to enumerate the currently selected revisions, which would
not scale.
Is the reason you consider labels less "reliable" than configurations
due to the assumption that you are protecting them with access control
on a bunch of different resources rather than access control on a
single resource?
A label is just an XML element within a resource property. It is
very unlikely that we will define access control down to that level
(i.e. this element in this property is read-only by this individual).
This means that access control on labels is unlikely to ever be provided.
Also, can an administrator rename baselines? (E.g. I create a
baseline from /amazon/catalogs/music at a particular point in time,
and store it in "/baselines/amazon/catalogs/music/dec6_99.base". Then
I modify the revision selection rules in the workspace I have
selected, and create a new baseline from /amazon/catalogs/music, which
includes a different set of revisions, and call it
"/baselines/amazon/catalogs/music/temp.base". Can I delete the first
baseline and rename the second one to have the same name as the first
one, thus changing the selected revisions for anyone who has
referenced "/baselines/amazon/catalogs/music/dec6_99.base" in their
revision selection rules?)
A baseline is given an immutable name by the server, not a mutable
name by the client (JimA has tried to argue otherwise, but we are
vigorously resisting :-). The main reason is that it is the immutability
of a baseline that leads to a variety of optimizations (i.e. I can cache
the baseline locally, and not have to keep going back to the server
to see if it has "changed").
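The caching argument can be sketched like this. `BaselineCache` and the `fetch` callable are hypothetical stand-ins for whatever a client uses to retrieve a baseline; the point is only that an immutable baseline URL never needs revalidation.

```python
class BaselineCache:
    """Client-side cache that relies on baselines never changing."""

    def __init__(self, fetch):
        self._fetch = fetch   # hypothetical callable: URL -> baseline contents
        self._cache = {}

    def get(self, url):
        # No expiry and no revalidation: the name and the contents are both
        # immutable, so a cached copy is valid for as long as it is kept.
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]
```

A cache for a movable label could not skip the round trip this way, because the label may have been moved since the last fetch.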
From: "Eric Sedlar" <esedlar@us.oracle.com>
1) To justify having a "baseline" concept in the spec, I think we need to have
* a real customer scenario where the absolute guarantee a baseline will
never be changed is necessary, and
If I ship a release to a customer, I want to know what was in that release,
with no ifs, ands, or buts.
* show that this is a significant enough case to warrant the complexity
If I can't reliably reproduce a shipped release, my attempts to reproduce
and fix a customer problem are severely hampered.
2) Even if you can come up with 1), I would argue that the performance benefits
of a baseline should be available to whatever mechanism is used to represent a
release (currently a configuration)
I would represent a release with a baseline.
Performance benefits come from restrictions, not generalizations.
In particular, the performance benefits of baselines over labels
and configurations derive from the "snapshot the state of my workspace"
characteristics of baselines.
since I think that is going to be far more
commonly used, hence why something like the "baseline configuration" concept
I'm proposing might be a good idea.
If you want the performance benefits provided by baselines, you will
*only* use baselines and activities (i.e. not labels or general
configurations).
3) I don't see anything in the spec preventing any WebDAV resource (baselines,
configurations, etc.) from being renamed by a user. If you did, you would have
to reserve the entire namespace above the location of the baseline or whatever,
and make it appear as a read-only filesystem.
We cannot prevent people with access to the server from administratively
changing the URLs, but we can ensure that the protocol provides no means
of doing so. In particular, we can require that a MOVE on one of these
special URLs (e.g. revision and baseline URLs) fail.
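A minimal sketch of such a server-side check, under stated assumptions: the URL prefixes are hypothetical (the message does not say where these URLs live), and 403 is only an assumed choice of failure status, since the message says the MOVE must fail without saying how.

```python
# Hypothetical locations of server-assigned, immutable URLs.
IMMUTABLE_PREFIXES = ("/baselines/", "/revisions/")


def handle_move(source_url, destination_url):
    """Return an HTTP status code for a MOVE request (sketch only)."""
    if source_url.startswith(IMMUTABLE_PREFIXES):
        return 403  # assumed failure status; refuse to rename an immutable URL
    # An ordinary resource: a real server would perform the move here.
    return 201  # Created: resource now exists at destination_url
```

An administrator with direct access to the server could still bypass this, which is the limitation acknowledged above; the check only ensures the protocol itself offers no way to rename these resources.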
From: jamsden@us.ibm.com
... I've never
quite understood the use case for baselines either. ... As far as
performance and optimization is concerned, a server is free to examine the
contents of a configuration when it encounters it in a workspace revision
selection rule, and based on its contents, perform any optimizations it
wants.
The optimization isn't at reference time, but at creation time.
When you create a baseline, you can snapshot the state of a revision
selection rule. This is not feasible for a general configuration which
can be created and modified out of the context of a workspace.
Cheers,
Geoff
--
Geoffrey M. Clemm
Chief Engineer, Configuration Management Business Unit
Rational Software Corporation
(781) 676-2684 geoffrey.clemm@rational.com http://www.rational.com