Date: Tue, 4 Jan 2000 11:06:57 -0500
Message-Id: <10001041606.AA17944@tantalum>
From: "Geoffrey M. Clemm" <geoffrey.clemm@rational.com>
To: ietf-dav-versioning@w3.org
Subject: Re: Baselines vs. labels

   From: "Eric Sedlar" <esedlar@us.oracle.com>

   Can someone give a bit of rationale for when you use baselines and when you use a label applied recursively to all the elements of a collection? Is a baseline just a specialized case of a label?

One key difference between a baseline and a label is that you can ask a revision "what labels are on you" but you cannot ask a revision "what baselines include you". This can have significant performance impact on certain implementation choices.

Another key difference is a baseline is created by taking a "snapshot" of the state of a workspace, which can provide a very cheap implementation mechanism (i.e. based on the current value of the revision-selection-rule). A label is moved arbitrarily, so there is no revision-selection-rule optimization available.

Another key difference is that a baseline captures the "history" of a set of revisions, i.e. you can find predecessor baselines and successor baselines. In contrast, a label just captures the current state of a collection of revisions. Where the label was in the past is lost.

So a label and a baseline have very different properties (neither is a specialization of the other). Which one you use depends on which of these properties are more important.

   When do you recommend using recursive labels vs. a baseline? For typical configuration management needs, if I want to create release 1.0.1.1 of my product, it seems like I might want the ability to slip in a new revision of a file at the last minute, which would mean moving a label from version to version.

The fact that you have a 1.0.1.1 means you probably have a history captured by naming conventions on your labels.
With baselines, you have your history explicitly modeled via predecessor and successor properties (and therefore unlike naming conventions, interoperable between different clients and servers). In a baseline system, you use "activities" to capture dynamic change (i.e. slipping in a new version), and only create new baselines when you have a set of revisions that you want to capture in the history. You don't modify an existing baseline for the same reasons you don't modify an existing revision.

   I couldn't do that with a baseline, since I would need a new baseline, which would have a different URI, so this wouldn't be transparent to people who were already using release 1.0.1.1 of my product. It might be useful to include some scenarios in the spec as to when to use either.

You would have a "release-1.0.1.1" activity to share between folks that want to see the release-1.0.1.1 work in progress. You would create a baseline only when there is a state of release-1.0.1.1 that you want to capture for history.

   Since version management systems like CVS use tags (i.e. labels) in this way, I think some clarifications in the spec in this area would be helpful.

CVS uses labels to model both activities and baselines. A label that is being moved forward along some line of descent is modeled as an activity. A label that never moves is modeled as a baseline.

   From: "Eric Sedlar" <esedlar@us.oracle.com>

   It seems excessively complex to have three different ways to identify a set of revisions (for use in revision selection rules). I don't see much utility for baselines if you can never change the revision of a particular file in a baseline.

Baselines and activities are designed to be used together. Baselines capture the history of a set of related resources. Activities capture the changes that lead from one state of the history to the next. If you want to see a particular state of the history, you place a baseline in your revision-selection-rule.
If you want to see changes from that baseline, you add the appropriate activity (activities) to your revision-selection-rule.

   It seems to me that the performance benefits of baselines are based on the fact that you have a contiguous subtree of revisions, where there is no need to check the revision selection rules when traversing a link (this often involves searching through the list of revisions in a versioned resource to find the latest one or a particular label, etc.). Each collection revision in a baseline can point directly to the associated revision the next layer down.

The key optimization is based on the fact that it is a snapshot of the state of a workspace. This means that you can just capture the state of the revision-selection-rule property, and not scan the resources at all.

   What if you introduced a new concept like "baseline configuration". A baseline configuration would be rooted at a particular versioned collection recursively, just like a baseline. However, you would be allowed to change the revisions in a baseline configuration after creating the configuration. Then you can get rid of baselines & configurations, and simplify the spec.

I'd be happy to get rid of "configurations", and just keep baselines and labels. But I would not be willing to get rid of baselines, since then the only way to capture the state of a set of versioned resources would be to enumerate the currently selected revisions, which would not scale.

   Is the reason you consider labels less "reliable" than configurations due to the assumption that you are protecting them with access control on a bunch of different resources rather than access control on a single resource?

A label is just an XML element within a resource property. It is very unlikely that we will define access control down to that level (i.e. this element in this property is read-only by this individual). This means that access control on labels is unlikely to ever be provided.
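
To make the contrast drawn so far concrete, here is a minimal Python sketch. It is not taken from the versioning draft, and every name in it (`Baseline`, `Workspace`, `LabelStore`, the `baseline-N` naming scheme, the rule strings) is hypothetical. It assumes a baseline is created by copying the workspace's revision-selection-rule, so no resources are scanned, and that a label is a mutable pointer whose earlier position is lost when it moves.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Baseline:
    """Immutable snapshot of a workspace's revision-selection-rule (hypothetical model)."""
    name: str                                  # server-assigned name, never reused
    rule: Tuple[str, ...]                      # copy of the rule; no per-resource scan
    predecessor: Optional["Baseline"] = None   # explicit history link


class Workspace:
    """Holds a mutable revision-selection-rule and hands out baselines."""

    def __init__(self, rule: List[str]) -> None:
        self.rule = list(rule)
        self._count = 0
        self._latest: Optional[Baseline] = None

    def create_baseline(self) -> Baseline:
        # Cost depends on the length of the rule, not on the number of resources.
        self._count += 1
        baseline = Baseline(f"baseline-{self._count}", tuple(self.rule), self._latest)
        self._latest = baseline
        return baseline


class LabelStore:
    """Per-resource mapping of label -> revision; moving a label overwrites the old target."""

    def __init__(self) -> None:
        self._labels: Dict[str, Dict[str, str]] = {}

    def set_label(self, resource: str, label: str, revision: str) -> None:
        self._labels.setdefault(resource, {})[label] = revision

    def revision_for(self, resource: str, label: str) -> str:
        return self._labels[resource][label]
```

In this sketch, calling `create_baseline()` twice on the same workspace yields a second baseline whose `predecessor` is the first, while calling `set_label()` twice with the same label simply replaces the earlier revision and leaves no record of where the label used to be.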

   Also, can an administrator rename baselines? (E.g. I create a baseline from /amazon/catalogs/music at a particular point in time, and store it in "/baselines/amazon/catalogs/music/dec6_99.base". Then I modify the revision selection rules in the workspace I have selected, and create a new baseline from /amazon/catalogs/music, which includes a different set of revisions, and call it "/baselines/amazon/catalogs/music/temp.base". Can I delete the first baseline and rename the second one to have the same name as the first one, thus changing the selected revisions for anyone who has referenced "/baselines/amazon/catalogs/music/dec6_99.base" in their revision selection rules?)

A baseline is given an immutable name by the server, not a mutable name by the client (JimA has tried to argue otherwise, but we are vigorously resisting :-). The main reason is that it is the immutability of a baseline that leads to a variety of optimizations (i.e. I can cache the baseline locally, and not have to keep going back to the server to see if it has "changed").

   From: "Eric Sedlar" <esedlar@us.oracle.com>

   1) To justify having a "baseline" concept in the spec, I think we need to have
      * a real customer scenario where the absolute guarantee a baseline will never be changed is necessary, and

If I ship a release to a customer, I want to know what was in that release, with no ifs, ands, or buts.

      * show that this is a significant enough case to warrant the complexity

If I can't reliably reproduce a shipped release, my attempts to reproduce and fix a customer problem are severely hampered.

   2) Even if you can come up with 1), I would argue that the performance benefits of a baseline should be available to whatever mechanism is used to represent a release (currently a configuration)

I would represent a release with a baseline. Performance benefits come from restrictions, not generalizations.
In particular, the performance benefits of baselines over labels and configurations derive from the "snapshot the state of my workspace" characteristics of baselines.

   since I think that is going to be far more commonly used, hence why something like the "baseline configuration" concept I'm proposing might be a good idea.

If you want the performance benefits provided by baselines, you will *only* use baselines and activities (i.e. not labels or general configurations).

   3) I don't see anything in the spec preventing any WebDAV resource (baselines, configurations, etc.) from being renamed by a user. If you did, you would have to reserve the entire namespace above the location of the baseline or whatever, and make it appear as a read-only filesystem.

We cannot prevent people with access to the server from administratively changing the URL's, but we can ensure that the protocol provides no means of doing so. In particular, we can require that a MOVE on one of these special URL's (e.g. revision and baseline URL's) fail.

   From: jamsden@us.ibm.com

   ... I've never quite understood the use case for baselines either. ...

   As far as performance and optimization is concerned, a server is free to examine the contents of a configuration when it encounters it in a workspace revision selection rule, and based on its contents, perform any optimizations it wants.

The optimization isn't at reference time, but at creation time. When you create a baseline, you can snapshot the state of a revision selection rule. This is not feasible for a general configuration which can be created and modified out of the context of a workspace.

Cheers,
Geoff

--
Geoffrey M. Clemm
Chief Engineer, Configuration Management Business Unit
Rational Software Corporation
(781) 676-2684   geoffrey.clemm@rational.com   http://www.rational.com