Re: Baselines vs. labels

From: Geoffrey M. Clemm (geoffrey.clemm@rational.com)
Date: Tue, Jan 04 2000

    Date: Tue, 4 Jan 2000 11:06:57 -0500
    Message-Id: <10001041606.AA17944@tantalum>
    From: "Geoffrey M. Clemm" <geoffrey.clemm@rational.com>
    To: ietf-dav-versioning@w3.org
    Subject: Re: Baselines vs. labels
    
    
       From: "Eric Sedlar" <esedlar@us.oracle.com>
    
       Can someone give a bit of rationale for when you use baselines and when
       you use a label applied recursively to all the elements of a
       collection?  Is a baseline just a specialized case of a label?
    
    One key difference between a baseline and a label is that you can
    ask a revision "what labels are on you" but you cannot ask a revision
    "what baselines include you".  This can have a significant performance
    impact for certain implementation choices.
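
    To make the asymmetry concrete, here is a minimal Python sketch.  It
    assumes labels are stored on the revision itself while a baseline
    records only a workspace snapshot; the names Revision, Baseline, and
    labels_on, and the "foo.c;3" style of revision id, are made up for
    illustration and are not taken from the draft.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    rev_id: str
    # Assumption: labels live in a property of the revision itself.
    labels: set = field(default_factory=set)


@dataclass(frozen=True)
class Baseline:
    name: str
    # Assumption: a baseline records only the rule it was created from.
    rule_snapshot: tuple


def labels_on(revision):
    """Answer "what labels are on you" with a direct property read."""
    return sorted(revision.labels)


rev = Revision("foo.c;3")
rev.labels.update({"beta", "release-1.0"})
baseline = Baseline("b1", ("activity:fix-42",))
print(labels_on(rev))
```

    Nothing on the revision points back at b1, so a server built this
    way would presumably not need to maintain a reverse index from
    revisions to baselines.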
    
    Another key difference is that a baseline is created by taking a
    "snapshot" of the state of a workspace, which can provide a very cheap
    implementation mechanism (i.e. based on the current value of the
    revision-selection-rule).
    A label is moved arbitrarily, so there is no revision-selection-rule
    optimization available.
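
    A rough Python sketch of that cost difference follows.  The
    Workspace class, the string form of the rule entries, and the two
    helper functions are assumptions for illustration only; the point is
    that creating a baseline copies one property, while applying a label
    visits every selected revision.

```python
from dataclasses import dataclass


@dataclass
class Workspace:
    # Hypothetical representation: an ordered list of rule entries.
    revision_selection_rule: list


@dataclass(frozen=True)
class Baseline:
    rule_snapshot: tuple


def create_baseline(workspace):
    """Copy the rule; cost does not depend on how many resources exist."""
    return Baseline(tuple(workspace.revision_selection_rule))


def apply_label(label, selected_revisions, labels_by_revision):
    """Touch every selected revision; cost grows with the tree size."""
    for rev_id in selected_revisions:
        labels_by_revision.setdefault(rev_id, set()).add(label)
    return len(selected_revisions)


ws = Workspace(["activity:fix-42", "baseline:b7"])
b8 = create_baseline(ws)
ws.revision_selection_rule.append("activity:fix-43")  # later workspace change
touched = apply_label("rel", ["a.c;2", "b.c;5", "c.c;1"], {})
```

    The later change to the workspace rule does not affect b8, because
    the snapshot was copied into an immutable tuple at creation time.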
    
    Another key difference is that a baseline captures the "history" of a
    set of revisions, i.e. you can find predecessor baselines and successor
    baselines.  In contrast, a label just captures the current state of
    a collection of revisions.  Where the label was in the past is lost.
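
    As an illustration only, the sketch below models a baseline with a
    single predecessor link and a label as a plain name-to-revision
    mapping.  Both representations are simplifying assumptions, not the
    draft's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Baseline:
    name: str
    predecessor: Optional["Baseline"] = None


def history(baseline):
    """Walk predecessor links from the given baseline back to the first."""
    names = []
    while baseline is not None:
        names.append(baseline.name)
        baseline = baseline.predecessor
    return names


b1 = Baseline("b1")
b2 = Baseline("b2", predecessor=b1)
b3 = Baseline("b3", predecessor=b2)

# A label is modeled here as a plain mapping from label name to revision.
labels = {"release": "foo.c;1"}
labels["release"] = "foo.c;2"  # moving the label overwrites the old target
```

    After the move, the mapping holds no trace of "foo.c;1", whereas
    history(b3) still reaches b1.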
    
    So a label and a baseline have very different properties (neither is
    a specialization of the other).  Which one you use depends on which
    of these properties are more important.
    
       When do you recommend using recursive labels vs. a baseline?  For
       typical configuration management needs, if I want to create release
       1.0.1.1 of my product, it seems like I might want the ability to slip
       in a new revision of a file at the last minute, which would mean
       moving a label from version to version.
    
    The fact that you have a 1.0.1.1 means you probably have a history
    captured by naming conventions on your labels.  With baselines, you
    have your history explicitly modeled via predecessor and successor
    properties (and therefore unlike naming conventions, interoperable
    between different clients and servers).  In a baseline system, you
    use "activities" to capture dynamic change (i.e. slipping in a new
    version), and only create new baselines when you have a set of revisions
    that you want to capture in the history.  You don't modify an
    existing baseline for the same reasons you don't modify an existing
    revision.
    
       I couldn't do that with a
       baseline, since I would need a new baseline, which would have a
       different URI, so this wouldn't be transparent to people who were
       already using release 1.0.1.1 of my product.  It might be useful to
       include some scenarios in the spec as to when to use either.
    
    You would have a "release-1.0.1.1" activity to share between folks
    that want to see the release-1.0.1.1 work in progress.  You would
    create a baseline only when there is a state of release-1.0.1.1 that
    you want to capture for history.
    
       Since version management systems like CVS use tags (i.e. labels) in
       this way, I think some clarifications in the spec in this area would
       be helpful.
    
    CVS uses labels to model both activities and baselines.  A label that
    is being moved forward along some line of descent is modeled as an
    activity.  A label that never moves is modeled as a baseline.
    
       From: "Eric Sedlar" <esedlar@us.oracle.com>
    
       It seems excessively complex to have three different ways to identify
       a set of revisions (for use in revision selection rules).  I don't see
       much utility for baselines if you can never change the revision of a
       particular file in a baseline.
    
    Baselines and activities are designed to be used together.
    Baselines capture the history of a set of related resources.
    Activities capture the changes that lead from one state of the
    history to the next.  If you want to see a particular state of
    the history, you place a baseline in your revision-selection-rule.
    If you want to see changes from that baseline, you add the appropriate
    activity (activities) to your revision-selection-rule.
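
    A toy Python sketch of how such a rule might resolve one resource is
    given below.  It assumes that a baseline and an activity can each be
    viewed as a mapping from resource to revision, and that activities
    listed in the rule override the baseline; these are simplifications
    for illustration, and select_revision is a hypothetical name.

```python
def select_revision(resource, baseline, activities):
    """Return the revision a workspace would show for one resource."""
    for activity in activities:
        if resource in activity:
            return activity[resource]
    return baseline.get(resource)


baseline_1_0 = {"/src/a.c": "a.c;3", "/src/b.c": "b.c;7"}
late_fix = {"/src/b.c": "b.c;8"}

only_baseline = select_revision("/src/b.c", baseline_1_0, [])
with_activity = select_revision("/src/b.c", baseline_1_0, [late_fix])
untouched = select_revision("/src/a.c", baseline_1_0, [late_fix])
```

    With only the baseline in the rule you see the captured state; adding
    the activity shows the change to b.c and leaves a.c as it was.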
    
       It seems to me that the performance benefits of baselines are based on
       the fact that you have a contiguous subtree of revisions, where there
       is no need to check the revision selection rules when traversing a
       link (this often involves searching through the list of revisions in a
       versioned resource to find the latest one or a particular label,
       etc.).  Each collection revision in a baseline can point directly to
       the associated revision in the next layer down.
    
    The key optimization is based on the fact that it is a snapshot of the
    state of a workspace.  This means that you can just capture the state
    of the revision-selection-rule property, and not scan the resources at
    all.
    
       What if you introduced a new concept like "baseline configuration".  A
       baseline configuration would be rooted at a particular versioned
       collection recursively, just like a baseline.  However, you would be
       allowed to change the revisions in a baseline configuration after
       creating the configuration.  Then you can get rid of baselines &
       configurations, and simplify the spec.
    
    I'd be happy to get rid of "configurations", and just keep baselines
    and labels.  But I would not be willing to get rid of baselines, since
    then the only way to capture the state of a set of versioned resources
    would be to enumerate the currently selected revisions, which would
    not scale.
    
       Is the reason you consider labels less "reliable" than configurations
       due to the assumption that you are protecting them with access control
       on a bunch of different resources rather than access control on a
       single resource?
    
    A label is just an XML element within a resource property.  It is
    very unlikely that we will define access control down to that level
    (i.e. this element in this property is read-only by this individual).
    This means that access control on labels is unlikely to ever be provided.
    
       Also, can an administrator rename baselines?  (E.g. I create a
       baseline from /amazon/catalogs/music at a particular point in time,
       and store it in "/baselines/amazon/catalogs/music/dec6_99.base".  Then
       I modify the revision selection rules in the workspace I have
       selected, and create a new baseline from /amazon/catalogs/music, which
       includes a different set of revisions, and call it
       "/baselines/amazon/catalogs/music/temp.base".  Can I delete the first
       baseline and rename the second one to have the same name as the first
       one, thus changing the selected revisions for anyone who has
       referenced "/baselines/amazon/catalogs/music/dec6_99.base" in their
       revision selection rules?
    
    A baseline is given an immutable name by the server, not a mutable
    name by the client (JimA has tried to argue otherwise, but we are
    vigorously resisting :-).  The main reason is that it is the immutability
    of a baseline that leads to a variety of optimizations (i.e. I can cache
    the baseline locally, and not have to keep going back to the server
    to see if it has "changed").
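
    The caching argument can be sketched as follows.  BaselineCache, the
    fetch callable, and the "/baselines/b8" URL are hypothetical; the
    only property relied on is that a baseline's contents never change
    once the server has named it.

```python
class BaselineCache:
    """Client-side cache keyed by the server-assigned baseline URL."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: url -> baseline contents
        self._entries = {}

    def get(self, url):
        # An immutable baseline never needs revalidation, so a hit is final.
        if url not in self._entries:
            self._entries[url] = self._fetch(url)
        return self._entries[url]


requests = []


def fake_fetch(url):
    # Stand-in for a round trip to the server; records each request made.
    requests.append(url)
    return ("activity:fix-42", "baseline:b7")


cache = BaselineCache(fake_fetch)
first = cache.get("/baselines/b8")
second = cache.get("/baselines/b8")
```

    The second lookup is served locally, so only one request reaches the
    server.  If a client could rename or rewrite a baseline, the cache
    would have to check back on every use.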
    
       From: "Eric Sedlar" <esedlar@us.oracle.com>
    
       1) To justify having a "baseline" concept in the spec, I think we need to have
           * a real customer scenario where the absolute guarantee a baseline will
       never be changed is necessary, and
    
    If I ship a release to a customer, I want to know what was in that release,
    with no ifs, ands, or buts.
    
           * show that this is a significant enough case to warrant the complexity
    
    If I can't reliably reproduce a shipped release, my attempts to reproduce
    and fix a customer problem are severely hampered.
    
       2) Even if you can come up with 1), I would argue that the performance benefits
       of a baseline should be available to whatever mechanism is used to represent a
       release (currently a configuration)
    
    I would represent a release with a baseline.
    Performance benefits come from restrictions, not generalizations.
    In particular, the performance benefits of baselines over labels
    and configurations derive from the "snapshot the state of my workspace"
    characteristics of baselines.
    
       since I think that is going to be far more
       commonly used, hence why something like the "baseline configuration" concept
       I'm proposing might be a good idea.
    
    If you want the performance benefits provided by baselines, you will
    *only* use baselines and activities (i.e. not labels or general
    configurations).
    
       3) I don't see anything in the spec preventing any WebDAV resource (baselines,
       configurations, etc.) from being renamed by a user.  If you did, you would have
       to reserve the entire namespace above the location of the baseline or whatever,
       and make it appear as a read-only filesystem.
    
    We cannot prevent people with access to the server from administratively
    changing the URLs, but we can ensure that the protocol provides no means
    of doing so.  In particular, we can require that a MOVE on one of these
    special URLs (e.g. revision and baseline URLs) fail.
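
    One way a server might enforce this is sketched below in Python.  It
    assumes server-assigned URLs can be recognized by prefix and that the
    namespace is a simple mapping; handle_move, ImmutableUrlError, and
    the example URLs are all hypothetical, and the sketch says nothing
    about which status code the protocol would return.

```python
class ImmutableUrlError(Exception):
    """Raised when a MOVE targets a server-assigned URL."""


def handle_move(source, destination, namespace, immutable_prefixes):
    # Assumption: server-assigned URLs are recognizable by prefix.
    if any(source.startswith(prefix) for prefix in immutable_prefixes):
        raise ImmutableUrlError(source)
    namespace[destination] = namespace.pop(source)


namespace = {"/docs/a.txt": "ordinary resource", "/baselines/b8": "baseline"}
prefixes = ("/baselines/", "/revisions/")

handle_move("/docs/a.txt", "/docs/b.txt", namespace, prefixes)  # allowed

try:
    handle_move("/baselines/b8", "/baselines/b1", namespace, prefixes)
    move_rejected = False
except ImmutableUrlError:
    move_rejected = True
```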
    
       From: jamsden@us.ibm.com
    
       ... I've never
       quite understood the use case for baselines either. ... As far as
       performance and optimization is concerned, a server is free to examine the
       contents of a configuration when it encounters it in a workspace revision
       selection rule, and based on its contents, perform any optimizations it
       wants.
    
    The optimization isn't at reference time, but at creation time.
    When you create a baseline, you can snapshot the state of a revision
    selection rule.  This is not feasible for a general configuration which
    can be created and modified out of the context of a workspace.
    
    Cheers,
    Geoff
    
    -- 
    Geoffrey M. Clemm
    Chief Engineer, Configuration Management Business Unit
    Rational Software Corporation
    (781) 676-2684   geoffrey.clemm@rational.com   http://www.rational.com