Re: Baselines vs. labels

From: Eric Sedlar (esedlar@us.oracle.com)
Date: Sat, Jan 08 2000

    Message-ID: <01b501bf5a10$9d300610$9a114498@us.oracle.com>
    From: "Eric Sedlar" <esedlar@us.oracle.com>
    To: "Geoffrey M. Clemm" <geoffrey.clemm@rational.com>, <ietf-dav-versioning@w3.org>
    Date: Sat, 8 Jan 2000 11:42:46 -0800
    Subject: Re: Baselines vs. labels
    
    Thanks for pointing out the caching benefit of baselines, and the way a
    shared activity can be used to modify them.  My biggest beef is not with
    baselines, but with the number of ways of selecting a set of revisions:
    
    * baseline
    * configuration
    * shared activity (with RSRs specifying the revisions)
    * label
    
    If you are willing to get rid of configurations, let's get rid of them.
    
    --Eric
    
    ----- Original Message -----
    From: "Geoffrey M. Clemm" <geoffrey.clemm@rational.com>
    To: <ietf-dav-versioning@w3.org>
    Sent: Tuesday, January 04, 2000 8:06 AM
    Subject: Re: Baselines vs. labels
    
    
    >
    >    From: "Eric Sedlar" <esedlar@us.oracle.com>
    >
    >    Can someone give a bit of rationale for when you use baselines and when
    >    you use a label applied recursively to all the elements of a
    >    collection?  Is a baseline just a specialized case of a label?
    >
    > One key difference between a baseline and a label is that you can
    > ask a revision "what labels are on you" but you cannot ask a revision
    > "what baselines include you".  This can have significant performance
    > impact on certain implementation choices.
    >
    > Another key difference is a baseline is created by taking a "snapshot"
    > of the state of a workspace, which can provide a very cheap implementation
    > mechanism (i.e. based on the current value of the
    > revision-selection-rule).
    > A label is moved arbitrarily, so there is no revision-selection-rule
    > optimization available.
    >
    > Another key difference is that a baseline captures the "history" of a
    > set of revisions, i.e. you can find predecessor baselines and successor
    > baselines.  In contrast, a label just captures the current state of
    > a collection of revisions.  Where the label was in the past is lost.
    >
    > So a label and a baseline have very different properties (neither is
    > a specialization of the other).  Which one you use depends on which
    > of these properties are more important.
    >
    >    When do you recommend using recursive labels vs. a baseline?  For
    >    typical configuration management needs, if I want to create release
    >    1.0.1.1 of my product, it seems like I might want the ability to slip
    >    in a new revision of a file at the last minute, which would mean
    >    moving a label from version to version.
    >
    > The fact that you have a 1.0.1.1 means you probably have a history
    > captured by naming conventions on your labels.  With baselines, you
    > have your history explicitly modeled via predecessor and successor
    > properties (and therefore, unlike naming conventions, interoperable
    > between different clients and servers).  In a baseline system, you
    > use "activities" to capture dynamic change (i.e. slipping in a new
    > version), and only create new baselines when you have a set of revisions
    > that you want to capture in the history.  You don't modify an
    > existing baseline for the same reasons you don't modify an existing
    > revision.
    >
    >    I couldn't do that with a
    >    baseline, since I would need a new baseline, which would have a
    >    different URI, so this wouldn't be transparent to people who were
    >    already using release 1.0.1.1 of my product.  It might be useful to
    >    include some scenarios in the spec as to when to use either.
    >
    > You would have a "release-1.0.1.1" activity to share between folks
    > that want to see the release-1.0.1.1 work in progress.  You would
    > create a baseline only when there is a state of release-1.0.1.1 that
    > you want to capture for history.
    >
    >    Since version management systems like CVS use tags (i.e. labels) in
    >    this way, I think some clarifications in the spec in this area would
    >    be helpful.
    >
    > CVS uses labels to model both activities and baselines.  A label that
    > is being moved forward along some line of descent is modeled as an
    > activity.  A label that never moves is modeled as a baseline.
    >
    >    From: "Eric Sedlar" <esedlar@us.oracle.com>
    >
    >    It seems excessively complex to have three different ways to identify
    >    a set of revisions (for use in revision selection rules).  I don't see
    >    much utility for baselines if you can never change the revision of a
    >    particular file in a baseline.
    >
    > Baselines and activities are designed to be used together.
    > Baselines capture the history of a set of related resources.
    > Activities capture the changes that lead from one state of the
    > history to the next.  If you want to see a particular state of
    > the history, you place a baseline in your revision-selection-rule.
    > If you want to see changes from that baseline, you add the appropriate
    > activity (activities) to your revision-selection-rule.
    >
    >    It seems to me that the performance benefits of baselines are based on
    >    the fact that you have a contiguous subtree of revisions, where there
    >    is no need to check the revision selection rules when traversing a
    >    link (this often involves searching through the list of revisions in a
    >    versioned resource to find the latest one or a particular label,
    >    etc.).  Each collection revision in a baseline can point directly to
    >    the associated revision in the next layer down.
    >
    > The key optimization is based on the fact that it is a snapshot of the
    > state of a workspace.  This means that you can just capture the state
    > of the revision-selection-rule property, and not scan the resources at
    > all.
    >
    >    What if you introduced a new concept like "baseline configuration"?  A
    >    baseline configuration would be rooted at a particular versioned
    >    collection recursively, just like a baseline.  However, you would be
    >    allowed to change the revisions in a baseline configuration after
    >    creating the configuration.  Then you can get rid of baselines &
    >    configurations, and simplify the spec.
    >
    > I'd be happy to get rid of "configurations", and just keep baselines
    > and labels.  But I would not be willing to get rid of baselines, since
    > then the only way to capture the state of a set of versioned resources
    > would be to enumerate the currently selected revisions, which would
    > not scale.
    >
    >    Is the reason you consider labels less "reliable" than configurations
    >    due to the assumption that you are protecting them with access control
    >    on a bunch of different resources rather than access control on a
    >    single resource?
    >
    > A label is just an XML element within a resource property.  It is
    > very unlikely that we will define access control down to that level
    > (i.e. this element in this property is read-only by this individual).
    > This means that access control on labels is unlikely to ever be provided.
    >
    >    Also, can an administrator rename baselines?  (E.g. I create a
    >    baseline from /amazon/catalogs/music at a particular point in time,
    >    and store it in "/baselines/amazon/catalogs/music/dec6_99.base".  Then
    >    I modify the revision selection rules in the workspace I have
    >    selected, and create a new baseline from /amazon/catalogs/music, which
    >    includes a different set of revisions, and call it
    >    "/baselines/amazon/catalogs/music/temp.base".  Can I delete the first
    >    baseline and rename the second one to have the same name as the first
    >    one, thus changing the selected revisions for anyone who has
    >    referenced "/baselines/amazon/catalogs/music/dec6_99.base" in their
    >    revision selection rules?)
    >
    > A baseline is given an immutable name by the server, not a mutable
    > name by the client (JimA has tried to argue otherwise, but we are
    > vigorously resisting :-).  The main reason is that it is the immutability
    > of a baseline that leads to a variety of optimizations (i.e. I can cache
    > the baseline locally, and not have to keep going back to the server
    > to see if it has "changed").
    >
    >    From: "Eric Sedlar" <esedlar@us.oracle.com>
    >
    >    1) To justify having a "baseline" concept in the spec, I think we
    >    need to have
    >        * a real customer scenario where the absolute guarantee a
    >          baseline will never be changed is necessary, and
    >
    > If I ship a release to a customer, I want to know what was in that
    > release, with no ifs, ands, or buts.
    >
    >        * show that this is a significant enough case to warrant the
    >          complexity
    >
    > If I can't reliably reproduce a shipped release, my attempts to reproduce
    > and fix a customer problem are severely hampered.
    >
    >    2) Even if you can come up with 1), I would argue that the
    >    performance benefits of a baseline should be available to whatever
    >    mechanism is used to represent a release (currently a configuration)
    >
    > I would represent a release with a baseline.
    > Performance benefits come from restrictions, not generalizations.
    > In particular, the performance benefits of baselines over labels
    > and configurations derive from the "snapshot the state of my workspace"
    > characteristics of baselines.
    >
    >    since I think that is going to be far more
    >    commonly used, hence why something like the "baseline configuration"
    >    concept I'm proposing might be a good idea.
    >
    > If you want the performance benefits provided by baselines, you will
    > *only* use baselines and activities (i.e. not labels or general
    > configurations).
    >
    >    3) I don't see anything in the spec preventing any WebDAV resource
    >    (baselines, configurations, etc.) from being renamed by a user.  If
    >    you did, you would have to reserve the entire namespace above the
    >    location of the baseline or whatever, and make it appear as a
    >    read-only filesystem.
    >
    > We cannot prevent people with access to the server from administratively
    > changing the URLs, but we can ensure that the protocol provides no means
    > of doing so.  In particular, we can require that a MOVE on one of these
    > special URLs (e.g. revision and baseline URLs) fail.
    >
    >    From: jamsden@us.ibm.com
    >
    >    ... I've never
    >    quite understood the use case for baselines either. ... As far as
    >    performance and optimization is concerned, a server is free to examine
    >    the contents of a configuration when it encounters it in a workspace
    >    revision selection rule, and based on its contents, perform any
    >    optimizations it wants.
    >
    > The optimization isn't at reference time, but at creation time.
    > When you create a baseline, you can snapshot the state of a revision
    > selection rule.  This is not feasible for a general configuration which
    > can be created and modified out of the context of a workspace.
    >
    > Cheers,
    > Geoff
    >
    > --
    > Geoffrey M. Clemm
    > Chief Engineer, Configuration Management Business Unit
    > Rational Software Corporation
    > (781) 676-2684   geoffrey.clemm@rational.com   http://www.rational.com
    >
    >
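
The contrast drawn in the thread between an immutable baseline with predecessor links and a movable label can be sketched roughly as below. This is only an illustration, not anything from the versioning draft: the names Baseline, make_baseline, history and Labels, the "bl-1"/"bl-2" names and the /src paths are all hypothetical. For simplicity the sketch copies the selected revisions into the baseline, whereas the point made in the thread is that a server could instead capture the workspace's revision-selection-rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Baseline:
    """Hypothetical immutable snapshot; the name would be server-assigned."""
    name: str
    revisions: tuple  # sorted (resource, revision) pairs
    predecessor: Optional["Baseline"] = None

    def selected(self) -> Dict[str, str]:
        return dict(self.revisions)


def make_baseline(name, workspace, predecessor=None):
    # Assumption: copy the workspace state; a real server might store
    # the revision-selection-rule instead of enumerating revisions.
    return Baseline(name, tuple(sorted(workspace.items())), predecessor)


def history(baseline) -> List[str]:
    """Walk predecessor links from the newest baseline to the oldest."""
    names = []
    while baseline is not None:
        names.append(baseline.name)
        baseline = baseline.predecessor
    return names


class Labels:
    """Hypothetical mutable label table: label -> {resource: revision}."""

    def __init__(self):
        self._by_label = {}

    def move(self, label, resource, revision):
        # Moving a label overwrites where it was; the old position is lost.
        self._by_label.setdefault(label, {})[resource] = revision

    def selected(self, label):
        return dict(self._by_label.get(label, {}))

    def labels_on(self, resource, revision):
        # "What labels are on you?" can be answered for a revision.
        return sorted(name for name, sel in self._by_label.items()
                      if sel.get(resource) == revision)


workspace = {"/src/a.c": "1", "/src/b.c": "3"}
b1 = make_baseline("bl-1", workspace)
labels = Labels()
for res, rev in workspace.items():
    labels.move("release-1.0.1.1", res, rev)

workspace["/src/a.c"] = "2"  # slip in a late revision
b2 = make_baseline("bl-2", workspace, predecessor=b1)
labels.move("release-1.0.1.1", "/src/a.c", "2")
```

After the late change, b1 still selects revision 1 of /src/a.c and history(b2) still reaches it, while the label only knows its current position.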