Versioning Example -- a port from Babich, Alan on 1998-07-24 (w3c-dist-auth@w3.org from July to September 1998)

From: Babich, Alan <ABabich@filenet.com>
Date: Fri, 24 Jul 1998 16:44:47 -0700
To: "'ejw@ics.uci.edu'" <ejw@ics.uci.edu>, John Stracke <francis@netscape.com>, Chris Kaler <ckaler@microsoft.com>, Bradley Sergeant <bradley_sergeant@intersolv.com>, Alan Babich <ABabich@felix.filenet.com>, Sam Ruby <rubys@us.ibm.com>, Bruce Cragun <Cragun.Bruce@gw.novell.com>, David Durand <dgd@cs.bu.edu>, Sridhar Iyengar <sridhar.iyengar@mv.unisys.com>
Cc: Alex Hopmann <alexhop@microsoft.com>, "'webdav'" <w3c-dist-auth@w3.org>
Message-ID: <72B1992276A9D111A20E00805FEAC96D01324C9B@cm-expo1.filenet.com>
> a) Prepare at least one (and ideally many more than one) scenario.
> Please email it out to the rest of the design team before the meeting
> (by the 6th) -- you should send it to the general WebDAV mailing list
> as well.

OK, Jim, as per your request, here's a scenario from the real world. 
WARNING: This is a long e-mail (about 345 lines).

A software company is porting several hundred thousand
lines of C code to several different platforms. The
company intends to have exactly one source base that
handles all platforms. Therefore, there will be #ifdef's
in some of the source files that control selection of
text for platform dependent stuff like I/O.

There are two field releases being supported as
linear lines of development, and the current development
release is being supported as a linear line of development.

The field release lines of development branch off
the development release line of development when a change 
is made to the development release but not a field release.

First, let's consider how we start out. Then we will
consider what happens during the port, i.e., parallel editing.

Consider one source file x.c . It started out:

1.1 --> 1.2 --> 1.3

Those are the good old RCS version labels. (First number
is number of times the same node was branched. Second
number is consecutive linear line of development change
number.) We attach user version labels, because we don't 
care about no stinking RCS version labels. :-) They are
irrelevant to us humans. So, we invent a convention
where our user version labels are of the form

r<major release number>_<minor release number>_<build number>_
    <change number for the major/minor release>

User version labels are a necessity in order to be
able to recreate or initially create any base level of 
any release. In order to do so, you have to select a
whole collection of files, and the exact versions
you need must be specified in a simple way, e.g.,
release number and build level.
(For example, in order to do that for build 7 of release 1.0, 
you merely check out a read only copy of the version of every 
file that has the version label of the form r1_0_7_x where x is 
maximal. There is no possibility of such a simple algorithm
against the hardwired RCS version labels.)
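
A minimal sketch of the label convention and the selection rule, in
Python. The names parse_label and select_for_build are hypothetical.
The sketch also assumes one thing the text does not state: a file with
no label for the exact build falls back to its latest label from an
earlier build of the same release.

```python
import re
from typing import Iterable, NamedTuple, Optional


class Label(NamedTuple):
    major: int
    minor: int
    build: int
    change: int


# r<major>_<minor>_<build>_<change>, e.g. r1_0_7_4
_LABEL_RE = re.compile(r"^r(\d+)_(\d+)_(\d+)_(\d+)$")


def parse_label(text: str) -> Label:
    """Split a user version label into its four numeric fields."""
    m = _LABEL_RE.match(text)
    if m is None:
        raise ValueError(f"not a release label: {text!r}")
    return Label(*(int(g) for g in m.groups()))


def select_for_build(labels: Iterable[str], major: int, minor: int,
                     build: int) -> Optional[str]:
    """Pick the label of the version to check out for one file.

    Assumption (not from the original text): labels from builds up to
    and including the requested one are candidates, and the one with
    the maximal change number wins. Labels that do not follow the
    r-convention are ignored.
    """
    candidates = []
    for text in labels:
        if _LABEL_RE.match(text) is None:
            continue
        lab = parse_label(text)
        if (lab.major, lab.minor) == (major, minor) and lab.build <= build:
            candidates.append((lab.change, text))
    if not candidates:
        return None
    return max(candidates)[1]
```

Running select_for_build over the labels of every file in the source
base yields the coordinated set of versions for that build.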

The layer on top of sccs puts the user version labels on 
automatically. So, the version structure for x.c is actually

1.1       --> 1.2     --> 1.3
r1_0_0_0      r1_0_1_1    r1_0_2_2

1.1 was the initial version of x.c for build 0 of release 1.0. 
1.2 was change 1 for build 1 of release 1.0. 
1.3 was change 2 for build 2 of release 1.0.

OK. So now we release 1.0 to the field and start work on 
release 1.1. Nothing happens until we change x.c .

There are two possibilities. Either we are making a pure
new development change, or are fixing a bug in the field,
and we want that exact same fix in the development release.

First, we fix a bug and the fix is exactly the same (and
x.c is exactly the same) in the development release 
(release 1.1) for build 3 of the field release, and build 0 
of the development release. The change is automatically
"rolled forward" by the tools by simply putting multiple
version labels on the new version. This can optionally be
done only when checking out and in the tip of a line of
development.

1.1       --> 1.2     --> 1.3     --> 1.4
r1_0_0_0      r1_0_1_1    r1_0_2_2    r1_0_3_3
                                      r1_1_0_0

Next, we put in a piece of pure new development for build 8
of the development release (i.e., 1.1).

1.1       --> 1.2     --> 1.3     --> 1.4      --> 1.5
r1_0_0_0      r1_0_1_1   r1_0_2_2     r1_0_3_3     r1_1_8_1
                                      r1_1_0_0

Next, we fix a bug in the field release (i.e., 1.0) build 7. 
The change can not "roll forward" to the development release, 
because we have made a development only change. In other words,
we are now checking out (and back in) a node that is not at
the tip of the line of development. This causes the
version tree to branch.

1.1       --> 1.2     --> 1.3     --> 1.4      --> 1.5
r1_0_0_0      r1_0_1_1   r1_0_2_2     r1_0_3_3     r1_1_8_1
                                      r1_1_0_0
                                      | 
                                      v
                                      1.4.1.1
                                      r1_0_7_4

Then we fix another bug for build 9 of the field release (i.e., 1.0).
 
1.1       --> 1.2     --> 1.3     --> 1.4      --> 1.5
r1_0_0_0      r1_0_1_1   r1_0_2_2     r1_0_3_3     r1_1_8_1
                                      r1_1_0_0
                                      | 
                                      v
                                      1.4.1.1  --> 1.4.1.2
                                      r1_0_7_4     r1_0_9_5

It should be clear how the lines of development progress from here.
Note that the development release is always the main trunk
(i.e., has RCS version numbers of the form "1.x".)
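
The rule "checking in at the tip extends the line, checking in anywhere
else branches" can be sketched as a toy model in Python. The class
VersionTree and its methods are hypothetical and are not part of RCS or
of the layer described here. The sketch only models the first branch
from a node.

```python
class VersionTree:
    """Toy model of one file's version tree (illustration only)."""

    def __init__(self, root_id, root_label):
        self.labels = {root_id: [root_label]}  # version id -> user labels
        self.parent = {root_id: None}          # version id -> base version

    @staticmethod
    def _bump(vid):
        # "1.4" -> "1.5", "1.4.1.1" -> "1.4.1.2"
        head, _, last = vid.rpartition(".")
        return f"{head}.{int(last) + 1}"

    def is_tip(self, vid):
        """A version is a tip if its line of development has no successor."""
        return self._bump(vid) not in self.labels

    def checkin(self, base, *new_labels):
        """Check in a new version derived from base.

        Passing several labels models "rolling forward" one change to
        more than one release.
        """
        if self.is_tip(base):
            new_id = self._bump(base)      # extend the line of development
        else:
            new_id = base + ".1.1"         # base is not the tip: branch
            if new_id in self.labels:
                raise NotImplementedError(
                    "a second branch from the same node is not modelled")
        self.labels[new_id] = list(new_labels)
        self.parent[new_id] = base
        return new_id


# Replay the history of x.c shown above.
tree = VersionTree("1.1", "r1_0_0_0")
tree.checkin("1.1", "r1_0_1_1")
tree.checkin("1.2", "r1_0_2_2")
tree.checkin("1.3", "r1_0_3_3", "r1_1_0_0")   # bug fix rolled forward
tree.checkin("1.4", "r1_1_8_1")               # development-only change
first_fix = tree.checkin("1.4", "r1_0_7_4")   # 1.4 is no longer the tip
second_fix = tree.checkin(first_fix, "r1_0_9_5")
```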

If we ever checked out and in node 1.4 again, the RCS number
would be 2.4.1.1, and there would be two direct offspring from
node 1.4 (1.4.1.1, and 2.4.1.1). Branching the same
node more than once is unusual, and I'm not going to bother 
to illustrate that.

                            ---

OK. So much for preliminaries. The above is slightly simplified
from what we actually did, but that's OK. Now for the port 
to multiple platforms (parallel editing).

To simplify things, let's only show the end of the line of
development for the current development release (i.e., 1.1)
for file x.c .

1.5
r1_1_8_1

Now Christine comes along and starts to port x.c to Solaris.
She checks out a copy of r1_1_8_1 of x.c in her
private working directory. (She also makes copies
of lots of other source files, of course.) She does not
lock any files. 

She makes a copy and doesn't leave x.c locked, 
because it's going to take her quite a while (weeks or months) 
to finish porting what she is porting. Joe, who is adding 
new features to the product, may need to continue the main 
line of development in the interim. He can not be stopped dead 
in his tracks by Christine checking out x.c and leaving it 
locked for weeks or months.

Now Joe makes a change to x.c on the main line of development
on the original platform (AIX) for build 10 of the development
release. Joe is not coordinating with Christine, and Christine
is not coordinating with Joe.

1.5      --> 1.6
r1_1_8_1     r1_1_10_2

This doesn't affect Christine, who has her own copies of
all the files.

Now Sam comes along and starts to port x.c to HPUX. So
Sam checks out a copy of r1_1_10_2 of x.c (and a bunch of
other source files) into his private directory and goes
to town on the port. Just as Christine didn't leave any files
locked, Sam doesn't leave any files locked either. 

Joe, Christine, and Sam are all working in parallel and not
coordinating with each other.

Joe makes another change to x.c for build 12 of the
development release.

1.5      --> 1.6       --> 1.7
r1_1_8_1     r1_1_10_2     r1_1_12_3

Now, Christine finishes her port. So, she checks out x.c (r1_1_12_3)
and leaves it locked. She compares the r1_1_12_3 version against
her private copy of x.c (based on r1_1_8_1). If Joe did anything 
to x.c that interferes with what she did to it, Christine resolves 
the discrepancies by editing her private copy. Once she has decided 
that all discrepancies are resolved, she checks in x.c to build 15 
of the development release using her final copy of x.c . Version
r1_1_12_3 is only locked for the duration of her merge.

1.5      --> 1.6       --> 1.7       --> 1.8
r1_1_8_1     r1_1_10_2     r1_1_12_3     r1_1_15_4

Now x.c can theoretically run on AIX and Solaris, and 
Joe and Sam are working in parallel and not coordinating
with each other.

Now Sam finishes his HPUX port. Sam checks out a copy of
x.c (r1_1_15_4) and leaves it locked. He looks to see that
what he has done against his copy of r1_1_10_2 is still
valid against r1_1_15_4. Sam resolves any discrepancies
in his private copy. Then Sam checks in his private copy
against build 17 of the development release.

1.5      --> 1.6       --> 1.7       --> 1.8       --> 1.9
r1_1_8_1     r1_1_10_2     r1_1_12_3     r1_1_15_4     r1_1_17_5

Now, x.c can theoretically run on AIX, Solaris, and HPUX,
and the binaries for all the platforms can be compiled from the 
same source base. The ports are done.
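
The workflow above (work for weeks on an unlocked private copy, then
hold the exclusive lock only for the duration of the merge) can be
sketched in Python as follows. SourceFile, LockedError, and the method
names are hypothetical, and the string concatenation merely stands in
for a merge done by a human.

```python
class LockedError(Exception):
    """Raised when the exclusive lock is held by someone else, or not held."""


class SourceFile:
    """One file with a single linear line of development."""

    def __init__(self, text):
        self.versions = [text]
        self.locked_by = None

    def copy(self):
        """Take a private copy of the tip without locking anything."""
        return self.versions[-1]

    def checkout_locked(self, user):
        """Check out the tip and take the exclusive lock."""
        if self.locked_by is not None:
            raise LockedError(f"locked by {self.locked_by}")
        self.locked_by = user
        return self.versions[-1]

    def checkin(self, user, text):
        """Check in a new tip; only the lock holder may do this."""
        if self.locked_by != user:
            raise LockedError(f"{user} does not hold the lock")
        self.versions.append(text)
        self.locked_by = None


x_c = SourceFile("base\n")

# Christine takes an unlocked copy and starts her long-running port.
christine_private = x_c.copy() + "solaris port\n"

# Joe is not blocked: he locks, changes, and checks in right away.
tip = x_c.checkout_locked("joe")
x_c.checkin("joe", tip + "new feature\n")

# Christine locks only now, compares the tip with her private copy,
# and checks in the merged result.
tip = x_c.checkout_locked("christine")
merged = tip + "solaris port\n"   # stand-in for her hand merge
x_c.checkin("christine", merged)
```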

OK. Now several things should be clear:
(0) Simple linear lines of development are critical.
You can never lose track of the lines of development.
(1) User labels are necessary in order to retrieve a
coordinated set of files to reproduce an arbitrary build.
The RCS labels are totally inadequate for this purpose,
since there is no dependable pattern across a large
set of files.
(2) Multiple user labels must be assignable to the same
version of a file in order to support multiple releases
(e.g., multiple field releases and new development).
(3) Parallel editing requires a merge. There
is no general algorithm that can perform this merge
for you. Human insight is required. Tools such as diff
can help, but, in the end, there are lots of situations in 
which a human has to check the results regardless of 
the tools used.
(4) It is not reasonable to expect N versions to be
merged all at the same time. That makes the problem
exponentially more complicated, and humans don't do well
at things that get exponentially more complicated with N.
So, merges should be done pairwise.
(5) One can not keep the main line of development
locked for a very long period of time. 
(6) Yet exclusive locking is necessary for ordinary development, 
and to protect the decisions made during a merge. 
(7) Exclusive locking is necessary and sufficient to do 
parallel editing.
(8) Using the approach of this example, part of the history
of the derivation was lost. (From the final version
graph, you can't tell that Christine worked against
r1_1_8_1 or that Sam worked against r1_1_10_2. You
would need checkin comments to tell you that.)
This may not be desirable. (This issue is addressed in the 
next section.)
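
As an illustration of point (3), here is a small Python sketch of the
kind of help a diff tool gives during a pairwise merge. It uses the
standard difflib module; the function name review_diff and the sample
file contents are made up. The output is only a list of differences
for a human to review, not a merge.

```python
import difflib


def review_diff(tip_lines, private_lines):
    """Return a unified diff from the locked tip to the private copy."""
    return list(difflib.unified_diff(
        tip_lines, private_lines,
        fromfile="x.c (tip, locked)",
        tofile="x.c (private copy)",
        lineterm=""))


# Hypothetical contents: the tip carries Joe's new feature, the private
# copy carries the platform-dependent text from the port.
tip = [
    "#include <stdio.h>",
    "int open_log(void);",
    "int new_feature(void);",
]
private = [
    "#include <stdio.h>",
    "#ifdef SOLARIS",
    "/* Solaris-specific I/O */",
    "#endif",
    "int open_log(void);",
]

report = review_diff(tip, private)
```

In this sample the report shows Joe's new line as a removal, because
the private copy predates it. Deciding to keep that line is exactly the
judgment the person doing the merge has to make.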

                                ---

In the above example, it may be desirable to be more
explicit about the history of how a version was derived.
For example, when Sam checked in his version, the new
version he created was dependent on both the version he
originally checked out and the one that was current
when he finished the port.

Furthermore, it may be desirable for Sam to check in
intermediate versions of his files periodically. These
are regarded as "work in progress" versions, because
they aren't considered finished yet. Yet, it may be
a good idea to check in such work in progress versions
periodically, if only to get them into the safekeeping
of the source code control system. Backups are one
possible consideration. Having copies on multiple disks
is, in general, safer than just having a copy of your 
work on one disk, even if no backups are done.

In order to accomplish these goals, we only need one
additional thing -- the ability to indicate that a
line of development merges into another one.

One way to do this is as follows.

When Christine started her port, she could have forced
an identical version of x.c to be created as the next
version in the main line of development. Then, x.c
will be locked for an extremely brief time. Then,
Christine can check out the next to last version.
When she checks it in, the version graph will branch.
She can keep checking in and out versions on her
very own branch until the port works. Then, she
can do a checkin that (a) terminates her "port to Solaris"
branch, and (b) extends the main line of development branch.
A version label convention will have to be adopted for
her "port to Solaris" branch. Sam can do the same thing
for his "port to HPUX" branch. 

Let's just look at what the final result might
be for Christine's port starting against r1_1_8_1:

1.5      --> 1.6       --> 1.7       --> 1.8       --> 1.9
r1_1_8_1     r1_1_10_2     r1_1_12_3     r1_1_15_4     r1_1_17_5
|            (same as                                  ^
|              r1_1_8_1)                              /
v                                                    /
1.5.1.1  --> 1.5.1.2 --------------------------------
s1_1_0_0     s1_1_1_1   
 
Here Christine forced the creation of r1_1_10_2 to be
exactly the same as r1_1_8_1 by checkout with lock and 
checkin with no changes. Then she checked out r1_1_8_1
with lock for the Solaris release s1_1. Then she checked it 
in, creating s1_1_0_0, a work in progress version. Note
that since she didn't check out the tip, she forced
a branch. Then she checked s1_1_0_0 out with lock and in 
again to create s1_1_1_1, her final version before the merge.

Meanwhile, Joe created r1_1_12_3 from r1_1_10_2, and created
r1_1_15_4 from r1_1_12_3 on the main line of development
to implement new features.

Finally, Christine checked out and locked r1_1_15_4, 
made the necessary adjustments to her private copy of 
s1_1_1_1 based on her private copy of r1_1_15_4, 
and finally checked this private copy in as r1_1_17_5. 
The new thing that happened on this checkin is that 
an arc from s1_1_1_1 to r1_1_17_5 was created. 
The normal arc from r1_1_15_4 plus the new arc
indicate that r1_1_17_5 was derived from both of those versions.
Thus, we have a complete history of Christine's port and
Joe's new features, and all the derivation relationships.

It's clear how to add Sam to this scenario. Since all
checkouts with lock are exclusive, checkin with merge
is done pairwise -- there are no more than 2 incoming
arcs.
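
A minimal Python sketch of such a version graph with merge arcs. The
class VersionGraph is hypothetical. It records at most two incoming
arcs per version and can answer "which versions was this one derived
from?", which is the history that plain checkin comments lose.

```python
class VersionGraph:
    """Versions identified by user label, with derivation arcs."""

    def __init__(self):
        self.parents = {}   # label -> tuple of labels it was derived from

    def add(self, label, *parents):
        """Record a version and its incoming arcs (at most two)."""
        if len(parents) > 2:
            raise ValueError("merges are pairwise: at most 2 incoming arcs")
        for p in parents:
            if p not in self.parents:
                raise KeyError(f"unknown version {p}")
        self.parents[label] = tuple(parents)

    def ancestors(self, label):
        """All versions that label was derived from, directly or not."""
        seen = set()
        stack = list(self.parents[label])
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(self.parents[v])
        return seen


g = VersionGraph()
g.add("r1_1_8_1")
g.add("r1_1_10_2", "r1_1_8_1")    # forced identical version
g.add("s1_1_0_0", "r1_1_8_1")     # branch: port to Solaris
g.add("s1_1_1_1", "s1_1_0_0")
g.add("r1_1_12_3", "r1_1_10_2")   # Joe's new features
g.add("r1_1_15_4", "r1_1_12_3")
g.add("r1_1_17_5", "r1_1_15_4", "s1_1_1_1")   # checkin with merge

history = g.ancestors("r1_1_17_5")
```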


Alan Babich
Received on Friday, 24 July 1998 19:47:55 UTC