- From: Babich, Alan <ABabich@filenet.com>
- Date: Fri, 24 Jul 1998 16:44:47 -0700
- To: "'ejw@ics.uci.edu'" <ejw@ics.uci.edu>, John Stracke <francis@netscape.com>, Chris Kaler <ckaler@microsoft.com>, Bradley Sergeant <bradley_sergeant@intersolv.com>, Alan Babich <ABabich@felix.filenet.com>, Sam Ruby <rubys@us.ibm.com>, Bruce Cragun <Cragun.Bruce@gw.novell.com>, David Durand <dgd@cs.bu.edu>, Sridhar Iyengar <sridhar.iyengar@mv.unisys.com>
- Cc: Alex Hopmann <alexhop@microsoft.com>, "'webdav'" <w3c-dist-auth@w3.org>
> a) Prepare at least one (and ideally many more than one) scenario.
> Please email it out to the rest of the design team before the meeting
> (by the 6th) -- you should send it to the general WebDAV mailing list
> as well.
OK, Jim, as per your request, here's a scenario from the real world.
WARNING: This is a long e-mail (about 345 lines).
A software company is porting several hundred thousand
lines of C code to several different platforms. The
company intends to have exactly one source base that
handles all platforms. Therefore, there will be #ifdef's
in some of the source files that control selection of
text for platform dependent stuff like I/O.
There are two field releases being supported as
linear lines of development, and the current development
release is being supported as a linear line of development.
The field release lines of development branch off
the development release line of development when a change
is made to the development release but not a field release.
First, let's consider how we start out. Then we will
consider what happens during the port, i.e., parallel editing.
Consider one source file x.c . It started out:
1.1 --> 1.2 --> 1.3
Those are the good old RCS version labels. (First number
is number of times the same node was branched. Second
number is consecutive linear line of development change
number.) We attach user version labels, because we don't
care about no stinking RCS version labels. :-) They are
irrelevant to us humans. So, we invent a convention
where our user version labels are of the form
r<major release number>_<minor release number>_<build number>_
<change number for the major/minor release>
User version labels are a necessity in order to be
able to recreate or initially create any base level of
any release. In order to do so, you have to select a
whole collection of files, and the exact versions
you need must be specified in a simple way, e.g.,
release number and build level.
(For example, in order to do that for build 7 of release 1.0,
you merely check out a read only copy of the version of every
file that has the version label of the form r1_0_7_x where x is
maximal. There is no possibility of such a simple algorithm
against the hardwired RCS version labels.)
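Here is a minimal sketch of that selection rule in Python, for one file. The names (Label, parse_label, select_version) are made up for illustration and are not from any real tool. One assumption to flag loudly: a file that was not touched in build 7 carries no r1_0_7_x label at all, so the sketch takes the label with the largest change number among labels whose build number is at most the requested build. That is one reading of "x is maximal", not something the scenario spells out.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class Label(NamedTuple):
    major: int
    minor: int
    build: int
    change: int


# r<major>_<minor>_<build>_<change>, e.g. r1_0_7_4
LABEL_RE = re.compile(r"^r(\d+)_(\d+)_(\d+)_(\d+)$")


def parse_label(text: str) -> Optional[Label]:
    """Parse a user version label, or return None if it has another form."""
    m = LABEL_RE.match(text)
    return Label(*map(int, m.groups())) if m else None


def select_version(labels_by_version: Dict[str, List[str]],
                   major: int, minor: int, build: int) -> Optional[str]:
    """Pick the version of one file that belongs in the given build.

    ASSUMPTION: labels from earlier builds of the same release qualify,
    and the one with the largest change number wins.
    """
    best = None  # (change number, version id)
    for version_id, labels in labels_by_version.items():
        for text in labels:
            label = parse_label(text)
            if label is None or (label.major, label.minor) != (major, minor):
                continue
            if label.build > build:
                continue
            if best is None or label.change > best[0]:
                best = (label.change, version_id)
    return best[1] if best else None


# Labels for x.c, as they stand after the branch described later on.
x_c = {
    "1.1": ["r1_0_0_0"],
    "1.2": ["r1_0_1_1"],
    "1.3": ["r1_0_2_2"],
    "1.4": ["r1_0_3_3", "r1_1_0_0"],
    "1.5": ["r1_1_8_1"],
    "1.4.1.1": ["r1_0_7_4"],
    "1.4.1.2": ["r1_0_9_5"],
}
build_7_of_1_0 = select_version(x_c, 1, 0, 7)
```

Running the same function over every file in the source base would give the coordinated set of versions for that build.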
The layer on top of sccs puts the user version labels on
automatically. So, the version structure for x.c is actually
1.1       --> 1.2       --> 1.3
r1_0_0_0      r1_0_1_1      r1_0_2_2
1.1 was the initial version of x.c for build 0 of release 1.0.
1.2 was change 1 for build 1 of release 1.0.
1.3 was change 2 for build 2 of release 1.0.
OK. So now we release 1.0 to the field and start work on
release 1.1. Nothing happens until we change x.c .
There are two possibilities. Either we are making a pure
new development change, or are fixing a bug in the field,
and we want that exact same fix in the development release.
First, we fix a bug in build 3 of the field release, and the
fix (and therefore x.c) is exactly the same in build 0 of the
development release (release 1.1). The change is automatically
"rolled forward" by the tools, which simply put multiple
version labels on the new version of the file. Rolling forward
is optional, and it is possible only when checking out and in
the tip of a line of development.
1.1       --> 1.2       --> 1.3       --> 1.4
r1_0_0_0      r1_0_1_1      r1_0_2_2      r1_0_3_3
                                          r1_1_0_0
Next, we put in a piece of pure new development for build 8
of the development release (i.e., 1.1).
1.1       --> 1.2       --> 1.3       --> 1.4       --> 1.5
r1_0_0_0      r1_0_1_1      r1_0_2_2      r1_0_3_3      r1_1_8_1
                                          r1_1_0_0
Next, we fix a bug in the field release (i.e., 1.0) build 7.
The change can not "roll forward" to the development release,
because we have made a development only change. In other words,
we are now checking out (and back in) a node that is not at
the tip of the line of development. This causes the
version tree to branch.
1.1       --> 1.2       --> 1.3       --> 1.4       --> 1.5
r1_0_0_0      r1_0_1_1      r1_0_2_2      r1_0_3_3      r1_1_8_1
                                          r1_1_0_0
                                           |
                                           v
                                          1.4.1.1
                                          r1_0_7_4
Then we fix another bug for build 9 of the field release (i.e., 1.0).
1.1       --> 1.2       --> 1.3       --> 1.4       --> 1.5
r1_0_0_0      r1_0_1_1      r1_0_2_2      r1_0_3_3      r1_1_8_1
                                          r1_1_0_0
                                           |
                                           v
                                          1.4.1.1   --> 1.4.1.2
                                          r1_0_7_4      r1_0_9_5
It should be clear how the lines of development progress from here.
Note that the development release is always the main trunk
(i.e., has RCS version numbers of the form "1.x".)
If we ever check out and in node 1.4 again, the RCS number
would be 1.4.2.1, and there would be two branches off
node 1.4 (starting at 1.4.1.1 and 1.4.2.1). Branching the same
node more than once is unusual, and I'm not going to bother
to illustrate that.
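The tip-versus-branch rule used above can be sketched in a few lines of Python. This is only a toy model, not how RCS or sccs is implemented, and the VersionTree class and its method names are made up for illustration. It assumes the standard RCS numbering: checking in from the tip of a line bumps the last number, and checking in from anywhere else starts a new branch parent.N.1, where N counts the branches already hanging off that node.

```python
class VersionTree:
    """Toy version tree for one file (hypothetical, for illustration)."""

    def __init__(self, first_labels):
        self.labels = {"1.1": list(first_labels)}   # version id -> user labels
        self.children = {"1.1": []}                 # version id -> direct offspring

    @staticmethod
    def _successor(version_id):
        # Next version on the same line of development: bump the last number.
        head, _, last = version_id.rpartition(".")
        return f"{head}.{int(last) + 1}"

    def checkin(self, parent, labels):
        """Check in a new version derived from `parent`; return its id."""
        if parent not in self.labels:
            raise KeyError(parent)
        nxt = self._successor(parent)
        if nxt not in self.labels:
            new_id = nxt                            # parent was the tip of its line
        else:
            # Not the tip: the tree branches. Count branches already present.
            branches = sum(1 for c in self.children[parent] if c != nxt)
            new_id = f"{parent}.{branches + 1}.1"
        self.labels[new_id] = list(labels)
        self.children[new_id] = []
        self.children[parent].append(new_id)
        return new_id


# Replay the history of x.c from the scenario.
tree = VersionTree(["r1_0_0_0"])
ids = [
    tree.checkin("1.1", ["r1_0_1_1"]),
    tree.checkin("1.2", ["r1_0_2_2"]),
    tree.checkin("1.3", ["r1_0_3_3", "r1_1_0_0"]),  # fix rolled forward: two labels
    tree.checkin("1.4", ["r1_1_8_1"]),              # development-only change
    tree.checkin("1.4", ["r1_0_7_4"]),              # field fix, 1.4 is no longer the tip
    tree.checkin("1.4.1.1", ["r1_0_9_5"]),          # next field fix, tip of the branch
]
```

Note that "rolling forward" costs nothing in this model: it is just a checkin at the tip that carries more than one label.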
---
OK. So much for preliminaries. The above is slightly simplified
from what we actually did, but that's OK. Now for the port
to multiple platforms (parallel editing).
To simplify things, let's only show the end of the line of
development for the current development release (i.e., 1.1)
for file x.c .
1.5
r1_1_8_1
Now Christine comes along and starts to port x.c to Solaris.
She checks out a copy of r1_1_8_1 of x.c in her
private working directory. (She also makes copies
of lots of other source files, of course.) She does not
lock any files.
She makes a copy and doesn't leave x.c locked,
because it's going to take her quite a while (weeks or months)
to finish porting what she is porting. Joe, who is adding
new features to the product, may need to continue the main
line of development in the interim. He can not be stopped dead
in his tracks by Christine checking out x.c and leaving it
locked for weeks or months.
Now Joe makes a change to x.c on the main line of development
on the original platform (AIX) for build 10 of the development
release. Joe is not coordinating with Christine, and Christine
is not coordinating with Joe.
1.5       --> 1.6
r1_1_8_1      r1_1_10_2
This doesn't affect Christine, who has her own copies of
all the files.
Now Sam comes along and starts to port x.c to HPUX. So
Sam checks out a copy of r1_1_10_2 of x.c (and a bunch of
other source files) into his private directory and goes
to town on the port. Just as Christine didn't leave any files
locked, Sam doesn't leave any files locked either.
Joe, Christine, and Sam are all working in parallel and not
coordinating with each other.
Joe makes another change to x.c for build 12 of the
development release.
1.5       --> 1.6        --> 1.7
r1_1_8_1      r1_1_10_2      r1_1_12_3
Now, Christine finishes her port. So, she checks out x.c (r1_1_12_3)
and leaves it locked. She compares the r1_1_12_3 version against
her private copy of x.c (based on r1_1_8_1). If Joe did anything
to x.c that interferes with what she did to it, Christine resolves
the discrepancies by editing her private copy. Once she has decided
that all discrepancies are resolved, she checks in x.c to build 15
of the development release using her final copy of x.c . Version
r1_1_12_3 is only locked for the duration of her merge.
1.5       --> 1.6        --> 1.7        --> 1.8
r1_1_8_1      r1_1_10_2      r1_1_12_3      r1_1_15_4
Now x.c can theoretically run on AIX and Solaris, and
Joe and Sam are working in parallel and not coordinating
with each other.
Now Sam finishes his HPUX port. Sam checks out a copy of
x.c (r1_1_15_4) and leaves it locked. He looks to see that
what he has done against his copy of r1_1_10_2 is still
valid against r1_1_15_4. Sam resolves any discrepancies
in his private copy. Then Sam checks in his private copy
against build 17 of the development release.
1.5       --> 1.6        --> 1.7        --> 1.8        --> 1.9
r1_1_8_1      r1_1_10_2      r1_1_12_3      r1_1_15_4      r1_1_17_5
Now, x.c can theoretically run on AIX, Solaris, and HPUX,
and the binaries for all the platforms can be compiled from the
same source base. The ports are done.
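The whole sequence (private copies without locks, then a short exclusive lock just for the merge and checkin) can be sketched as follows. This is a toy in-memory model under stated assumptions, not any real source control API: LineOfDevelopment, LockError and the file contents are all invented for illustration, and the "merge" is simply typed in by hand where a human would do the real work.

```python
class LockError(Exception):
    """Raised when a lock is missing or held by someone else."""


class LineOfDevelopment:
    """One file's main line, with a single exclusive lock on the tip."""

    def __init__(self, label, text):
        self.versions = [(label, text)]
        self.locked_by = None

    def copy(self, label):
        """Unlocked read of a labelled version (a private working copy)."""
        for name, text in self.versions:
            if name == label:
                return text
        raise KeyError(label)

    def checkout_locked(self, user):
        """Lock the tip and return (label, text) of the tip."""
        if self.locked_by is not None:
            raise LockError(f"tip is locked by {self.locked_by}")
        self.locked_by = user
        return self.versions[-1]

    def checkin(self, user, label, text):
        """Append a new tip and release the lock."""
        if self.locked_by != user:
            raise LockError(f"{user} does not hold the lock")
        self.versions.append((label, text))
        self.locked_by = None


x_c = LineOfDevelopment("r1_1_8_1", "aix io\n")

christine_base = x_c.copy("r1_1_8_1")            # weeks of porting, no lock held
x_c.checkout_locked("joe")
x_c.checkin("joe", "r1_1_10_2", "aix io\nfeature A\n")

sam_base = x_c.copy("r1_1_10_2")                 # Sam starts his port, no lock held
x_c.checkout_locked("joe")
x_c.checkin("joe", "r1_1_12_3", "aix io\nfeature A\nfeature B\n")

# Christine merges: the tip is locked only while she reconciles and checks in.
_, tip = x_c.checkout_locked("christine")
merged = tip + "#ifdef SOLARIS\nsolaris io\n#endif\n"   # stands in for the human merge
x_c.checkin("christine", "r1_1_15_4", merged)

# Sam merges against the new tip in the same way.
_, tip = x_c.checkout_locked("sam")
merged = tip + "#ifdef HPUX\nhpux io\n#endif\n"          # stands in for the human merge
x_c.checkin("sam", "r1_1_17_5", merged)

labels = [name for name, _ in x_c.versions]
```

The point of the design is that the lock is held for the length of a merge, not for the length of a port, so Joe is never blocked for weeks.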
OK. Now several things should be clear:
(0) Simple linear lines of development are critical.
You can never lose track of the lines of development.
(1) User labels are necessary in order to retrieve a
coordinated set of files to reproduce an arbitrary build.
The RCS labels are totally inadequate for this purpose,
since there is no dependable pattern across a large
set of files.
(2) Multiple user labels must be assignable to the same
version of a file in order to support multiple releases
(e.g., multiple field releases and new development).
(3) Parallel editing requires a merge. There is no
general algorithm that can perform this merge
for you. Human insight is required. Tools such as diff
can help, but, in the end, there are lots of situations in
which a human has to check the results regardless of
the tools used.
(4) It is not reasonable to expect N versions to be
merged all at the same time. That makes the problem
exponentially more complicated, and humans don't do well
at things that get exponentially more complicated with N.
So, merges should be done pairwise.
(5) One can not keep the main line of development
locked for a very long period of time.
(6) Yet exclusive locking is necessary for ordinary development,
and to protect the decisions made during a merge.
(7) Exclusive locking is necessary and sufficient to do
parallel editing.
(8) Using the approach of this example, part of the history
of the derivation was lost. (From the final version
graph, you can't tell that Christine worked against
r1_1_8_1 or that Sam worked against r1_1_10_2. You
would need checkin comments to tell you that.)
This may not be desirable. (This issue is addressed in the
next section.)
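As an illustration of point (3), here is a small sketch of the kind of help a diff tool can give during a pairwise merge. It uses Python's standard difflib module to find which lines of the common base each person touched, and reports the positions both of them touched. The function names and the sample lines are invented for this example. An empty report only means the edits do not collide textually; it does not mean the merged file is correct, so a human still has to check the result.

```python
import difflib


def touched(base, other):
    """Positions in `base` that `other` replaced, deleted, or inserted at."""
    hits = set()
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # A pure insertion has i1 == i2; record the insertion point itself.
        hits.update(range(i1, max(i2, i1 + 1)))
    return hits


def overlapping_edits(base, mine, theirs):
    """Base positions changed on both sides: these need a human decision."""
    return sorted(touched(base, mine) & touched(base, theirs))


base = ["open_file();", "read_block();", "close_file();"]

# Christine rewrites the read; Joe appends a new call at the end.
christine = ["open_file();", "solaris_read_block();", "close_file();"]
joe = ["open_file();", "read_block();", "close_file();", "log_stats();"]
independent = overlapping_edits(base, christine, joe)

# Here Joe rewrites the same read that Christine rewrote.
joe_again = ["open_file();", "buffered_read_block();", "close_file();"]
colliding = overlapping_edits(base, christine, joe_again)
```

Comparing exactly two derived copies against one base is also what keeps the merge pairwise, as in point (4).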
---
In the above example, it may be desirable to be more
explicit about the history of how a version was derived.
For example, when Sam checked in his version, the new
version he created was dependent on both the version he
originally checked out and the one that was current
when he finished the port.
Furthermore, it may be desirable for Sam to check in
intermediate versions of his files periodically. These
are regarded as "work in progress" versions, because
they aren't considered finished yet. Yet, it may be
a good idea to check in such work in progress versions
periodically, if only to get them into the safekeeping
of the source code control system. Backups are one
possible consideration. Having copies on multiple disks
is, in general, safer than just having a copy of your
work on one disk, even if no backups are done.
In order to accomplish these goals, we only need one
additional thing -- the ability to indicate that a
line of development merges into another one.
One way to do this is as follows.
When Christine started her port, she could have forced
an identical version of x.c to be created as the next
version in the main line of development. Then, x.c
will be locked for an extremely brief time. Then,
Christine can check out the next to last version.
When she checks it in, the version graph will branch.
She can keep checking in and out versions on her
very own branch until the port works. Then, she
can do a checkin that (a) terminates her "port to Solaris"
branch, and (b) extends the main line of development branch.
A version label convention will have to be adopted for
her "port to Solaris" branch. Sam can do the same thing
for his "port to HPUX" branch.
Let's just look at what the final result might
be for Christine's port starting against r1_1_8_1:
1.5       --> 1.6        --> 1.7        --> 1.8        --> 1.9
r1_1_8_1      r1_1_10_2      r1_1_12_3      r1_1_15_4      r1_1_17_5
 |            (same as                                       ^
 |            r1_1_8_1)                                     /
 v                                                         /
1.5.1.1   --> 1.5.1.2 -------------------------------------
s1_1_0_0      s1_1_1_1
Here Christine forced the creation of r1_1_10_2 to be
exactly the same as r1_1_8_1 by checkout with lock and
checkin with no changes. Then she checked out r1_1_8_1
with lock for the Solaris release s1_1. Then she checked it
in, creating s1_1_0_0, a work in progress version. Note
that since she didn't check out the tip, she forced
a branch. Then she checked s1_1_0_0 out with lock and in
again to create s1_1_1_1, her final version before the merge.
Meanwhile, Joe created r1_1_12_3 from r1_1_10_2, and created
r1_1_15_4 from r1_1_12_3 on the main line of development
to implement new features.
Finally, Christine checked out and locked r1_1_15_4,
made the necessary adjustments to her private copy of
s1_1_1_1 based on her private copy of r1_1_15_4,
and finally checked this private copy in as r1_1_17_5.
The new thing that happened on this checkin is that
an arc from s1_1_1_1 to r1_1_17_5 was created.
The normal arc from r1_1_15_4 plus the new arc
indicate that r1_1_17_5 was derived from both of those versions.
Thus, we have a complete history of Christine's port and
Joe's new features, and all the derivation relationships.
It's clear how to add Sam to this scenario. Since all
checkouts with lock are exclusive, checkin with merge
is done pairwise -- there are no more than 2 incoming
arcs.
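To make the derivation arcs concrete, here is a minimal Python sketch of a version graph in which each version records the versions it was derived from, with at most two incoming arcs. The VersionGraph class is hypothetical and only models the labels from Christine's example above; it is not taken from any real system.

```python
class VersionGraph:
    """Versions keyed by user label, each with at most two parents."""

    def __init__(self):
        self.parents = {}

    def add(self, label, *parents):
        """Record `label` as derived from the given parent versions."""
        if len(parents) > 2:
            raise ValueError("merges are pairwise: at most 2 incoming arcs")
        for p in parents:
            if p not in self.parents:
                raise KeyError(p)
        self.parents[label] = tuple(parents)

    def ancestors(self, label):
        """Every version that `label` was directly or indirectly derived from."""
        seen, stack = set(), list(self.parents[label])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents[node])
        return seen


g = VersionGraph()
g.add("r1_1_8_1")
g.add("r1_1_10_2", "r1_1_8_1")               # identical copy forced by Christine
g.add("s1_1_0_0", "r1_1_8_1")                # branch: work in progress on Solaris
g.add("s1_1_1_1", "s1_1_0_0")                # final version before the merge
g.add("r1_1_12_3", "r1_1_10_2")              # Joe's new features
g.add("r1_1_15_4", "r1_1_12_3")
g.add("r1_1_17_5", "r1_1_15_4", "s1_1_1_1")  # checkin with merge: two incoming arcs

history = g.ancestors("r1_1_17_5")
```

With the extra arc recorded, the history of r1_1_17_5 includes the Solaris branch, which is exactly the information that was lost in the purely linear version of the scenario.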
Alan Babich
Received on Friday, 24 July 1998 19:47:55 UTC