WebDAV Versioning Overview from Model Document

jamsden@us.ibm.com
Mon, 24 May 1999 08:53:53 -0400


To: ietf-dav-versioning@w3.org
Message-ID: <8525677B.004A5F13.00@d54mta03.raleigh.ibm.com>




The following is an excerpt from the WebDAV Versioning Model document that
provides an updated overview of the versioning semantics. This is more detailed
than the versioning summary I sent out last week, and contains some updates for
the latest thinking on configurations and baselines. Please give this a thorough
read and review before next week's design team meeting, as this forms the basis
of what the protocol should support. If there are any errors, omissions, or
issues, we can resolve them as we review the protocol. I'm hoping we can use
this overview as the introductory section of the protocol document after perhaps
factoring it out so it can be used in both the model and protocol documents. I
am assuming the definitions in the goals document. Perhaps those should be
factored out and reused too.





WebDAV Versioning Semantics Overview


This section provides an overview of the WebDAV versioning semantics. Subsequent
sections provide detailed methods and semantic rules.


Creating Versioned Resources


A resource is any potentially stateful entity that can be accessed on the web
through a URL. The model below defines an interface, Resource, that abstracts a
web resource and its behavior. It also defines a specialization of Resource
called ResourceCollection that adds containment or grouping behavior for
Resources and other ResourceCollections using the Composite pattern.


A resource may or may not be versioned. When a resource is first created by
using a WebDAV PUT or MKCOL method, or Resource.setContents(), it is created as
an unversioned resource. Checking in a resource makes it a versioned resource
and creates its initial revision. A checked-in revision cannot be modified by
anyone at any time without first being checked out.


When a resource is put under version control, the server generates a URI for the
versioned resource as a whole, and a URI for each revision that can be used to
access that revision explicitly. These URIs are globally unique and are never
reused for any other versioned resource or revision.


Naming Revisions: Revision Ids and Labels


Each revision of a versioned resource must be distinguished from the other
revisions of the same versioned resource. Revision names serve this purpose: a
revision name is either a revision id or a revision label, and a revision may
have any number of labels. A revision of a versioned resource is given a
system-assigned revision id when it is checked in. This revision id acts as a
persistent, immutable identifier distinguishing this revision from all others of
the same versioned resource. The revision id cannot be changed, assigned to
another revision, or reused.


A user may also assign revision names called revision labels to a revision in
order to distinguish it from other revisions using more meaningful names. A
revision label must be unique within a given versioned resource, so that it
identifies at most one of its revisions, but it may be reassigned to any
revision of the versioned resource at any time. Revisions of different versioned
resources may have the same label.
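
A minimal Python sketch of these naming rules, assuming a simple in-memory model. The class name VersionedResource, the "r1", "r2" id format, and the choice to reject labels that collide with revision ids are all hypothetical illustrations, not part of any protocol defined here.

```python
import itertools


class VersionedResource:
    """Hypothetical sketch of revision ids and labels for one versioned resource."""

    def __init__(self):
        self._ids = itertools.count(1)  # monotonically increasing: ids are never reused
        self.revision_ids = []          # every id ever assigned, in order
        self.labels = {}                # label -> revision id; unique per versioned resource

    def check_in(self):
        # The server assigns the revision id; the client cannot choose or change it.
        rev_id = f"r{next(self._ids)}"
        self.revision_ids.append(rev_id)
        return rev_id

    def set_label(self, label, rev_id):
        if rev_id not in self.revision_ids:
            raise KeyError(rev_id)
        if label in self.revision_ids:
            # Assumption: a label may not shadow a revision id.
            raise ValueError("a label may not shadow a revision id")
        # Reassigning an existing label moves it to the new revision.
        self.labels[label] = rev_id

    def resolve(self, name):
        # A revision name is either a revision id or a label.
        return name if name in self.revision_ids else self.labels[name]
```

Because labels live in a per-resource dictionary, two different versioned resources can each have an "approved" label without interfering.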


Modifying a Versioned Resource


Subsequently, a client may reserve, or check out, a revision, which creates a
working resource that is a copy of the checked-out revision. Checking out a
revision registers intent to modify it and prevents other users from modifying
the same revision at the same time, which would produce conflicting or lost
updates. By adhering to a checkout, update, checkin protocol, users are assured
their updates will not be lost or conflict with those of other users. A working
resource is identical to an unversioned resource in all respects except that it
has one or more predecessors. It may be edited by setting its properties or
contents any number of times. When the client is satisfied that the working
resource is in a state that should be retained in the version history, the
client checks the working resource in to create a new revision of the resource.
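
The checkout, update, checkin cycle can be sketched as follows for the simple case without activities or parallel development. The classes, method names, and sequential string ids are hypothetical; a real server would expose these operations through protocol methods rather than Python calls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Revision:
    rev_id: str
    contents: str
    predecessor: Optional["Revision"] = None


@dataclass
class WorkingResource:
    contents: str
    predecessor: Revision  # the revision this working resource was checked out from


class VersionedResource:
    """Hypothetical checkout -> edit -> checkin cycle for one versioned resource."""

    def __init__(self, initial_contents):
        first = Revision("1", initial_contents)
        self.revisions = {first.rev_id: first}
        self.checked_out = set()  # ids of revisions that currently have a working resource

    def check_out(self, rev_id):
        if rev_id in self.checked_out:
            # Registering intent to modify blocks a second, conflicting checkout.
            raise RuntimeError(f"revision {rev_id} is already checked out")
        self.checked_out.add(rev_id)
        source = self.revisions[rev_id]
        # The working resource starts as a copy of the checked-out revision.
        return WorkingResource(source.contents, source)

    def check_in(self, working):
        new = Revision(str(len(self.revisions) + 1), working.contents, working.predecessor)
        self.revisions[new.rev_id] = new
        self.checked_out.discard(working.predecessor.rev_id)
        return new
```

Editing the working resource never touches the checked-out revision; only check_in records the result, as a new revision whose predecessor is the original.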


Users can use checkout and checkin to register intent to modify a versioned
resource, similar to the way LOCK and UNLOCK are used in DAV class 2. The sense
is reversed, though: a resource that is not locked can be modified, whereas a
checked-in revision cannot be changed without checking it out first. In
addition, revision histories are maintained.


The working resource may be checked in as either mutable or immutable. An
immutable revision cannot be changed and provides a stable environment for
history management, change recovery, merging, and configuration management. A
mutable revision is more suitable for situations where versioning is treated
more informally, and it is not necessary or desirable to maintain strict version
histories, or to be guaranteed that it is always possible to backtrack to a
previous point in time and recover. This form of versioning is typical of modern
document management systems. If the revision is mutable, a subsequent checkin
may be done with overwrite, updating the revision in place without creating a
new revision; any previous contents of the revision are lost. A mutable revision
can also be checked in as a new revision if the user wants to retain the
previous revision. If the revision is immutable, a checkin must create a new
revision; checkin with overwrite is not allowed.
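
A sketch of the two checkin modes, assuming a hypothetical check_in(contents, mutable, overwrite) signature in which overwrite names the revision to update in place. It only illustrates the rule that overwrite is refused for immutable revisions; it is not a proposed server API.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    rev_id: str
    contents: str
    mutable: bool


class VersionedResource:
    """Hypothetical sketch of checkin with and without overwrite."""

    def __init__(self):
        self.revisions = {}

    def check_in(self, contents, mutable=False, overwrite=None):
        if overwrite is not None:
            target = self.revisions[overwrite]
            if not target.mutable:
                # Immutable revisions never change; a new revision is required.
                raise PermissionError(f"{overwrite} is immutable; checkin must create a new revision")
            target.contents = contents  # the previous contents are lost
            return target
        # Without overwrite, every checkin creates a new revision.
        rev = Revision(f"r{len(self.revisions) + 1}", contents, mutable)
        self.revisions[rev.rev_id] = rev
        return rev
```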


Servers may choose not to allow revisions to be checked in as mutable, or they
may not allow a revision to be checked in without creating a new revision. These
constraints are typical of current configuration management systems. Document
management systems typically allow revisions to be mutable and don't have these
restrictions.


Adding or removing member URL segments modifies a revision of a collection.
Changing the contents of a member of a revision of a versioned collection does
not imply a change to that revision of the versioned collection.


The ability to check out a revision may be controlled. A user may check out a
revision specifying a revision scope of shared or exclusive. Shared scope
allows other users to check out the same revision in some other activity,
while exclusive scope prevents any parallel development on this revision.
Checkout control is managed through locks on the versioned resource and/or the
revision, as described in "Locking Versioned Resources".


Selecting Revisions through the Workspace


Resources, working resources, versioned resources, and revisions of versioned
resources are all accessed using a URL. When a user agent accesses a revision of
a versioned resource, it is necessary to provide additional information to
specify which revision of the versioned resource should be accessed. A specific
revision can be accessed by specifying the resource URL and a revision name.
However, this requires the user to add and remember labels for each revision,
and does not provide a way of accessing revisions modified in an activity, or
contained in a configuration. Nor does it enable clients that are not
versioning-aware to access revisions. There must also be some way to distinguish
working resources checked out from the same revision by different principals.


Revisions are usually accessed using a simple, human meaningful URL for a
versioned resource. A workspace may be used to provide a mapping between
versioned resources and specific revisions of versioned resources. Setting the
revision selection rule of a workspace specifies the mapping. This allows
versioned resources and unversioned resources to be accessed the same way. As a
consequence, relative URLs continue to work, and DAV class 1 or 2 clients that
are not versioning aware are able to access versioned resources through a
default workspace. The server maps the user URL to a versioned resource in a
server-dependent way while the workspace selects a particular revision of the
versioned resource.


A workspace may contain a current activity and a revision selection rule. See
the section "Parallel Development with Activities" for further details on
activities. A workspace selects a revision for a versioned resource as follows:

1.   If the versioned resource is checked out in the workspace, its working
resource is selected. Working resources can only be accessed through a
workspace.
2.   Otherwise, the workspace revision selection rule is applied to select a
revision.
3.   If no revision matches, a resource not found status is returned.

This procedure is applied to collections to select the revision that determines
their member names, and to other resources to select the revision that provides
their contents.
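
The selection procedure above might look like this in Python. The Workspace class, the select_in_workspace function, and modelling the revision selection rule as a plain callable are assumptions made for illustration; ResourceNotFound stands in for the resource not found status.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


class ResourceNotFound(Exception):
    """Stands in for the 'resource not found' status."""


@dataclass
class Workspace:
    # Revision selection rule: versioned resource -> selected revision, or None.
    rule: Callable[[str], Optional[str]]
    # Working resources checked out in this workspace, keyed by versioned resource.
    working: Dict[str, str] = field(default_factory=dict)


def select_in_workspace(workspace: Workspace, resource: str) -> str:
    # 1. A working resource checked out in this workspace is selected first.
    if resource in workspace.working:
        return workspace.working[resource]
    # 2. Otherwise the revision selection rule picks a revision.
    revision = workspace.rule(resource)
    # 3. No matching revision yields "not found".
    if revision is None:
        raise ResourceNotFound(resource)
    return revision
```

For example, Workspace(rule={"index.html": "r3"}.get) selects r3 for index.html and reports any other resource as not found unless it has a working resource in that workspace.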


A workspace revision selection rule can specify any number of revision labels,
activities, configurations, or the revision selector "latest" to specify what
revision to select. The rules are applied in order until the first match is
found. Any subsequent potential matches are ignored. A label matches a revision
with that label. An activity matches the latest revision in that activity, and
may result in merge conflicts with changes made in other activities. A
configuration matches a revision contained in that configuration. "latest"
matches the latest revision based on the last modified time. See the section
"Configuration Management" for further details on configurations. See the
section "Merging" for additional rules on revision selection when a revision
selector is merged into the workspace revision selection rule.
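
The ordered, first-match evaluation of a revision selection rule can be sketched as below. The tuple encoding of selectors and the Revision fields are hypothetical, and "the latest revision in an activity" is approximated here by last-modified time, which may differ from what a server actually does.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Revision:
    rev_id: str
    modified: float                 # last-modified time
    labels: set = field(default_factory=set)
    activity: Optional[str] = None  # activity the revision was created in


def select(rule, resource, revisions):
    """Return the revision chosen by the first selector in `rule` that matches.

    rule is an ordered list such as
    [("label", "beta"), ("activity", "fix-42"), ("config", {"doc": "r3"}), ("latest",)].
    """
    for selector in rule:
        kind = selector[0]
        if kind == "label":
            matches = [r for r in revisions if selector[1] in r.labels]
        elif kind == "activity":
            matches = [r for r in revisions if r.activity == selector[1]]
        elif kind == "config":
            wanted = selector[1].get(resource)  # a configuration maps resource -> revision id
            matches = [r for r in revisions if r.rev_id == wanted]
        elif kind == "latest":
            matches = list(revisions)
        else:
            raise ValueError(f"unknown selector {kind!r}")
        if matches:
            # First matching selector wins; later selectors are ignored.
            return max(matches, key=lambda r: r.modified)
    return None
```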


If a request is made and no workspace is specified, a default workspace is used
that has no current activity and whose revision selection rule is "latest".
Administrators can change the current activity and revision selection rule of
this default workspace so that such down-level client requests are done in an
activity, or to support access to more specific revisions of versioned
resources.


A resource revision is checked out in the context of a workspace, which is used,
with the resource URL, to subsequently access the working resource. Different
users can check out the same revision in different workspaces and not see each
other's changes. If a workspace is not specified on checkout, the server selects
a workspace and returns it in the checkout response. This workspace has no
current activity and no revision selection rule, so it can only be used to
access the checked-out working resource. Client applications that wish to do
their own URL-to-revision mappings, rather than rely on server workspaces, may
use these workspaces as checkout tokens.


Revisions are checked out in the current activity of the workspace, if any. When
the resource is checked back in, it remains visible in the workspace if the
workspace revision selection rule contains the current activity. In order to
prevent checked in revisions from becoming invisible in the workspace if
activities aren't used, a workspace might have a current label. This label is
automatically applied to any revision when it is checked in. If the versioned
resource already has a revision with this label, the label is moved to the new
revision. Putting this label in the workspace revision selection rule will
ensure that all checked in revisions are visible in the workspace.


Parallel Development with Activities


When a revision of a versioned resource is already checked out, another user
cannot check it out again and therefore cannot make any changes. In order to
increase resource availability, avoid serializing work, and allow multiple users
to make changes to the same revision simultaneously, a server may support
parallel development. Parallel development allows users to choose to do work on
a resource that is checked out by someone else in a different context, and to
merge those changes together at some later time.


Resources are checked out in the context of an activity. An activity abstracts a
set of related changes a user is making to versioned resources. Each activity
represents a thread of development. Servers may support multiple activities that
can be used to enable parallel development. These different activities can be
merged together at some later time in order to integrate the changes. A revision
that is already checked out in an activity cannot be checked out again in the
same activity. If parallel development is desired, a user can check out the
revision in another activity and merge the changes later. See the section
"Merging" for further details.
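
Combining the per-activity rule above with the shared and exclusive revision scopes described under "Modifying a Versioned Resource" gives a checkout table like the hypothetical one below. Making exclusive the default scope is an assumption; the text does not say which scope applies when none is given.

```python
class CheckoutRegistry:
    """Hypothetical per-revision checkout table honoring activities and scope."""

    def __init__(self):
        self._checkouts = {}  # rev_id -> list of (activity, scope)

    def check_out(self, rev_id, activity, scope="exclusive"):
        if scope not in ("shared", "exclusive"):
            raise ValueError(f"unknown scope {scope!r}")
        current = self._checkouts.setdefault(rev_id, [])
        if any(a == activity for a, _ in current):
            # A revision can be checked out only once per activity.
            raise RuntimeError(f"{rev_id} is already checked out in activity {activity!r}")
        if current and (scope == "exclusive" or any(s == "exclusive" for _, s in current)):
            # Exclusive scope rules out any parallel checkout of this revision.
            raise RuntimeError(f"{rev_id} is checked out exclusively")
        current.append((activity, scope))

    def check_in(self, rev_id, activity):
        # Checking in releases this activity's checkout of the revision.
        self._checkouts[rev_id] = [c for c in self._checkouts.get(rev_id, []) if c[0] != activity]
```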


Activities can add complexity to both clients and servers that may not be
desirable in some situations. Simple parallel development does not require users
to create activities or set a current activity in their workspace. Clients can
choose to manage their own parallel development and merge manually. The user
just wants to check out a revision, make some changes and check it back in.
There is no need to organize changes or provide sophisticated merging. If there
is a conflict, users will simply resolve it manually at checkin, or not bother
with the merge at all.


Simple parallel development can be accomplished without using activities. A
server may allow many checkouts of the same revision without using an activity.

The workspace merge conflict report is not available to detect conflicts
resulting from changes that were not made in the context of an activity. Client
applications are responsible for detecting and integrating the changes. In order
to prevent checked in revisions from becoming invisible when activities are not
used, the workspace supports a current label. The current label is automatically
moved to any new revision that is checked in to that workspace. Two workspaces
with different current labels can work in parallel on the same versioned
resource; a simplified merge can then be performed by adding both labels to the
revision selection rule of the workspace that incorporates the changes made in
the other workspaces.


Configuration Management

A workspace represents a volatile set of revisions. Any new checkouts in that
workspace, changes to versioned resources that affect the revision selected by
the revision selection rule, or changes to the revision selection rule itself,
may result in the selection of different revisions or working resources for
versioned resources. A configuration is a versionable resource that represents a
consistent, immutable set of revisions. A configuration contains a set of
revisions, where a given versioned resource can have at most one revision in a
given configuration. A configuration cannot contain a mutable revision, because a
mutable revision could change after it was added and the configuration would no
longer be a consistent, immutable set. Different revisions of a configuration
can select different revisions of the same versioned resources, or can select
revisions of different versioned resources. A configuration may be used as a
revision selector in a workspace revision selection rule. A workspace whose
revision selection rule contains a configuration will always return the same
revisions as long as no revisions are checked out in it.
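
A sketch of the configuration invariants, assuming a configuration is assembled by adding revisions before it is checked in. Keying the selection by versioned resource allows at most one revision per resource, and mutable revisions are refused. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Revision:
    resource: str         # the versioned resource this revision belongs to
    rev_id: str
    mutable: bool = False


class Configuration:
    """Hypothetical: at most one immutable revision per versioned resource."""

    def __init__(self):
        self._selected: Dict[str, Revision] = {}

    def add(self, revision: Revision) -> None:
        if revision.mutable:
            raise ValueError("a configuration cannot contain a mutable revision")
        # Keying by versioned resource enforces "at most one revision per resource";
        # adding another revision of the same resource replaces the earlier one.
        self._selected[revision.resource] = revision

    def select(self, resource: str) -> Optional[Revision]:
        return self._selected.get(resource)
```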

A revision may be added to a configuration either by specifying a label or by
taking the revision selected by a given workspace. When a revision of a
versioned collection is added to a configuration, it and, recursively, all its
members are included in the configuration. That is, a revision of the
collection, and recursively revisions of all its members, are selected. This
enables configurations to maintain the state of namespaces defined by versioned
collections as well as the state defined by the contents and properties of
resources. Adding a revision of a versioned resource that already has a revision
in the configuration replaces the previously selected revision.
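
Recursive addition of a collection revision could be sketched as a depth-first walk. The callables members (the member versioned resources of a collection revision) and choose (the revision picked by the given label or workspace) are hypothetical stand-ins for server lookups, and the seen set is an added guard in case a resource is reached twice.

```python
def add_collection(config, collection, rev_id, members, choose):
    """Hypothetical: add a collection revision and, recursively, its members.

    config:  dict, versioned resource -> revision id (updated in place)
    members: callable (collection, rev_id) -> member versioned resources
    choose:  callable versioned resource -> revision id, i.e. the revision
             picked by the given label or workspace
    """
    stack, seen = [(collection, rev_id)], set()
    while stack:
        resource, rid = stack.pop()
        if resource in seen:  # guard against a resource bound more than once
            continue
        seen.add(resource)
        config[resource] = rid  # replaces any revision selected earlier
        for member in members(resource, rid):
            # Members are versioned resources; a revision is chosen for each.
            stack.append((member, choose(member)))
    return config
```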

The URL used to access a revision of a versioned resource in the context of a
label or workspace when the revision is added to a configuration is not retained
in the configuration. In order to access this revision at some later time, it is
necessary to add the configuration to the revision selection rule of a
workspace, and bind a name in the server's namespace to the versioned resource
corresponding to the desired revision. Then the server uses the URL binding to
access a versioned resource, and the workspace to select a particular revision
as specified by the configuration in the revision selection rule. This allows
flexibility in naming revisions in the context of how they are used. If the user
URL of the revision is important, then it is possible to retain this information
by putting a revision of the revision's parent collection in the configuration.

Configurations can depend on other configurations. The meaning of this
dependency is that when a configuration is used as a revision selector in a
workspace revision selection rule, its dependent configurations are also
implicitly included. Dependent configurations cannot have overlapping members.
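
One way to expand a configuration with its dependent configurations is shown below. It assumes dependencies are followed transitively and that the no-overlap rule applies across every configuration included, the top-level one among them; both are interpretations, and the function and parameter names are hypothetical.

```python
from typing import Dict, List


def effective_selection(config: str,
                        depends_on: Dict[str, List[str]],
                        contents: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge a configuration with everything it (transitively) depends on.

    depends_on: config name -> names of configurations it depends on
    contents:   config name -> {versioned resource: revision id}
    Raises ValueError when two included configurations share a member.
    """
    result: Dict[str, str] = {}
    owner: Dict[str, str] = {}  # versioned resource -> configuration that supplied it
    seen, stack = set(), [config]
    while stack:
        name = stack.pop()
        if name in seen:  # a configuration reached by two paths is included once
            continue
        seen.add(name)
        for resource, rev in contents[name].items():
            if resource in owner:
                raise ValueError(f"{resource} appears in both {owner[resource]} and {name}")
            owner[resource] = name
            result[resource] = rev
        stack.extend(depends_on.get(name, []))
    return result
```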

A versioned collection has an associated baseline which is a distinguished,
versioned configuration containing the collection, and recursively, all its
members. A new revision of a versioned collection baseline is created by
baselining the collection. If a collection represents a component and its parts,
a baseline of a collection represents a particular configuration of that
component. Baselines provide a convenient means of accessing versions of a
configuration of a versioned collection and facilitate reuse by helping users
discover which configuration to use.

Configurations are convenient for defining a persistent set of revisions that
relate to each other in some specific way at some point in time. This can be
useful for selecting consistent versions of resources to publish or deploy an
application, or for recovering to a specific version state for legal or
maintenance reasons.


Versioned Collections


A collection contains a set of member URL segments. For versioned collections,
the members represent versioned resources, not particular revisions. To add or
remove members from a revision of a versioned collection, it must be checked out
just like any other resource. Creating a new revision of a member, or modifying
a member, has no effect on the collection. Deleting a versioned resource that is
a member of a collection does not delete the versioned resource; it only removes
the member from that revision of the collection. The resource may still be a
member of a previous or subsequent revision of the collection or some other
collection. The URL for a collection without a particular revision name is
resolved to a particular revision using the workspace the same as any other
resource. If the collection is part of a URL for some other resource, then its
members are determined from the selected revision.
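
The segment-by-segment resolution described above, where each collection's members come from its selected revision, might look like this. The select and members callables and the string identifiers are hypothetical; select would normally be backed by a workspace.

```python
from typing import Callable, Dict, Optional


def resolve_path(path: str, root: str,
                 select: Callable[[str], str],
                 members: Callable[[str], Dict[str, str]]) -> Optional[str]:
    """Map a URL path to the revision selected for its last segment.

    root:    the versioned collection bound at "/"
    select:  versioned resource -> selected revision (e.g. via a workspace)
    members: collection revision -> {member segment: versioned resource}
    Returns None when a segment is not a member of the selected collection revision.
    """
    resource = root
    for segment in (s for s in path.split("/") if s):
        # Members are versioned resources, looked up in the selected revision.
        binding = members(select(resource))
        if segment not in binding:
            return None
        resource = binding[segment]
    return select(resource)
```

Selecting an older revision of the root collection, one that lacks a given member, makes the same path unresolvable even though the member's versioned resource still exists.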


When a revision of a collection is added to a configuration, then recursively,
so are all its members. This is similar to a COPY or MOVE of a collection with
infinite depth. As described in the section "Configuration Management", a
versioned collection may have a baseline, which is a versioned configuration
selecting a revision of the versioned collection, and recursively revisions of
all its members.


Revision History


A revision may have at most one predecessor, zero or more merge predecessors,
and any number of successors. A predecessor of a revision is a revision that
this revision was derived from. A merge predecessor is a revision whose changes
were merged into this revision. A successor of a revision is a revision derived
from this revision. Each revision has a line-of-descent that consists of a path
from the initial revision of the resource to that revision along the
successor/predecessor relationships. A line-of-descent specifies a portion of
the overall history of the versioned resource.
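
Following predecessor links back to the initial revision yields the line-of-descent. This sketch assumes the history is given as a hypothetical dict from revision id to its single predecessor (None for the initial revision) and ignores merge predecessors.

```python
from typing import Dict, List, Optional


def line_of_descent(revision: str, predecessor: Dict[str, Optional[str]]) -> List[str]:
    """Return the path from the initial revision to `revision`.

    predecessor maps each revision id to its single predecessor id,
    or to None for the initial revision (merge predecessors are ignored).
    """
    path = []
    current: Optional[str] = revision
    while current is not None:
        path.append(current)
        current = predecessor[current]
    path.reverse()  # walked backwards, so flip to initial -> revision order
    return path
```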


Each revision has a predecessor relationship with the revision it was checked
out from, a merge predecessor relationship with the revisions merged into it,
and a successor relationship with revisions that were checked out from it.
Revisions are related to their predecessor and merge predecessors through
is-derived-from or merged-from relationships. The revision history of a
versioned resource includes these relationships along with revision ids and
labels, revision descriptions, checked-out state, etc. The revision history
contains sufficient information so that a client may display or sort the history
by the last-modified property.


Merging

Each activity represents a separate parallel thread of development. Users may
make their changes in the context of an activity. Parallel changes to the same
revision must be made in separate activities or without an activity. At some
point, a user may want to merge changes made to the same revision together to
create a new revision containing the combined updates. This is accomplished by
merging an activity into a workspace. Revision selectors in the workspace
revision selection rule are usually connected with an "or" relationship. That
is, if a revision selector in the list denotes a matching revision, then that
revision is selected and all the other revision selectors are ignored. When a merge is
desired, the revision selector is added to the revision selection rule with a
"merge" relationship. In this case, when a revision selector denotes a matching
revision, all other revision selectors are examined to determine if they would
select a conflicting revision. A merge conflict is determined by the following
rules. In these rules, the merge source is the revision selected by the revision
selector being merged into the workspace revision selection rule.  The alternate
revision is any other revision selected by some other revision selector in the
workspace revision selection rule.

1.   If the merge source revision is a predecessor (direct or indirect) of the
alternate revision, then the alternate revision is selected.
2.   If the merge source revision is a successor (direct or indirect) of the
alternate revision, then the merge source revision is selected.
3.   Otherwise the merge source and the alternate revision are revisions that
are on different lines-of-descent, and a merge conflict exists. This merge
conflict will be indicated when a conflicting revision is accessed through the
workspace.
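
The three rules can be expressed with an ancestry test over the revision history. The sketch assumes a hypothetical preds dict mapping each revision to its predecessor and merge predecessors, so that a revision created by a merge counts as a successor of both inputs, and it treats identical revisions as non-conflicting.

```python
def ancestors(rev, preds):
    """All revisions reachable from `rev` through predecessor and merge-predecessor links."""
    seen, stack = set(), list(preds.get(rev, ()))
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(preds.get(r, ()))
    return seen


def merge_outcome(source, alternate, preds):
    """Apply the merge rules to a merge source revision and an alternate revision."""
    if source == alternate or source in ancestors(alternate, preds):
        return ("select", alternate)           # rule 1: source is a predecessor of the alternate
    if alternate in ancestors(source, preds):
        return ("select", source)              # rule 2: source is a successor of the alternate
    return ("conflict", (source, alternate))   # rule 3: different lines-of-descent
```

With history 1 -> 2 -> 3 and a branch 2 -> 2.1, revisions 3 and 2.1 conflict; once revision 4 is checked in with predecessors 3 and 2.1, merging 2.1 simply selects 4.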

In order to do a merge, it is first necessary to determine what must be merged.
A user determines the conflicts by merging the source activity (or any other
revision selector) with the workspace. This enters the merge source revision
selector into the workspace revision selection rule with a "merge" relationship,
and introduces merge conflicts that must be resolved. A merge conflict report
lists the revisions that have been modified in parallel in different activities.
The merge conflict report is generated by examining all resources selected by
revision selectors merged into the workspace revision selection rule, and
determining if those revisions conflict with any other revision selected by the
workspace revision selection rule.

A user can request the differences between two revisions of a resource (servers
may provide a differences report, but they must at least indicate if they are
the same or not). A user can request conflicts between an activity and the
current workspace to generate a merge conflict report. A user can also request
the differences between a configuration and the current workspace, which lists
at least the activities that are contained in the configuration but not in the
workspace and vice versa. So differences are detected at different levels:
content differences for resources, revision differences for activities, and
activity differences for configurations.

Once the merge conflicts are known, the conflicts are resolved by merging the
revisions from the merge source into the revision selected by the workspace to
create a new working resource. Servers may perform some default auto merging,
but at a minimum, the merge is done by checking out the revision in the current
activity and noting the merge from the merge source. This creates a merge
successor/predecessor relationship between the merge source and workspace
revisions called merged-from. The conflict is now removed because the working
resource is now a successor of both the source and target revisions. It is the
user's responsibility to apply the differences in the two revisions in an
appropriate manner. The merge is complete when all the conflicts are resolved,
all differences have been merged, and the resources are all checked back in.

When merging mutable revisions, the merge conflict report may be inaccurate as
the source revision may change without the system being aware. Users are
responsible for applying any changes to ancestor revisions to their descendants
as appropriate. The system cannot determine if there are any changes that need
to be applied other than by looking at the last-modified dates of the revisions.

In summary, merging activities is simply adding the activities to the revision
selection rule of a workspace. The workspace can then produce the potential
revision conflicts by detecting activities or revision selectors that specify
revisions on different lines of descent of the same versioned resource. These
conflicts are available in the merge conflict report. Conflicts are resolved by
merging the revisions, creating new working resources where the client suitably
applies changes from conflicting revisions. The merge is complete when the merge
conflict report is empty.


Locking Versioned Resources


Locking a versioned resource prevents any principal other than the owner of the
lock from checking out any revision in any activity. Locking a revision of a
versioned resource prevents any principal other than the owner of the lock from
checking out just that revision in any activity. Shared locks allow multiple
principals to control checkouts on the versioned resource or revision.


Locking an activity prevents any principal from making any further changes in
the context of that activity. That is, it is not possible to checkout a resource
using a locked activity.


Locking a workspace prevents any principal from making any change to that
workspace including changing the revision selection rule, or checking out any
resources in that workspace.
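
The checkout checks implied by these locking rules can be summarized in a hypothetical helper like the one below. It covers only checkout; blocking changes to a locked workspace's revision selection rule is left out, and the Locks layout is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Locks:
    """Hypothetical lock table; owner sets model exclusive (one owner) or shared locks."""
    versioned: dict = field(default_factory=dict)  # versioned resource -> owners
    revisions: dict = field(default_factory=dict)  # (resource, rev_id) -> owners
    activities: set = field(default_factory=set)   # locked activities
    workspaces: set = field(default_factory=set)   # locked workspaces


def may_check_out(locks, principal, resource, rev_id, activity=None, workspace=None):
    # A locked activity or workspace blocks checkouts by every principal.
    if activity in locks.activities or workspace in locks.workspaces:
        return False
    # A lock on the versioned resource or on the revision blocks non-owners;
    # a shared lock simply has several owners.
    for owners in (locks.versioned.get(resource), locks.revisions.get((resource, rev_id))):
        if owners is not None and principal not in owners:
            return False
    return True
```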