Functional Requirements and Framework for Versioning on the WWW
David G. Durand and Fabio Vitali
Changes from last version
- lock durations
- separate locking / resource reservation
- clarify/reorder points
- style-free versioning
Abstract
This document describes the functional requirements for integrating
versioning into the WWW. Versioning is the fundamental basis of document
management systems, with far-reaching effects on the semantics of document
identity and meaningful operations. These requirements reflect
the basic versioning needs for document management and collaborative
authoring. This document does not define the complete set of requirements
for these domains where they extend beyond the versioning of resources.
1. Introduction
This document discusses why versioning is needed on the WWW, and the
functional requirements for full version support. We have divided the
requirements into three sections. This discussion enumerates the requirements
for implementing such functionality as a first step to creating a
specification that will address these needs.
We first briefly describe the rationale for versioning on the web in
Section 2. This rationale enumerates the goals of versioning on the WWW.
All specific requirements should support (and certainly should not hinder)
the realization of the goals. Section 3 contains global requirements for
protocol development. These are things we think are technically justified
and that fulfil the rationale. They are separated from the other
requirements because their acceptance creates further constraints on other
technical requirements. Finally, in Section 4, we specify functional
requirements based on the foundation established in the earlier sections.
We have based this effort on David Fiander's suggestion to separate
versioning and configuration requirements, and we assume a two-layer
architecture for versioning on the web. The first layer, whose
requirements are defined in this document, will address the simple problem
of handling multiple versions of single resources. The second
layer will address the thornier problems of configuration management for
multiple resources. This layering simplifies both discussion and design.
2. Rationale
Versioning in the context of the world-wide web offers a variety of
benefits:
- It provides infrastructure for efficient and controlled
management of large evolving web sites.
Modern configuration management systems are built on some form of
repository that can track the revision history of individual resources, and
provide the higher-level tools to manage those saved versions. Basic
versioning capabilities are required to support such systems.
- It allows parallel development and update of single resources
Since versioning systems register change by creating new objects, they
enable simultaneous write access by allowing the creation of variant
versions. Many also provide merge support to ease the reverse operation.
- It provides a framework for access control over resources.
While specifics vary, most systems provide some method of controlling or
tracking access to enable collaborative resource development.
- It allows browsing through past and alternative versions of a
resource
Frequently the modification and authorship history of a resource is
critical information in itself.
- It provides stable names that can support externally stored links for
annotation and link-server support.
Both annotation and link servers frequently need to store stable
references to portions of resources that are not under their direct
control. By providing stable states of resources, version control systems
allow not only stable pointers into those resources, but also well-defined
methods to determine the relationships of those states of a resource.
- It allows explicit semantic representation of single resources with
multiple states
A versioning system directly represents the fact that a resource has an
explicit history, and a persistent identity across the various states it
has had during the course of that history.
3. Global requirements
This section covers the overarching constraints that must
inform and direct detailed requirements for versioning support. They
encompass compatibility across different implementations, as well as
compatibility with current practice. Therefore, we believe the following
to be the general requirements for WWW versioning:
- Stability of versions.
Most versioning systems are intended to enable an accurate record of the
history of evolution of a document. This accuracy is ensured by the fact
that a version eventually becomes "frozen" and immutable. Once a version
is frozen, further changes will create new versions rather than modifying
the original. In order for caching and persistent references to be
properly maintained, a client must be able to determine that a version has
been frozen. We require that unlocked resource versions be frozen. This
enables the common practice of keeping unfrozen "working versions". Any
successful attempt to retrieve a frozen version of a resource will always
retrieve exactly the same content, or return an error if that version (or
the resource itself) is no longer available. Since URLs may be
reassigned at a server's discretion this requirement applies only for that
period of time during which a URL identifies the same resource.
- User Agent Interoperability.
All versioning-aware user agents should be able to work with any
versioning-aware HTTP server. It is acceptable for some user agent/server
combinations to provide special features that are not universally
available, but the protocol should be sufficient that a basic level of
functionality will be universal.
- Style-free Versioning
The protocol should not unnecessarily restrict version management style to
any one paradigm. For instance, locking and version number assignment
should be interoperable across servers and clients, even if there are some
differences in their preferred models.
- Separation of access to resources and access control
The protocol must separate the reservation and release of versioned
resources from their access methods. Provided that consistency constraints
are met before, during and after the modification of a versioned resource,
no "right way" to access a resource is enforced by the protocol. For
instance, a user may declare an intention to write after a GET, may
POST a resource without releasing the lock, and might even request a lock
via an HTTP connection while getting the document via FTP.
- Legacy Resource Support.
The protocol should enable a versioning-aware server to work with existing
resources and URLs. Special versioning information should not become a
mandatory part of HTTP protocols except where it is required. Special
version information that would break existing clients and servers, such as new
mandatory headers, cannot therefore be required for GET (and possibly also for
PUT).
- Legacy User Agent Support.
Servers should make versioned resources accessible to versioning-unaware
user-agents in a format acceptable to them.
- Specific named version URLs that are constructed from a URL
and an opaque version string
Because the notation will be required to operate in the version control
environment preferred by the website maintainer, it must be able to
properly contain arbitrary strings, which may be used by the VCS as version
identifiers. While version information may be intelligible to the human
operator, and perhaps to special-purpose clients, the client must be
able to treat the version specifier as a black box.
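
As an illustration of the opaque version string requirement, the following
is a minimal sketch in Python. The ";version=" marker and both function
names are assumptions made for this example only; this document does not
define a URL syntax. The sketch percent-encodes the identifier so that any
string chosen by the VCS survives the round trip, and the client never
inspects the identifier itself.

```python
from typing import Optional, Tuple
from urllib.parse import quote, unquote

# Hypothetical marker, chosen only for this sketch; no syntax is defined
# by the requirements.
VERSION_MARKER = ";version="


def make_version_url(resource_url: str, version_id: str) -> str:
    """Attach an opaque version identifier to a resource URL."""
    # Percent-encode every reserved character so that arbitrary VCS
    # identifiers (spaces, slashes, semicolons) are carried intact.
    return resource_url + VERSION_MARKER + quote(version_id, safe="")


def split_version_url(url: str) -> Tuple[str, Optional[str]]:
    """Return (resource URL, version identifier or None if unversioned)."""
    base, marker, encoded = url.rpartition(VERSION_MARKER)
    if not marker:
        # No marker present: the URL names the resource, not a version.
        return url, None
    return base, unquote(encoded)
```

A split of this kind is one form of the URL computation mentioned in
Section 4; a server query could supply the same two values instead.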
4. Functional requirements
The following functional requirements are intended to satisfy the global
requirements of Section 3 and enable the benefits listed in Section 2. The
mention of possible new HTTP methods is intended to make the discussion
clearer and more concrete, not to rule out other methods of meeting the
requirements.
The protocol should provide:
- Access to specific named versions via a URL
This is required for version-specific linking, and for legacy user-agent
support.
- A URL to denote a versioned resource itself, rather than specific
versions of it
This is more important if URL computations are not allowed, since an
identifier is needed for queries about the versioning status of a resource.
This is used to perform operations (such as adjusting attributes, changing
locks, or reassigning URLs) that affect all versions of a resource, rather
than any specific version.
- Direct access to a server-defined "default", "current" or "tip" version
of a resource
This is one of the simplest ways to guarantee legacy user-agent
compatibility and legacy file compatibility. If no special version URLs
are used, the server will provide a default. This does not rule out the
possibility of a server returning an error in case no such default exists.
- A way to access common related URLs from a versioned URL, whether by
server query, URL computation, or some other way:
- root version(s) of this document
- predecessor version(s) of this document
- successor version(s) of this document
- default version of this document
Some versions of a resource are special. It must be possible in some way
for a versioning-aware client to access common related versions to the one
it currently is displaying. Possible solutions include, but are not
limited to: the server automatically adding header fields to a versioned
URL specifying the URL of the common related versions, the server providing
one or more query methods ("who is the previous version to this URL?"), or
a standardized way to compute related URLs when given a versioned URL. We
feel that access to the "default" version of a resource is an extremely
important operation, that a browser should be able to perform at any time
that a versioned URL is seen.
- A way to retrieve the complete version topology for this resource
There should be a way to retrieve information about all versions of a
resource. The format for this information must be standardized so that the
basic information can be used by all clients.
- Some way to determine that a URL points to a named version of a resource
This might be implemented as part of the URL format, a server query or
additional headers.
- Some way to determine a version identification and a resource
identification for a versioned resource, given its URL
This requirement describes the ability to take the URL of a version of a
resource and determine:
- a URL for the resource
- a version identifier for the resource.
Note that this kind of facility supports only some comparison operations: It
enables the determination that two version-containing URLs designate
versions of the same resource. However, given the phenomenon of URL
aliasing, it is insufficient to determine that they are not
versions of the same resource.
This is a minimal "browsing through time" requirement. It allows a browser
to tell that a versioned resource has been accessed and then to invoke
special versioning or configuration management operations on the resource.
While client performance will be best if this can be done via URL
computation (i.e., mangling), it could also be done by an extra query and
round-trip to the server.
- A way to request exclusive access to a version of a resource
(LOCK)
Since not all systems implement lock-based access, there is a
question as to how this should be implemented. Client use of this
method could be optional, allowing some relatively strong guarantee on the
meaning of acquiring a lock. Alternatively, clients could be expected to
take a lock, but servers might implement different locking policies
(possibly even including implementation of LOCK and UNLOCK as NOPs).
- A way to specify a timeout after which a lock will lapse
In many cases, locks over a certain duration are due to errors, and their
strict enforcement can cause more problems than inadvertent version skew.
We should allow locks to have a lifetime. It may prove a good idea to
have a finite default lifetime defined by the protocol. If a universal
default is too constraining, there should be a way for a server to inform
the client what the lifetime of a lock is. Servers should honor client
lock lifetime requests, or inform them if the request is denied.
- A way to release exclusive access to a resource (UNLOCK)
This is the inverse of LOCK.
- A way for a client to declare an intention to modify a resource
(RESERVE or CHECKOUT?)
This operation is required before any versioned update. Its effects may
vary depending on server policy, from locking a resource, to forking a new
variant, to a NOP on servers that do not track sessions or restrict
updates. If this operation returns a version number, the client is
required to make sure that it uses a copy of the data associated with that
version number of the resource for any update operations it carries out.
Servers that wish to enforce a mandatory GET operation before update should
simply use a fresh version identifier on the return from this operation.
- A way to declare the end of an intention to write a resource
This is the inverse of RESERVE. Typically, servers will commit
updates at this time, and return a final version identifier if possible
and if it was not already returned.
- A way to submit a new version of a resource (PUT)
The server should be able to attach it to the correct part of the version
tree, based on the version number associated with the resource before its
modification.
- A way for a user-agent to request a version identifier for a checked
out version.
Such an identifier will not be used by any other
user-agent in the meantime. The server may refuse the request.
- A way for a client to propose a version identifier upon
submitting a version of a resource
The server may refuse to use the client's suggested version identifier.
- A way for a client to supply metadata to be associated with a
version
The kinds of data supplied here might be simple textual comments or more
structured data. An ability to attach arbitrary fields and content is
probably required, but a standard set of attributes that would enable
interoperation would be useful. For basic versioning we need only specify,
for example, that comments are attached as the message-body of the operation
that releases a write intention. The special formats for structured metadata
can then be handled by using content-type negotiation, and the
content-types defined as part of the Configuration Management layer.
- A way for a server to provide a version identifier to be used for a
resource in further operations
This general requirement notes that version-aware clients are responsible
for providing the appropriate version identifier for a resource that is
being manipulated. In particular, if a resource is being modified, any
server-provided version identifier must be used when submitting an update. This
allows servers to track active sessions (however they may be implemented by
the server) by assigning version identifiers when documents are retrieved,
locked, or reserved.
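
The LOCK, lock lifetime and UNLOCK requirements in the list above can be
sketched as a small in-memory lock table. This is an illustration only,
under stated assumptions: the class name, the default and maximum lifetimes,
and the choice to cap (rather than reject) an over-long request are all
hypothetical and are not part of these requirements. Returning the granted
lifetime is one way for a server to inform the client of the lifetime it
will actually honor.

```python
import time


class LockTable:
    """Toy lock table; all names and default values are illustrative."""

    def __init__(self, default_lifetime=600.0, max_lifetime=3600.0,
                 clock=time.monotonic):
        self.default_lifetime = default_lifetime
        self.max_lifetime = max_lifetime
        self.clock = clock          # injectable so lapse can be simulated
        self._locks = {}            # resource URL -> (owner, expiry time)

    def _holder(self, url):
        """Return the current owner, discarding a lock that has lapsed."""
        entry = self._locks.get(url)
        if entry is None:
            return None
        owner, expires = entry
        if self.clock() >= expires:
            del self._locks[url]
            return None
        return owner

    def lock(self, url, owner, lifetime=None):
        """Return the granted lifetime in seconds, or None if refused."""
        holder = self._holder(url)
        if holder is not None and holder != owner:
            return None
        requested = self.default_lifetime if lifetime is None else lifetime
        # Assumed policy: cap the request and report what was granted.
        granted = min(requested, self.max_lifetime)
        self._locks[url] = (owner, self.clock() + granted)
        return granted

    def unlock(self, url, owner):
        """Release the lock; return True only if `owner` still held it."""
        if self._holder(url) != owner:
            return False
        del self._locks[url]
        return True
```

A server that treats LOCK and UNLOCK as NOPs would simply grant every
request; the interface seen by the client would be unchanged.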
The following discussion of possible implementations of the requirements
above is intended to aid understanding of the requirements. It is
not a statement that a particular implementation is a requirement
for basic versioning, but an explanation of how the separation of concerns
might improve the final implementation architecture.
The requirements on reservation and PUT take care of some key global
requirements: version access is logically separated from access control
(RESERVE/RELEASE) and updating. In terms of traditional CM, a CHECKOUT is a
RESERVE followed by a GET and a CHECKIN is a PUT followed by a RELEASE. By
separating access control (locking and unlocking of resources) from
modification of resources, we achieve a great deal of versioning-style
independence.
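
The composition described above can be sketched as follows. This is a toy
model under stated assumptions, not a proposed implementation: the class,
the counter-based version identifiers, and the policy (a non-locking server
that commits on RELEASE and forks a variant when two clients start from the
same base version) are all hypothetical choices made for illustration.

```python
import itertools


class VersionedResource:
    """Toy store for one resource; method names mirror the operations above."""

    def __init__(self, initial_content):
        self._ids = itertools.count(1)
        self.content = {}       # version id -> frozen content
        self.predecessor = {}   # version id -> id it was derived from
        self.reserved = {}      # client -> base version id
        self.pending = {}       # client -> content awaiting RELEASE
        self.default = self._store(initial_content, None)

    def _store(self, content, base):
        vid = str(next(self._ids))
        self.content[vid] = content
        self.predecessor[vid] = base
        return vid

    def reserve(self, client, base=None):
        """Declare an intention to modify; returns the base version id."""
        base = self.default if base is None else base
        self.reserved[client] = base
        return base

    def get(self, vid=None):
        """Plain retrieval; no reservation is needed to read."""
        return self.content[self.default if vid is None else vid]

    def put(self, client, content):
        if client not in self.reserved:
            raise PermissionError("RESERVE is required before an update")
        self.pending[client] = content

    def release(self, client):
        """End the write intention; commit any submitted content."""
        base = self.reserved.pop(client)
        if client not in self.pending:
            return base
        vid = self._store(self.pending.pop(client), base)
        # Assumed policy: the default advances only along its own line;
        # an update from an older base becomes a variant.
        if base == self.default:
            self.default = vid
        return vid


def checkout(resource, client):
    """CHECKOUT = RESERVE followed by GET."""
    base = resource.reserve(client)
    return base, resource.get(base)


def checkin(resource, client, content):
    """CHECKIN = PUT followed by RELEASE."""
    resource.put(client, content)
    return resource.release(client)
```

Because the four primitives stay separate, a lock-based server could refuse
the second RESERVE instead of forking, without changing what the client
sends.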
We also have very flexible options for the negotiation of
version identifiers depending on server policy. The version identifier of a
new resource can be negotiated between the user-agent and the server at 3
points in time: when a lock is taken, when the lock is released, or when
the resource is POSTed. Session tracking can be implemented by using
special version identifiers for RESERVE and RELEASE. All version
identifier negotiation follows a simple rule: "the client proposes, but the
server disposes."
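
The negotiation rule can be sketched as below. The class, the "v<n>"
fallback format and the acceptance predicate are assumptions made for this
example; the requirements say only that the client may propose an
identifier and that the server may refuse it.

```python
import itertools


class IdentifierNegotiator:
    """Toy model of "the client proposes, but the server disposes"."""

    def __init__(self, is_acceptable=lambda proposal: True):
        self.is_acceptable = is_acceptable   # server policy (hypothetical)
        self.assigned = set()                # identifiers already handed out
        self._counter = itertools.count(1)

    def _fresh(self):
        # Server-chosen fallback: the first unused "v<n>".
        for n in self._counter:
            candidate = f"v{n}"
            if candidate not in self.assigned:
                return candidate

    def negotiate(self, proposal=None):
        """Return (identifier, accepted); accepted is True only when the
        client's own proposal was used."""
        accepted = (
            proposal is not None
            and proposal not in self.assigned
            and self.is_acceptable(proposal)
        )
        identifier = proposal if accepted else self._fresh()
        self.assigned.add(identifier)
        return identifier, accepted
```

The same call could be made at any of the three points named above (when a
lock is taken, when it is released, or when the resource is submitted),
since the outcome always rests with the server.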
Acknowledgements
This document is a result of the vigorous and valuable discussion on the
Versioning on the Web
<www-vers-wg-request@ics.uci.edu>, and the Distributed Authoring
<w3c-dist-auth-request@w3.org>
mailing lists. All the interactions on these lists have been helpful,
as have several conversations.
as have several conversations.
David Fiander's initial requirements got us started and clarified several
points. Jim Whitehead provided useful criticism, some new points, and
impetus to get this thing out the door. Yaron Goland and Christopher
Seiwald provided extensive commentary and discussion.
The following list includes the above and others who have also helped
either with their postings, personal email or face-to-face discussions:
Dan Connolly, World Wide Web Consortium, connolly@w3.org
Ron Fein, Microsoft, ronfe@microsoft.com
David Fiander, Mortice Kern Systems, davidf@mks.com
Roy Fielding, U.C. Irvine, fielding@ics.uci.edu
Yaron Goland, Microsoft, yarong@microsoft.com
Dave Long, America Online, dave@sb.aol.com
Henrik Frystyk Nielsen, World Wide Web Consortium, frystyk@w3.org
Larry Masinter, Xerox PARC, masinter@parc.xerox.com
Murray Maloney, SoftQuad, murray@sq.com
Christopher Seiwald, Perforce Software, seiwald@perforce.com
Judith Slein, Xerox, slein@wrc.xerox.com
To Do
Mandatory IETF formatting. Proofread. Spell check. Sanity check.