Changes from last version
- removed lock durations
- added more references
- changed "style-free versioning" to "policy independent
versioning"
- cleaned up prose.
To Do
Spell check.
HTTP Working Group David G. Durand,
INTERNET-DRAFT Boston University
<draft-durand-www-versreq-00.txt> November 6, 1996
Expires March, 1997
Functional Requirements and Framework for Versioning on the WWW
David G. Durand and Fabio Vitali
Status of this Memo
This document is an Internet draft. Internet drafts are working
documents of the Internet Engineering Task Force (IETF), its areas and its
working groups. Note that other groups may also distribute working
information as Internet drafts.
Internet Drafts are draft documents valid for a maximum of six months
and can be updated, replaced or obsoleted by other documents at any
time. It is inappropriate to use Internet drafts as reference material
or to cite them other than as "work in progress".
To learn the current status of any Internet draft please check the
"1id-abstracts.txt" listing contained in the Internet drafts shadow
directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or
ftp.isi.edu (US West coast). Further information about the IETF can be
found at URL: http://www.ietf.org/
Distribution of this document is unlimited. Please send comments to
the WWW Distributed Authoring and Versioning mailing list,
<w3c-dist-auth@w3.org>, which may be joined by sending a message with
subject "subscribe" to <w3c-dist-auth-request@w3.org>. Discussions are
archived at URL:
http://www.w3.org/pub/WWW/Archives/Public/w3c-dist-auth/. The HTTP
working group at <http-wg@cuckoo.hpl.hp.com> also discusses the HTTP
protocol. Discussions of the HTTP working group are archived at URL:
http://www.ics.uci.edu/pub/ietf/http/. General discussions about HTTP
and the applications which use HTTP should take place on the
<www-talk@w3.org> mailing list.
Abstract
This document describes functional requirements for integrating
versioning into the WWW. Versioning is the fundamental basis of
document management systems, with far reaching effects on the
semantics of document identity and meaningful operations. This
document reflects the basic versioning needs for document management
and collaborative authoring, but does not define the
complete set of requirements for these domains where they extend
beyond the versioning of individual resources.
1. Introduction
This document discusses why versioning is
needed on the WWW, and the functional requirements for full version
support. It is divided into three main sections, in addition to this
introduction. In Section 2, we briefly describe the rationale for versioning
on the web, enumerating the goals of versioning on the WWW. All specific
requirements should support (and certainly should not hinder) the
realization of these goals. Section 3 describes global requirements for
protocol development. These are high-level requirements that should be
addressed in order to fulfil the rationale. These requirements are
stated separately because their acceptance already constrains the space
within which detailed functional requirements can be defined.
Finally, in Section 4, we list the specific low-level functional requirements
that satisfy the goals defined in section 2, while meeting the
requirements of section 3.
This work is based on David Fiander's suggestion to separate
versioning and configuration requirements, and we assume a two-layer
architecture for versioning on the web. The first layer, whose
requirements are defined in this document, addresses the problem of
handling multiple versions of single resources. The second layer
will address the thornier problems of configuration management for multiple
resources. Some, but by no means all, of the requirements for Configuration
Management are enumerated in the Internet draft "Requirements on HTTP for
Distributed Content Editing" [1].
2. Rationale
The problem of versioning, particularly in relation to hypertext
documents like those that make up the majority of web content, is a complex
one. Link integrity, document set consistency, and editorial factors all
have an influence. The hypertext research community has already identified
many problems and possible solutions. For instance, [3, 4] give good
overviews of the special problems involved.
Versioning in the context of the world-wide web offers a variety of
benefits:
- It provides infrastructure for efficient and controlled
  management of large evolving web sites.
  - Modern configuration management systems are built on some form of
    repository that can track the revision history of individual
    resources, and provide the higher-level tools to manage those saved
    versions. Basic versioning capabilities are required to support such
    systems.
- It allows parallel development and update of single resources.
  - Since versioning systems register change by creating new objects,
    they enable simultaneous write access by allowing the creation of
    variant versions. Many also provide merge support to ease the reverse
    operation.
- It provides a framework for access control over resources.
  - While specifics vary, most systems provide some method of
    controlling or tracking access to enable collaborative resource
    development.
- It allows browsing through past and alternative versions of a
  resource.
  - Frequently the modification and authorship history of a resource is
    critical information in itself.
- It provides stable names that can support externally stored
  links for annotation and link-server support.
  - Both annotation and link servers frequently need to store
    stable references to portions of resources that are not under their
    direct control. By providing stable states of resources, version
    control systems allow not only stable pointers into those resources,
    but also well-defined methods to determine the relationships of those
    states of a resource.
- It allows explicit semantic representation of single
  resources with multiple states.
  - A versioning system directly represents the fact that a resource has
    an explicit history, and a persistent identity across the various
    states it has had during the course of that history.
3. Global requirements
This section covers the overarching constraints that must
inform and direct detailed requirements for versioning support. They
encompass compatibility across different implementations, as well as
compatibility with current practice. The following are general
requirements for WWW versioning:
- 1. Stableness of versions.
-
Most versioning systems are intended to enable an accurate record of the
history of evolution of a document. This accuracy is ensured by the fact
that a version eventually becomes "frozen" and immutable. Once a version
is frozen, further changes will create new versions rather than modifying
the original. In order for caching and persistent references to be
properly maintained, a client must be able to determine that a version has
been frozen. We require that unlocked resource versions be frozen. This
enables the common practice of keeping unfrozen "working versions". Any
successful attempt to retrieve a frozen version of a resource will always
retrieve exactly the same content, or return an error if that version (or
the resource itself) is no longer available. Since URLs may be
reassigned at a server's discretion, this requirement applies only for that
period of time during which a URL identifies the same resource. HTTP 1.1's
entity tags will need to be integrated into the versioning strategy in order
for caching to work properly.
- 2. User Agent Interoperability.
-
All versioning clients should be able to work with any
versioning HTTP server. It is acceptable for some client/server
combinations to provide special features that are not universally
available, but the protocol should be sufficient that a basic level of
functionality will be universal. It should be possible for servers and
clients to negotiate the use of optional features.
- 3. Policy-independent Versioning
-
Haake and Hicks [2] have identified the notion of versioning styles
(referred to here as versioning policies, to reflect the nature of
client/server interaction) as one way to think about the different policies
that versioning systems implement. Versioning policies include decisions
on the shape of version histories (linear or branched), the granularity of
change tracking, locking requirements made by a server, etc. The
protocol should not unnecessarily restrict version management policies to any
one paradigm. For instance, locking and version number assignment should
be inter-operable across servers and clients, even if there are some
differences in their preferred models.
- 4. Separation of resource retrieval and concurrency control
-
The protocol must separate the reservation and release of versioned
resources from their access methods. Provided that consistency constraints
are met before, during and after the modification of a versioned resource,
no single policy for accessing a resource should be enforced by the
protocol. For instance, a user may declare an intention to write before or
after retrieving a resource via GET, may PUT a resource without releasing
the lock, and might even request a lock via HTTP, but then retrieve
the document using another communication channel such as FTP.
- 5. Data format compatibility.
-
The protocol should enable a versioning server to work with existing
resources and URLs. Special versioning information should not become a
mandatory part of document formats.
- 6. Legacy Client and Server Support.
-
Servers should make versioned resources accessible to non-versioning
clients in a format acceptable to them. Special version information that
would break existing clients, such as new mandatory headers,
cannot therefore be required for GET (and possibly also for PUT).
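Requirement 1 (stableness of versions) can be illustrated with a minimal
Python sketch. It models a single version that may be changed while it is an
unfrozen working version and that rejects every change once it is frozen.
The names Version, freeze, update and FrozenVersionError are hypothetical and
chosen only for illustration; nothing here is proposed protocol syntax.

```python
class FrozenVersionError(Exception):
    """Raised when a caller tries to change a frozen version."""


class Version:
    """One version of a resource (hypothetical model, not protocol syntax)."""

    def __init__(self, identifier, content):
        self.identifier = identifier
        self._content = content
        self.frozen = False

    @property
    def content(self):
        return self._content

    def update(self, new_content):
        # Working versions may change in place; frozen ones may not.
        if self.frozen:
            raise FrozenVersionError(self.identifier)
        self._content = new_content

    def freeze(self):
        # Freezing is one-way: there is deliberately no "unfreeze".
        self.frozen = True
```

A server built along these lines could expose the frozen flag as the property
a client checks before caching a version or storing a persistent reference
to it.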
4. Functional requirements
The following functional requirements are intended to satisfy the
global requirements of Section 3 and enable the benefits listed in
Section 2. In the description of some of the requirements, the names
of proposed semantic operations are capitalized to distinguish them
from plain text. While HTTP methods are also capitalized, these
colliding conventions are not intended to suggest that any of the new
operations must be implemented by an HTTP method.
The protocol should provide:
- 1. Access to specific versions via a URL
-
For each version of a resource on a server, there should be a URL that
refers to that version. This is required for version-specific linking and
for non-versioning client support.
- 2. A URL to denote a versioned resource itself, rather than specific
versions of it
-
This identifier is needed for queries about the versioning status of a
resource that do not apply to only one version of that resource. It is
also used to perform operations (such as adjusting attributes, changing
locks, or reassigning URLs) that affect all versions of a resource, rather
than any specific version.
- 3. Direct access to a server-defined "default", "current"
or "tip" version of a resource
-
This is one of the simplest ways to guarantee non-versioning client
compatibility. If no special version information is provided, the server
will provide a default. This does not rule out the possibility of a server
returning an error when no sensible default exists, but it does provide
a standard way to support non-versioning clients, and one of the most
common version access disciplines.
- 4. A way to access common related URLs from a versioned
URL, whether by server query, URL computation, or some other way:
- root version(s) of this document
- predecessor version(s) of this document
- successor version(s) of this document
- default version of this document
It must be possible in some way
for a versioning client to access versions related to a resource whose URL
it has. Possible methods of accessing such information include, but are not
limited to: the server automatically adding header fields to a versioned
URL specifying the URL of the common related versions, the server providing
one or more query methods ("who is the previous version to this URL?"), or
a standardized way to compute related URLs when given a versioned URL. In
particular, access to the "default" version of a resource is an extremely
important operation that a client should be able to perform at any time
that a versioned URL is seen.
- 5. A way to retrieve the complete version topology for this
resource
-
There should be a way to retrieve information about all versions of a
resource. The format for this information must be standardized so that the
basic information can be used by all clients. Other specialized formats
should be accommodated, for servers and clients that require information
that cannot be included in the standard topology.
- 6. Some way to determine that a URL points to a named version of a
resource
-
This might be implemented as part of the URL format, a server query or
additional headers.
- 7. Some way to determine a version identification and a resource
identification for a versioned resource, given its URL
-
This requirement describes the ability to take the URL of a version of a
resource and determine:
- a URL for the resource
- a version identifier for the resource.
Note that this kind of facility supports only some comparison operations: it
enables the determination that two version-containing URLs designate
versions of the same resource. However, given the phenomenon of URL
aliasing, it is insufficient to determine that they are not
versions of the same resource.
This is a minimal form of the "browsing through time" requirement. It
allows a client to tell that a versioned resource has been accessed and
then to invoke special versioning or configuration management operations on
the resource. While client performance will be best if this can be done via
URL computation (i.e. mangling), it could also be done by an extra query
and round-trip to the server.
- 8. A way to request exclusive access to a version of a resource
(LOCK)
-
Since not all systems implement lock-based access, there are open questions
as to how this should be implemented. Client use of this method could be
optional, allowing some relatively strong guarantee on the meaning of
acquiring a lock. Alternatively, clients could be expected to take a lock,
but servers might implement different locking policies.
- 10. A way to release exclusive access to a resource (UNLOCK)
-
This is the inverse of LOCK.
- 11. A way for a client to declare an intention to modify a resource
(RESERVE)
-
This operation is required before any versioned update. Its effects may
vary depending on server policy, from locking a resource, to forking a new
variant, to a NOP on servers that do not track sessions or restrict
updates. If this operation returns a version number, the client is
required to make sure that it uses a copy of the data associated with that
version number of the resource for any update operations it carries out.
Servers that wish to enforce a mandatory GET operation before update should
simply use a fresh version identifier on the return from this operation.
- 12. A way to declare the end of an intention to write a resource
(RELEASE)
-
This is the inverse of RESERVE. Typically, servers will commit
updates at this time, and return a final version identifier if possible
and if it was not already returned.
- 13. A way to submit a new version of a resource (PUT)
-
The server should be able to attach it to the correct part of the version
tree, based on the version number associated with the resource before its
modification.
- 14. A way for a client to request a version identifier for a checked
out version.
-
Such an identifier will not be used by any other client in the meantime.
The server may refuse the request.
- 15. A way for a client to propose a version identifier upon
submitting a version of a resource
-
The server may refuse to use the client's suggested version identifier.
- 16. A way for a client to supply metadata to be associated with a
version
-
The kinds of data supplied here might be simple textual comments or more
structured data. An ability to attach arbitrary fields and content is
probably required, but a standard set of attributes that would enable
inter-operation would be useful. For basic versioning we need only specify,
for example, that comments are attached as the message-body of the operation
that releases a write intention. The special formats for structured metadata
can then be handled by using content-type negotiation, and the
content-types defined as part of the Configuration Management layer.
- 17. A way for a server to provide a version identifier to be used
for a resource in further operations
-
This general requirement notes that versioning clients are responsible
for providing the appropriate version identifier for a resource that is
being manipulated. In particular, if a resource is being modified, any
server-provided version identifier must be used when submitting an update. This
allows servers to track active sessions (however they may be implemented by
the server) by assigning version identifiers when documents are retrieved,
locked, or reserved.
- 18. A way to track resources that have been RESERVE'ed (Session
Tracking)
-
This must be done by some kind of information transfer from the
server to the client (a "token"), when a resource is reserved. The client
must then send the appropriate token on other operations on that same
resource. This allows users to be matched with their operations.
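Requirements 4, 5 and 7 can be sketched together in Python. The sketch
assumes a purely hypothetical URL convention in which the version identifier
follows a ";version=" marker, and an in-memory VersionGraph that records
predecessor relations. Neither the marker nor any of the names is defined by
this document, and a real server could just as well answer these questions
through headers or queries.

```python
MARKER = ";version="  # hypothetical URL convention, not defined here


def split_versioned_url(url):
    """Return (resource_url, version_id); version_id is None if absent."""
    resource, marker, version = url.rpartition(MARKER)
    if not marker or not version:
        return url, None
    return resource, version


def same_resource(url_a, url_b):
    """True if both URLs are known to name versions of one resource.

    False means "unknown", not "different": aliased URLs defeat the test.
    """
    res_a, ver_a = split_versioned_url(url_a)
    res_b, ver_b = split_versioned_url(url_b)
    return ver_a is not None and ver_b is not None and res_a == res_b


class VersionGraph:
    """Version history of one resource, stored as predecessor lists."""

    def __init__(self):
        self._predecessors = {}  # version id -> list of predecessor ids
        self.default = None      # server-chosen default version id

    def add(self, version, predecessors=()):
        for p in predecessors:
            if p not in self._predecessors:
                raise KeyError(p)
        self._predecessors[version] = list(predecessors)
        self.default = version   # assumption: newest version is the default

    def predecessors(self, version):
        return list(self._predecessors[version])

    def successors(self, version):
        return [v for v, preds in self._predecessors.items()
                if version in preds]

    def roots(self):
        return [v for v, preds in self._predecessors.items() if not preds]

    def topology(self):
        # Complete topology as a plain mapping, one entry per version.
        return {v: list(p) for v, p in self._predecessors.items()}
```

Note that same_resource can only confirm that two URLs name versions of one
resource. A False result means the relationship is unknown, since URL
aliasing can hide it.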
Discussion of locking and reservation
This section discusses some possible implementation strategies that take
advantage of the structure of the requirements listed above.
The requirements on RESERVE and PUT take care of some key global
requirements: version access is logically separated from concurrency control
(RESERVE/RELEASE) and updating. In terms of traditional CM systems, a
CHECKOUT is a RESERVE followed by a GET and a CHECKIN is a PUT followed by
a RELEASE. By separating concurrency control (locking and unlocking of
resources) from modification of resources, and from data transfer operations,
we can achieve versioning policy independence.
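The decomposition above can be sketched in Python against a toy in-memory
server. All names (ToyServer, reserve, get, put, release, checkout, checkin),
the token format, and the policy that the last version PUT becomes the
default are assumptions made for illustration, not part of any protocol.

```python
import itertools


class ToyServer:
    """In-memory stand-in for a versioning server (hypothetical API)."""

    def __init__(self, content):
        self.versions = {"1": content}  # version id -> content
        self.parents = {}               # new version id -> base version id
        self.default = "1"
        self.reserved = {}              # token -> version id for the session
        self._tokens = itertools.count(1)

    def reserve(self):
        # Concurrency control only: no content is transferred.
        token = "t%d" % next(self._tokens)
        self.reserved[token] = self.default
        return token

    def get(self, version=None):
        # Data transfer only: works with or without a reservation.
        return self.versions[version or self.default]

    def put(self, token, content):
        # Attach the new version after the one tracked for this session.
        base = self.reserved[token]
        new_id = str(len(self.versions) + 1)
        self.versions[new_id] = content
        self.parents[new_id] = base
        self.reserved[token] = new_id
        return new_id

    def release(self, token):
        # End the write intention and report the final version identifier.
        final = self.reserved.pop(token)
        if final in self.parents:  # something was PUT in this session
            self.default = final   # toy policy: last PUT becomes default
        return final


def checkout(server):
    """CHECKOUT expressed as RESERVE followed by GET."""
    token = server.reserve()
    return token, server.get()


def checkin(server, token, content):
    """CHECKIN expressed as PUT followed by RELEASE."""
    server.put(token, content)
    return server.release(token)
```

Because the four primitives stay separate, a client of this toy server could
equally call reserve after get, or put several times before release, without
any change to the server.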
This separation also allows us to meet requirement 18 (Session tracking),
using the negotiation of version identifiers to provide "tokens" for
outstanding operations. The version identifier of a new resource can now be
negotiated at several different points in time:
- When a resource is RESERVE'ed
- When a resource is LOCK'ed
- When a resource is PUT
- When a resource is RELEASE'ed
At each of these points in time the client can transmit a version
identifier in the request, and the server can transmit one in the reply.
Session tracking can be implemented by
using special version identifiers to track the outstanding sessions. The
requirement on clients is that if they are provided a version number by the
server in the reply to any of these operations, that same version must be
used in subsequent requests to take action on that resource, if they are
intended to be part of the same session.
While a client is always free to request a version identifier, a server
is never constrained to use a client's suggestion. This session tracking
mechanism does not require more overhead than is already needed for other
version-aware operations.
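A minimal client-side sketch of this rule, in Python, is given below. It
assumes that requests and replies are represented as plain dictionaries with
hypothetical "operation" and "version" keys; the Session class and its method
names are likewise invented for illustration.

```python
class Session:
    """Client-side record of the identifier to use for one resource."""

    def __init__(self, proposed=None):
        # The client's own suggestion, which the server may ignore.
        self.version_id = proposed

    def next_request(self, operation):
        # Every request in the session carries the current identifier.
        return {"operation": operation, "version": self.version_id}

    def handle_reply(self, reply):
        # A server-provided identifier always replaces the client's.
        if reply.get("version") is not None:
            self.version_id = reply["version"]
```

The client keeps no state beyond the identifier itself, which matches the
observation that session tracking adds no overhead beyond what version-aware
operations already need.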
Security Considerations
These requirements do not directly address security issues. We do not
believe that versioning has any special security requirements other than
authentication and session-level security. We therefore expect that for
basic versioning, no special facilities will be required over those already
under development in the WWW. Full configuration management systems built
on the basic versioning support may require additional specialized security
methods built on top of authenticated secure connections.
Acknowledgments
This document is a result of the vigorous and valuable discussion on the
Versioning on the Web <www-vers-wg-request@ics.uci.edu> and the Distributed
Authoring <w3c-dist-auth-request@w3.org> mailing lists. All the
interactions on these lists have been helpful, as have several
conversations.
David Fiander's initial requirements got us started and clarified several
points. Jim Whitehead provided useful criticism, some new points, and
impetus to get this thing out the door. Yaron Goland and Christopher
Seiwald provided extensive commentary and discussion.
The following list includes the above and others who have also helped
either with their postings, personal email or face-to-face discussions:
Dan Connolly, World Wide Web Consortium, connolly@w3.org
Ron Fein, Microsoft, ronfe@microsoft.com
David Fiander, Mortice Kern Systems, davidf@mks.com
Roy Fielding, U.C. Irvine, fielding@ics.uci.edu
Yaron Goland, Microsoft, yarong@microsoft.com
Dave Long, America Online, dave@sb.aol.com
Henrik Frystyk Nielsen, World Wide Web Consortium, frystyk@w3.org
Larry Masinter, Xerox PARC, masinter@parc.xerox.com
Murray Maloney, SoftQuad, murray@sq.com
Christopher Seiwald, Perforce Software, seiwald@perforce.com
Judith Slein, Xerox, slein@wrc.xerox.com
References
[1] "Requirements on HTTP for Distributed Content Editing",
E. J. Whitehead, Work in Progress, IETF working draft
draft-whitehead-http-distreq-00.txt, September 1996.
[2] "VerSE: Towards Hypertext Versioning Styles", Anja Haake and David
Hicks, Proc. Hypertext'96, the Seventh ACM Conference on Hypertext,
1996, pages 224-234.
[3] "Structural and Cognitive Problems in Providing Version Control for
Hypertext", Kaspar Østerbye, Proceedings of the ACM Conference on
Hypertext, Milano, Italy, 1992, pp 33-42.
[4] "Version Control in Hypermedia Databases", Technical report
TAMU-HRL-91-004, Hypertext Research Lab, Texas A&M University, 1991.
$Id: draft-durand-versreq-00.html,v 1.3 1996/11/06 15:51:55 David Exp $