Changes from last version

removed lock durations
added more references
changed "style-free versioning" to "policy independent versioning"
cleaned up prose.

To Do

Spell check.

HTTP Working Group                                 David G. Durand,
INTERNET-DRAFT                                     Boston University
<draft-durand-www-versreq-00.txt>                  November 6, 1996

Expires March, 1997

Functional Requirements and Framework for Versioning on the WWW

David G. Durand and Fabio Vitali

Status of this Memo

This document is an Internet draft. Internet drafts are working documents of the Internet Engineering Task Force (IETF), its areas and its working groups. Note that other groups may also distribute working information as Internet drafts.

Internet Drafts are draft documents valid for a maximum of six months and can be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use Internet drafts as reference material or to cite them as other than as "work in progress".

To learn the current status of any Internet draft please check the "1id-abstracts.txt" listing contained in the Internet drafts shadow directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or ftp.isi.edu (US West coast). Further information about the IETF can be found at URL: http://www.ietf.org/

Distribution of this document is unlimited. Please send comments to the WWW Distributed Authoring and Versioning mailing list, <w3c-dist-auth@w3.org>, which may be joined by sending a message with subject "subscribe" to <w3c-dist-auth-request@w3.org>. Discussions are archived at URL: http://www.w3.org/pub/WWW/Archives/Public/w3c-dist-auth/. The HTTP working group at <http-wg@cuckoo.hpl.hp.com> also discusses the HTTP protocol. Discussions of the HTTP working group are archived at URL: http://www.ics.uci.edu/pub/ietf/http/. General discussions about HTTP and the applications which use HTTP should take place on the <www-talk@w3.org> mailing list.

Abstract

This document describes functional requirements for integrating versioning into the WWW. Versioning is the fundamental basis of document management systems, with far reaching effects on the semantics of document identity and meaningful operations. This document reflects the basic versioning needs for document management and collaborative authoring, but does not define the complete set of requirements for these domains where they extend beyond the versioning of individual resources.

1. Introduction

This document discusses why versioning is needed on the WWW, and the functional requirements for full version support. It is divided into three main sections, in addition to this introduction. In Section 2, we briefly describe the rationale for versioning on the web, enumerating the goals of versioning on the WWW. All specific requirements should support (and certainly should not hinder) the realization of these goals. Section 3 describes global requirements for protocol development. These are high-level requirements that should be addressed in order to fulfil the rationale. These requirements are separated, because their acceptance already constrains the space within which detailed functional requirements can be defined. Finally, in Section 4, we list the specific low-level functional requirements that satisfy the goals defined in Section 2, while meeting the requirements of Section 3.

This work is based on David Fiander's suggestion to separate versioning and configuration requirements, and we assume a two-layer architecture for versioning on the web. The first layer, whose requirements are defined in this document, addresses the problem of handling multiple versions of single resources. The second layer will address the thornier problems of configuration management for multiple resources. Some, but by no means all, of the requirements for Configuration Management are enumerated in the Internet draft "Requirements on HTTP for Distributed Content Editing" [2].

2. Rationale

The problem of versioning, particularly in relation to hypertext documents like those that make up the majority of web content, is a complex one. Link integrity, document set consistency, and editorial factors all have an influence. The hypertext research community has already identified many problems and possible solutions. For instance, [3, 4] give good overviews of the special problems involved.

Versioning in the context of the world-wide web offers a variety of benefits:

It provides infrastructure for efficient and controlled management of large evolving web sites: Modern configuration management systems are built on some form of repository that can track the revision history of individual resources, and provide the higher-level tools to manage those saved versions. Basic versioning capabilities are required to support such systems.
It allows parallel development and update of single resources: Since versioning systems register change by creating new objects, they enable simultaneous write access by allowing the creation of variant versions. Many also provide merge support to ease the reverse operation.
It provides a framework for access control over resources: While specifics vary, most systems provide some method of controlling or tracking access to enable collaborative resource development.
It allows browsing through past and alternative versions of a resource: Frequently the modification and authorship history of a resource is critical information in itself.
It provides stable names that can support externally stored links for annotation and link-server support: Both annotation and link servers frequently need to store stable references to portions of resources that are not under their direct control. By providing stable states of resources, version control systems allow not only stable pointers into those resources, but also well-defined methods to determine the relationships of those states of a resource.
It allows explicit semantic representation of single resources with multiple states: A versioning system directly represents the fact that a resource has an explicit history, and a persistent identity across the various states it has had during the course of that history.

3. Global requirements

This section covers the overarching constraints that must inform and direct detailed requirements for versioning support. They encompass compatibility across different implementations, as well as compatibility with current practice. The following are general requirements for WWW versioning:

1. Stability of versions: Most versioning systems are intended to enable an accurate record of the history of evolution of a document. This accuracy is ensured by the fact that a version eventually becomes "frozen" and immutable. Once a version is frozen, further changes will create new versions rather than modifying the original. In order for caching and persistent references to be properly maintained, a client must be able to determine that a version has been frozen. We require that unlocked resource versions be frozen. This enables the common practice of keeping unfrozen "working versions". Any successful attempt to retrieve a frozen version of a resource will always retrieve exactly the same content, or return an error if that version (or the resource itself) is no longer available. Since URLs may be reassigned at a server's discretion, this requirement applies only for that period of time during which a URL identifies the same resource. HTTP 1.1's entity tags will need to be integrated into the versioning strategy in order for caching to work properly.
2. User Agent Interoperability: All versioning clients should be able to work with any versioning HTTP server. It is acceptable for some client/server combinations to provide special features that are not universally available, but the protocol should be sufficient that a basic level of functionality will be universal. It should be possible for servers and clients to negotiate the use of optional features.
3. Policy-free Versioning: Haake and Hicks [1] have identified the notion of versioning styles (referred to here as versioning policies, to reflect the nature of client/server interaction) as one way to think about the different policies that versioning systems implement. Versioning policies include decisions on the shape of version histories (linear or branched), the granularity of change tracking, locking requirements made by a server, etc. The protocol should not unnecessarily restrict version management policies to any one paradigm. For instance, locking and version number assignment should be inter-operable across servers and clients, even if there are some differences in their preferred models.
4. Separation of resource retrieval and concurrency control: The protocol must separate the reservation and release of versioned resources from their access methods. Provided that consistency constraints are met before, during and after the modification of a versioned resource, no single policy for accessing a resource should be enforced by the protocol. For instance, a user may declare an intention to write before or after retrieving a resource via GET, may PUT a resource without releasing the lock, and might even request a lock via HTTP, but then retrieve the document using another communication channel such as FTP.
5. Data format compatibility: The protocol should enable a versioning server to work with existing resources and URLs. Special versioning information should not become a mandatory part of document formats.
6. Legacy Client and Server Support: Servers should make versioned resources accessible to non-versioning clients in a format acceptable to them. Special version information that would break existing clients, such as new mandatory headers, cannot therefore be required for GET (and possibly also for PUT).
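
The stability requirement (item 1 above) can be illustrated with a minimal sketch. It assumes a purely in-memory model of one resource; the class and method names (VersionedResource, freeze, is_frozen) are hypothetical and are not part of any proposed protocol. The sketch shows only the invariant that a frozen version always yields the same content, while working versions may still change.

```python
class FrozenVersionError(Exception):
    """Raised when a client tries to modify a frozen version."""


class VersionedResource:
    """Hypothetical in-memory model of one versioned resource."""

    def __init__(self):
        self._content = {}    # version id -> content bytes
        self._frozen = set()  # version ids that may no longer change
        self._next = 1

    def new_working_version(self, content):
        """Create an unfrozen working version and return its identifier."""
        vid = str(self._next)
        self._next += 1
        self._content[vid] = content
        return vid

    def update(self, vid, content):
        """Modify a working version; frozen versions reject the change."""
        if vid not in self._content:
            raise KeyError(vid)
        if vid in self._frozen:
            raise FrozenVersionError(vid)
        self._content[vid] = content

    def freeze(self, vid):
        """Mark a version immutable from now on."""
        if vid not in self._content:
            raise KeyError(vid)
        self._frozen.add(vid)

    def is_frozen(self, vid):
        """Let a client determine whether a version has been frozen."""
        return vid in self._frozen

    def get(self, vid):
        """Return the content of a version, or raise KeyError if unavailable."""
        return self._content[vid]
```

In this model a client that wants to change a frozen version has to create a new working version instead, which matches the behaviour described in item 1.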

4. Functional requirements

The following functional requirements are intended to satisfy the global requirements of Section 3 and enable the benefits listed in Section 2. In the description of some of the requirements, the names of proposed semantic operations are capitalized to distinguish them from plain text. While HTTP methods are also capitalized, these colliding conventions are not intended to suggest that any of the new operations must be implemented by an HTTP method.

The protocol should provide:

1. Access to specific versions via a URL

For each version of a resource, on a server, there should be a URL to refer to that version. This is required for version-specific linking, and for non-versioning client support.

2. A URL to denote a versioned resource itself, rather than specific versions of it

This identifier is needed for queries about the versioning status of a resource, that do not apply only to one version of that resource. This is used to perform operations (such as adjusting attributes, changing locks, or reassigning URLs) that affect all versions of a resource, rather than any specific version.

3. Direct access to a server-defined "default", "current" or "tip" version of a resource

This is one of the simplest ways to guarantee non-versioning client compatibility. If no special version information is provided, the server will provide a default. This does not rule out the possibility of a server returning an error when no sensible default exists, but it does provide a standard way to support non-versioning clients, and one of the most common version access disciplines.

4. A way to access common related URLs from a versioned URL, whether by server query, URL computation, or some other way:

root version(s) of this document
predecessor version(s) of this document
successor version(s) of this document
default version of this document

It must be possible in some way for a versioning client to access versions related to a resource whose URL it has. Possible methods of accessing such information include, but are not limited to: the server automatically adding header fields to a versioned URL specifying the URL of the common related versions, the server providing one or more query methods ("who is the previous version to this URL?"), or a standardized way to compute related URLs when given a versioned URL. In particular, access to the "default" version of a resource is an extremely important operation, that a client should be able to perform at any time that a versioned URL is seen.

5. A way to retrieve the complete version topology for this resource

There should be a way to retrieve information about all versions of a resource. The format for this information must be standardized so that the basic information can be used by all clients. Other specialized formats should be accommodated, for servers and clients that require information that cannot be included in the standard topology.
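
As an illustration of requirements 4 and 5, the following sketch models a version topology as a graph with predecessor and successor links. It makes several assumptions: the names VersionNode and VersionTopology are hypothetical, the version identifiers are arbitrary strings, and the "newest version is the default" rule is one possible server policy rather than something these requirements mandate. No standardized interchange format is implied.

```python
from dataclasses import dataclass, field


@dataclass
class VersionNode:
    """One version and its links to neighbouring versions."""
    version_id: str
    predecessors: list = field(default_factory=list)
    successors: list = field(default_factory=list)


class VersionTopology:
    """Hypothetical version graph for a single resource."""

    def __init__(self):
        self.nodes = {}      # version id -> VersionNode
        self.default = None  # server-defined default version

    def add_version(self, version_id, predecessors=()):
        """Add a version; more than one predecessor models a merge."""
        for p in predecessors:
            if p not in self.nodes:
                raise KeyError(p)
        node = VersionNode(version_id, list(predecessors))
        for p in predecessors:
            self.nodes[p].successors.append(version_id)
        self.nodes[version_id] = node
        self.default = version_id  # assumed policy: newest version is default
        return node

    def roots(self):
        """Versions that have no predecessor."""
        return [v for v, n in self.nodes.items() if not n.predecessors]

    def predecessors(self, version_id):
        return list(self.nodes[version_id].predecessors)

    def successors(self, version_id):
        return list(self.nodes[version_id].successors)
```

A branched history with a later merge is then built by calling add_version with one predecessor for each branch and with two predecessors for the merged version.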

6. Some way to determine that a URL points to a named version of a resource

This might be implemented as part of the URL format, a server query or additional headers.

7. Some way to determine a version identification and a resource identification for a versioned resource, given its URL

This requirement describes the ability to take the URL of a version of a resource and determine:

a URL for the resource
a version identifier for the resource.

Note that this kind of facility supports only some comparison operations: It enables the determination that two version-containing URLs designate versions of the same resource. However, given the phenomenon of URL aliasing, it is insufficient to determine that they are not versions of the same resource.

This is sort of a minimal "browsing through time" requirement. This requirement allows a client to tell that a versioned resource has been accessed and then to invoke special versioning or configuration management operations on the resource. While client performance will be best if this can be done via URL computation (i.e. mangling) it could also be done by an extra query and round-trip to the server.
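
The URL-computation option for requirement 7 can be sketched as follows. The sketch assumes, purely for illustration, that a version identifier is carried as a ";version=ID" suffix on the URL path; this convention, the example.org URLs in the usage note, and the function names are hypothetical and are not proposed by this document.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical convention, not defined by this document.
VERSION_MARKER = ";version="


def split_versioned_url(url):
    """Return (resource_url, version_id); version_id is None if absent."""
    parts = urlsplit(url)
    path, marker, version_id = parts.path.rpartition(VERSION_MARKER)
    if not marker:
        # No version marker in the path: treat the URL as unversioned.
        return url, None
    resource_url = urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )
    return resource_url, version_id


def known_same_resource(url_a, url_b):
    """True if both URLs are known to designate the same resource.

    A False result means "unknown", not "different": because of URL
    aliasing, two distinct resource URLs may still name one resource.
    """
    return split_versioned_url(url_a)[0] == split_versioned_url(url_b)[0]
```

For example, split_versioned_url("http://example.org/doc.html;version=1.2") yields the resource URL "http://example.org/doc.html" and the version identifier "1.2", with no round-trip to the server.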

8. A way to request exclusive access to a version of a resource (LOCK)

Since not all systems implement lock-based access, there are open questions as to how this should be implemented. Client use of this method could be optional, allowing some relatively strong guarantee on the meaning of acquiring a lock. Alternatively, clients could be expected to take a lock, but servers might implement different locking policies.

10. A way to release exclusive access to a resource (UNLOCK)

This is the inverse of LOCK.

11. A way for a client to declare an intention to modify a resource (RESERVE)

This operation is required before any versioned update. Its effects may vary depending on server policy, from locking a resource, to forking a new variant, to a NOP on servers that do not track sessions or restrict updates. If this operation returns a version number, the client is required to make sure that it uses a copy of the data associated with that version number of the resource for any update operations it carries out. Servers that wish to enforce a mandatory GET operation before update, should simply use a fresh version identifier on the return from this operation.

12. A way to declare the end of an intention to write a resource (RELEASE)

This is the inverse of RESERVE. Typically, servers will commit updates at this time, and return a final version identifier if possible and if it was not already returned.

13. A way to submit a new version of a resource (PUT)

The server should be able to attach it to the correct part of the version tree, based on the version number associated with the resource before its modification.

14. A way for a client to request a version identifier for a checked out version.

Such an identifier will not be used by any other client in the meantime. The server may refuse the request.

15. A way for a client to propose a version identifier upon submitting a version of a resource

The server may refuse to use the client's suggested version identifier.

16. A way for a client to supply metadata to be associated with a version

The kinds of data supplied here might be simple textual comments or more structured data. An ability to attach arbitrary fields and content is probably required, but a standard set of attributes that would enable inter-operation would be useful. For basic versioning we need only specify, for example, that comments are attached as the message-body of the operation that releases a write intention. The special formats for structured metadata can then be handled by using content-type negotiation, and the content-types defined as part of the Configuration Management layer.

17. A way for a server to provide a version identifier to be used for a resource in further operations

This general requirement notes that versioning clients are responsible for providing the appropriate version identifier for a resource that is being manipulated. In particular, if a resource is being modified, any server provided version must be used when submitting an update. This allows servers to track active sessions (however they may be implemented by the server) by assigning version identifiers when documents are retrieved, locked, or reserved.

18. A way to track resources that have been RESERVE'ed (Session Tracking)

This must be done by some kind of information transfer from the server to the client (a "token"), when a resource is reserved. The client must then send the appropriate token on other operations on that same resource. This allows users to be matched with their operations.

Discussion of locking and reservation

This section discusses some possible implementation strategies that take advantage of the structure of the requirements listed above.

The requirements on RESERVE and PUT take care of some key global requirements: version access is logically separated from concurrency control (RESERVE/RELEASE) and updating. In terms of traditional CM systems, a CHECKOUT is a RESERVE followed by a GET and a CHECKIN is a PUT followed by a RELEASE. By separating concurrency control (locking and unlocking of resources) from modification of resources, and from data transfer operations, we can achieve versioning policy independence.

This separation also allows us to meet requirement 18 (Session tracking), using the negotiation of version identifiers to provide "tokens" for outstanding operations. The version identifier of a new resource can now be negotiated at several different points in time:

When a resource is RESERVE'ed
When a resource is LOCK'ed
When a resource is PUT
When a resource is RELEASE'ed

At each of these points in time both client and server can transmit a version identifier in the request. Session tracking can be implemented by using special version identifiers to track the outstanding sessions. The requirement on clients is that if they are provided a version number by the server in the reply to any of these operations, that same version must be used in subsequent requests to take action on that resource, if they are intended to be part of the same session.

While a client is always free to request a version identifier, a server is never constrained to use a client's suggestion. This session tracking mechanism does not require more overhead than is already needed for other version-aware operations.
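
The decomposition described in this section can be sketched in a few lines. The sketch below is one possible reading under stated assumptions: a single resource, a server that issues a fresh version identifier at RESERVE time and uses it as the session token, and a policy that makes each committed version the new default. The class VersioningServer, its methods, and the helpers checkout and checkin are hypothetical names introduced for illustration; they do not correspond to HTTP methods or to any existing implementation.

```python
import itertools


class SessionError(Exception):
    """Raised when a request uses an identifier the server did not issue."""


class VersioningServer:
    """Hypothetical single-resource server; all names are illustrative."""

    def __init__(self, content):
        self.versions = {"1": content}  # committed version id -> content
        self.parents = {"1": None}      # committed version id -> base version id
        self.default = "1"
        self._sessions = {}             # token -> [base version id, pending content]
        self._ids = itertools.count(2)

    def get(self, version_id=None):
        """GET: plain retrieval, independent of any reservation."""
        return self.versions[version_id or self.default]

    def reserve(self, base_id=None):
        """RESERVE: declare write intent; the new version id is the token."""
        base_id = base_id or self.default
        if base_id not in self.versions:
            raise KeyError(base_id)
        token = str(next(self._ids))
        self._sessions[token] = [base_id, None]
        return token

    def put(self, token, content):
        """PUT: submit content under the identifier the server handed out."""
        if token not in self._sessions:
            raise SessionError(token)
        self._sessions[token][1] = content

    def release(self, token):
        """RELEASE: end the write intent and commit any submitted content."""
        if token not in self._sessions:
            raise SessionError(token)
        base_id, pending = self._sessions.pop(token)
        if pending is None:
            return None  # nothing was submitted, so no new version exists
        self.versions[token] = pending
        self.parents[token] = base_id
        self.default = token
        return token


def checkout(server):
    """CHECKOUT expressed as RESERVE followed by GET."""
    token = server.reserve()
    return token, server.get()


def checkin(server, token, content):
    """CHECKIN expressed as PUT followed by RELEASE."""
    server.put(token, content)
    return server.release(token)
```

Because the token is simply the version identifier returned by RESERVE, the client carries no state beyond what a version-aware request already needs. A server with a locking policy could refuse a second RESERVE on the same base version, whereas this sketch allows parallel reservations.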

Security Considerations

These requirements do not directly address security issues. We do not believe that versioning has any special security requirements other than authentication and session-level security. We therefore expect that for basic versioning, no special facilities will be required over those already under development in the WWW. Full configuration management systems built on the basic versioning support may require additional specialized security methods built on top of authenticated secure connections.

Acknowledgments

This document is a result of the vigorous and valuable discussion on the Versioning on the Web <www-vers-wg-request@ics.uci.edu> and the Distributed Authoring <w3c-dist-auth-request@w3.org> mailing lists. All the interactions on these lists have been helpful, as have several conversations. David Fiander's initial requirements got us started and clarified several points. Jim Whitehead provided useful criticism, some new points, and impetus to get this thing out the door. Yaron Goland and Christopher Seiwald provided extensive commentary and discussion.

The following list includes the above and others who have also helped either with their postings, personal email or face-to-face discussions:

Dan Connolly, World Wide Web Consortium, connolly@w3.org
Ron Fein, Microsoft, ronfe@microsoft.com
David Fiander, Mortice Kern Systems, davidf@mks.com
Roy Fielding, U.C. Irvine, fielding@ics.uci.edu
Yaron Goland, Microsoft, yarong@microsoft.com
Dave Long, America Online, dave@sb.aol.com
Henrik Frystyk Nielsen, World Wide Web Consortium, frystyk@w3.org
Larry Masinter, Xerox PARC, masinter@parc.xerox.com
Murray Maloney, SoftQuad, murray@sq.com
Christopher Seiwald, Perforce Software, seiwald@perforce.com
Judith Slein, Xerox, slein@wrc.xerox.com

References

[1] "VerSE: Towards Hypertext Versioning Styles", Anja Haake and David Hicks, Proc. Hypertext'96, the Seventh ACM Conference on Hypertext, 1996, pages 224-234.

[2] "Requirements on HTTP for Distributed Content Editing", E. J. Whitehead, Work in Progress, IETF working draft draft-whitehead-http-distreq-00.txt, September 1996.

[3] "Structural and Cognitive Problems in Providing Version Control for Hypertext", Kaspar Østerbye, Proceedings of the ACM Conference on Hypertext, Milano, Italy, 1992, pp 33-42.

[4] "Version Control in Hypermedia Databases", Technical report TAMU-HRL-91-004, Hypertext Research Lab, Texas A&M University, 1991.

$Id: draft-durand-versreq-00.html,v 1.3 1996/11/06 15:51:55 David Exp $