
Text-only versioning requirements.






Changes from last version

   * removed lock durations
   * added more references
   * changed "style-free versioning" to "policy independent versioning"
   * cleaned up prose.

To Do

Spell check.

HTTP Working Group                                 David G. Durand,
INTERNET-DRAFT                                     Boston University
<draft-durand-versreq-00.txt>                      November 6, 1996

Expires March, 1997

Functional Requirements and Framework for Versioning on the WWW

David G. Durand and Fabio Vitali

Status of this Memo

This document is an Internet draft. Internet drafts are working documents of
the Internet Engineering Task Force (IETF), its areas and its working
groups. Note that other groups may also distribute working information as
Internet drafts.

Internet Drafts are draft documents valid for a maximum of six months and
can be updated, replaced or obsoleted by other documents at any time. It is
inappropriate to use Internet drafts as reference material or to cite them
as other than as "work in progress".

To learn the current status of any Internet draft please check the
"lid-abstracts.txt" listing contained in the Internet drafts shadow
directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au
(Pacific Rim), ds.internic.net (US East coast) or ftp.isi.edu (US West
coast). Further information about the IETF can be found at URL:
http://www.ietf.org/

Distribution of this document is unlimited. Please send comments to the WWW
Distributed Authoring and Versioning mailing list, <w3c-dist-auth@w3.org>,
which may be joined by sending a message with subject "subscribe" to
<w3c-dist-auth-request@w3.org>. Discussions are archived at URL:
http://www.w3.org/pub/WWW/Archives/Public/w3c-dist-auth/. The HTTP working
group at <http-wg@cuckoo.hpl.hp.com> also discusses the HTTP protocol.
Discussions of the HTTP working group are archived at URL:
http://www.ics.uci.edu/pub/ietf/http/. General discussions about HTTP and
the applications which use HTTP should take place on the <www-talk@w3.org>
mailing list.

Abstract

This document describes functional requirements for integrating versioning
into the WWW. Versioning is the fundamental basis of document management
systems, with far-reaching effects on the semantics of document identity and
meaningful operations. This document reflects the basic versioning needs for
document management and collaborative authoring, but does not define the
complete set of requirements for these domains where they extend beyond the
versioning of individual resources.

1. Introduction

This document discusses why versioning is needed on the WWW, and the
functional requirements for full version support. It is divided into three
main sections, in addition to this introduction. In Section 2, we briefly
describe the rationale for versioning on the web, enumerating the goals of
versioning on the WWW. All specific requirements should support (and
certainly should not hinder) the realization of these goals. Section 3
describes global requirements for protocol development. These are high-level
requirements that should be addressed in order to fulfil the rationale.
They are listed separately because accepting them already constrains the
space within which detailed functional requirements can be defined.
Finally, in Section 4, we list the specific low-level functional
requirements that satisfy the goals defined in Section 2, while meeting the
requirements of Section 3.

This work is based on David Fiander's suggestion to separate versioning and
configuration requirements, and we assume a two-layer architecture for
versioning on the web. The first layer, whose requirements are defined in
this document, addresses the problem of handling multiple versions of single
resources. The second layer will address the thornier problems of
configuration management for multiple resources. Some, but by no means all,
of the requirements for Configuration Management are enumerated in the
internet-draft "Requirements on HTTP for Distributed Content Editing" [1].

2. Rationale

The problem of versioning, particularly in relation to hypertext documents
like those that make up the majority of web content, is a complex one. Link
integrity, document set consistency, and editorial factors all have an
influence. The hypertext research community has already identified many
problems and possible solutions. For instance, [3] and [4] give good
overviews of the special problems involved.

Versioning in the context of the world-wide web offers a variety of
benefits:

It provides infrastructure for efficient and controlled management of large
evolving web sites.
     Modern configuration management systems are built on some form of
     repository that can track the revision history of individual resources,
     and provide the higher-level tools to manage those saved versions.
     Basic versioning capabilities are required to support such systems.
It allows parallel development and update of single resources.
     Since versioning systems register change by creating new objects, they
     enable simultaneous write access by allowing the creation of variant
     versions. Many also provide merge support to ease the reverse
     operation.
It provides a framework for access control over resources.
     While specifics vary, most systems provide some method of controlling
     or tracking access to enable collaborative resource development.
It allows browsing through past and alternative versions of a resource.
     Frequently the modification and authorship history of a resource is
     critical information in itself.
It provides stable names that can support externally stored links for
annotation and link-server support.
     Both annotation and link servers frequently need to store stable
     references to portions of resources that are not under their direct
     control. By providing stable states of resources, version control
     systems allow not only stable pointers into those resources, but also
     well-defined methods to determine the relationships of those states of
     a resource.
It allows explicit semantic representation of single resources with multiple
states.
     A versioning system directly represents the fact that a resource has an
     explicit history, and a persistent identity across the various states
     it has had during the course of that history.

3. Global requirements

This section covers the overarching constraints that must inform and direct
detailed requirements for versioning support. They encompass compatibility
across different implementations, as well as compatibility with current
practice. The following are general requirements for WWW versioning:

1. Stability of versions.
     Most versioning systems are intended to enable an accurate record of
     the history of evolution of a document. This accuracy is ensured by the
     fact that a version eventually becomes "frozen" and immutable. Once a
     version is frozen, further changes will create new versions rather than
     modifying the original. In order for caching and persistent references
     to be properly maintained, a client must be able to determine that a
     version has been frozen. We require that unlocked resource versions be
     frozen; since only locked versions may remain mutable, this still
     permits the common practice of keeping unfrozen "working versions". Any
     successful attempt to retrieve a frozen version of a resource will
     always retrieve exactly the same content, or return an error if that
     version (or the resource itself) is no longer available. Since URLs may
     be reassigned at a server's discretion, this requirement applies only
     for that period of time during which a URL identifies the same
     resource. HTTP/1.1 entity tags will need to be integrated into the
     versioning strategy in order for caching to work properly.
2. User Agent Interoperability.
     All versioning clients should be able to work with any versioning HTTP
     server. It is acceptable for some client/server combinations to provide
     special features that are not universally available, but the protocol
     should be sufficient that a basic level of functionality will be
     universal. It should be possible for servers and clients to negotiate
     the use of optional features.
3. Policy-free Versioning
     Haake and Hicks [2] have identified the notion of versioning styles
     (referred to here as versioning policies, to reflect the nature of
     client/server interaction) as one way to think about the different
     policies that versioning systems implement. Versioning policies include
     decisions on the shape of version histories (linear or branched), the
     granularity of change tracking, locking requirements made by a server,
     etc. The protocol should not unnecessarily restrict version management
     policies to any one paradigm. For instance, locking and version number
     assignment should be interoperable across servers and clients, even if
     there are some differences in their preferred models.
4. Separation of resource retrieval and concurrency control
     The protocol must separate the reservation and release of versioned
     resources from their access methods. Provided that consistency
     constraints are met before, during and after the modification of a
     versioned resource, no single policy for accessing a resource should be
     enforced by the protocol. For instance, a user may declare an intention
     to write before or after retrieving a resource via GET, may PUT a
     resource without releasing the lock, and might even request a lock via
     HTTP, but then retrieve the document using another communication
     channel such as FTP.
5. Data format compatibility.
     The protocol should enable a versioning server to work with existing
     resources and URLs. Special versioning information should not become a
     mandatory part of document formats.
6. Legacy Client and Server Support.
     Servers should make versioned resources accessible to non-versioning
     clients in a format acceptable to them. Special version information
     that would break existing clients, such as new mandatory headers,
     cannot therefore be required for GET (and possibly also for PUT).
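Requirement 1 can be made concrete with a small model. The following is a minimal sketch, assuming a hypothetical in-memory store (the class and method names are illustrative, not part of any protocol). A version is a mutable "working version" only while it is locked. It becomes frozen when it is unlocked, and from then on it always returns the same bytes. Deriving the entity tag from a hash of the frozen content is one possible way, assumed here, of integrating HTTP/1.1 entity tags with versioning.

```python
import hashlib


class FrozenVersionError(Exception):
    """Raised when a client tries to modify a frozen version."""


class VersionStore:
    """Hypothetical store: locked versions are mutable, unlocked ones frozen."""

    def __init__(self):
        self._content = {}    # version id -> bytes
        self._locked = set()  # ids of unfrozen "working versions"

    def create_working_version(self, version_id, content):
        # A new version starts out locked, i.e. as a mutable working version.
        self._content[version_id] = content
        self._locked.add(version_id)

    def update(self, version_id, content):
        if version_id not in self._content:
            raise KeyError(version_id)
        if version_id not in self._locked:
            raise FrozenVersionError(f"version {version_id} is frozen")
        self._content[version_id] = content

    def unlock(self, version_id):
        # Releasing the lock freezes the version permanently.
        self._locked.discard(version_id)

    def is_frozen(self, version_id):
        return version_id in self._content and version_id not in self._locked

    def get(self, version_id):
        # Raises KeyError if the version is no longer available.
        return self._content[version_id]

    def etag(self, version_id):
        # Hypothetical strong entity tag derived from the content: for a
        # frozen version it never changes, so it is a stable cache validator.
        digest = hashlib.sha256(self._content[version_id]).hexdigest()
        return f'"{digest[:16]}"'


store = VersionStore()
store.create_working_version("1.1", b"draft text")
store.update("1.1", b"final text")  # allowed: still a working version
store.unlock("1.1")                  # 1.1 is now frozen
try:
    store.update("1.1", b"late edit")
except FrozenVersionError as err:
    print(err)                       # prints: version 1.1 is frozen
# Further changes go into a new version instead.
store.create_working_version("1.2", store.get("1.1") + b", revised")
```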

4. Functional requirements

The following functional requirements are intended to satisfy the global
requirements of Section 3 and enable the benefits listed in Section 2. In
the description of some of the requirements, the names of proposed semantic
operations are capitalized to distinguish them from plain text. While HTTP
methods are also capitalized, these colliding conventions are not intended
to suggest that any of the new operations must be implemented by an HTTP
method.

The protocol should provide:

1. Access to specific versions via a URL
     For each version of a resource on a server, there should be a URL to
     refer to that version. This is required for version-specific linking,
     and for non-versioning client support.
2. A URL to denote a versioned resource itself, rather than specific
versions of it
     This identifier is needed for queries about the versioning status of a
     resource that do not apply to only one version of that resource. This
     is used to perform operations (such as adjusting attributes, changing
     locks, or reassigning URLs) that affect all versions of a resource,
     rather than any specific version.
3. Direct access to a server-defined "default", "current" or "tip" version
of a resource
     This is one of the simplest ways to guarantee non-versioning client
     compatibility. If no special version information is provided, the
     server will provide a default. This does not rule out the possibility
     of a server returning an error when no sensible default exists, but it
     does provide a standard way to support non-versioning clients, and one
     of the most common version access disciplines.
4. A way to access common related URLs from a versioned URL, whether by
server query, URL computation, or some other way:
   o root version(s) of this document
   o predecessor version(s) of this document
   o successor version(s) of this document
   o default version of this document
     It must be possible in some way for a versioning client to access
     versions related to a resource whose URL it has. Possible methods of
     accessing such information include, but are not limited to: the server
     automatically adding header fields to responses for a versioned URL,
     specifying the URLs of the common related versions; the server
     providing one or more query methods ("what is the previous version of
     this URL?"); or a standardized way to compute related URLs when given a
     versioned URL. In particular, access to the "default" version of a
     resource is an extremely important operation that a client should be
     able to perform at any time that a versioned URL is seen.
5. A way to retrieve the complete version topology for this resource
     There should be a way to retrieve information about all versions of a
     resource. The format for this information must be standardized so that
     the basic information can be used by all clients. Other specialized
     formats should be accommodated, for servers and clients that require
     information that cannot be included in the standard topology.
6. Some way to determine that a URL points to a named version of a resource
     This might be implemented as part of the URL format, a server query or
     additional headers.
7. Some way to determine a version identification and a resource
identification for a versioned resource, given its URL
     This requirement describes the ability to take the URL of a version of
     a resource and determine:
        o a URL for the resource
        o a version identifier for that version.
     Note that this kind of facility supports only some comparison
     operations: it can establish that two version-containing URLs
     designate versions of the same resource, but, given the phenomenon of
     URL aliasing, it cannot establish that they are not versions of the
     same resource.

     This is a minimal "browsing through time" requirement. This
     requirement allows a client to tell that a versioned resource has been
     accessed and then to invoke special versioning or configuration
     management operations on the resource. While client performance will be
     best if this can be done via URL computation (i.e., mangling), it could
     also be done by an extra query and round-trip to the server.
8. A way to request exclusive access to a version of a resource (LOCK)
     Since not all systems implement lock-based access, there are open
     questions as to how this should be implemented. Client use of this
     method could be optional, allowing some relatively strong guarantee on
     the meaning of acquiring a lock. Alternatively, clients could be
     expected to take a lock, but servers might implement different locking
     policies.
10. A way to release exclusive access to a resource (UNLOCK)
     This is the inverse of LOCK.
11. A way for a client to declare an intention to modify a resource
(RESERVE)
     This operation is required before any versioned update. Its effects may
     vary depending on server policy, from locking a resource, to forking a
     new variant, to a NOP on servers that do not track sessions or restrict
     updates. If this operation returns a version number, the client is
     required to make sure that it uses a copy of the data associated with
     that version number of the resource for any update operations it
     carries out. Servers that wish to enforce a mandatory GET operation
     before update should simply use a fresh version identifier on the
     return from this operation.
12. A way to declare the end of an intention to write a resource (RELEASE)
     This is the inverse of RESERVE. Typically, servers will commit updates
     at this time, and return a final version identifier if possible and if
     it was not already returned.
13. A way to submit a new version of a resource (PUT)
     The server should be able to attach it to the correct part of the
     version tree, based on the version number associated with the resource
     before its modification.
14. A way for a client to request a version identifier for a checked-out
version.
     Such an identifier will not be used by any other client in the
     meantime. The server may refuse the request.
15. A way for a client to propose a version identifier upon submitting a
version of a resource
     The server may refuse to use the client's suggested version
     identifier.
16. A way for a client to supply metadata to be associated with a version
     The kinds of data supplied here might be simple textual comments or
     more structured data. An ability to attach arbitrary fields and content
     is probably required, but a standard set of attributes that would
     enable interoperation would be useful. For basic versioning we need
     only specify, for example, that comments are attached as the
     message-body of the operation that releases a write intention. The
     special formats for structured metadata can then be handled by using
     content-type negotiation, and the content-types defined as part of the
     Configuration Management layer.
17. A way for a server to provide a version identifier to be used for a
resource in further operations
     This general requirement notes that versioning clients are responsible
     for providing the appropriate version identifier for a resource that is
     being manipulated. In particular, if a resource is being modified, any
     server-provided version identifier must be used when submitting an
     update. This allows servers to track active sessions (however they may
     be implemented by the server) by assigning version identifiers when
     documents are retrieved, locked, or reserved.
18. A way to track resources that have been RESERVE'ed (Session Tracking)
     This must be done by some kind of information transfer from the server
     to the client (a "token") when a resource is reserved. The client must
     then send the appropriate token with other operations on that same
     resource. This allows the server to match users with their operations.
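Requirements 4 through 7 can be satisfied by URL computation alone. The sketch below assumes a purely hypothetical convention in which a version URL is the resource URL followed by ";version=<id>"; these requirements do not mandate any URL syntax. It also models the version topology of requirement 5 as a map from each version to its predecessors, from which roots and successors can be computed, together with a server-chosen default version.

```python
from dataclasses import dataclass, field

# Hypothetical separator; the requirements do not define any URL syntax.
VERSION_MARK = ";version="


def split_version_url(url):
    """Return (resource_url, version_id), or (url, None) if unversioned."""
    resource, mark, version = url.rpartition(VERSION_MARK)
    if not mark:
        return url, None
    return resource, version


def version_url(resource_url, version_id):
    """Build the URL of one version from the resource URL (hypothetical)."""
    return f"{resource_url}{VERSION_MARK}{version_id}"


@dataclass
class VersionTopology:
    """Version graph for one resource: each version lists its predecessors."""
    predecessors: dict = field(default_factory=dict)  # id -> list of ids
    default: str = ""  # server-chosen default ("tip") version

    def add(self, version_id, *parents):
        self.predecessors[version_id] = list(parents)

    def roots(self):
        return [v for v, ps in self.predecessors.items() if not ps]

    def successors(self, version_id):
        return [v for v, ps in self.predecessors.items() if version_id in ps]


topo = VersionTopology()
topo.add("1.1")
topo.add("1.2", "1.1")
topo.add("1.1.1.1", "1.1")         # a branch
topo.add("1.3", "1.2", "1.1.1.1")  # a merge with two predecessors
topo.default = "1.3"

url = version_url("http://example.com/doc.html", "1.2")
resource, vid = split_version_url(url)
print(resource, vid)         # http://example.com/doc.html 1.2
print(topo.successors(vid))  # ['1.3']
print(topo.roots())          # ['1.1']
print(version_url(resource, topo.default))
```

As requirement 7 points out, two URLs that split to the same resource URL are versions of the same resource, but because of URL aliasing, different resource URLs do not prove the opposite.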

Discussion of locking and reservation

This section discusses some possible implementation strategies that take
advantage of the structure of the requirements listed above.

The requirements on RESERVE and PUT take care of some key global
requirements: version access is logically separated from concurrency control
(RESERVE/RELEASE) and updating. In terms of traditional CM systems, a
CHECKOUT is a RESERVE followed by a GET and a CHECKIN is a PUT followed by
a RELEASE. By separating concurrency control (locking and unlocking of
resources) from modification of resources, and from data transfer
operations, we can achieve versioning policy independence.
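The decomposition of CHECKOUT and CHECKIN can be sketched against a hypothetical in-memory server. The policy shown is an assumption and only one of those the requirements allow. RESERVE hands out a fresh identifier for the future version. PUT records the new content. RELEASE commits that content as a successor of the version it was based on. All names are illustrative.

```python
import itertools


class ToyServer:
    """Hypothetical versioning server with separate RESERVE/GET/PUT/RELEASE."""

    def __init__(self, initial):
        self.content = {"1": initial}  # committed versions
        self.parent = {"1": None}      # version id -> predecessor id
        self.pending = {}              # reserved id -> (base id, new data)
        self._ids = itertools.count(2)

    def reserve(self, base):
        # Assumed policy: hand out a fresh identifier for the future version.
        new_id = str(next(self._ids))
        self.pending[new_id] = (base, None)
        return new_id

    def get(self, version_id):
        return self.content[version_id]

    def put(self, new_id, data):
        base, _ = self.pending[new_id]
        self.pending[new_id] = (base, data)

    def release(self, new_id):
        # Commit any PUT data below the base version; a RELEASE without a
        # prior PUT simply abandons the reservation.
        base, data = self.pending.pop(new_id)
        if data is None:
            return None
        self.content[new_id] = data
        self.parent[new_id] = base
        return new_id


def checkout(server, base):
    """CHECKOUT = RESERVE followed by GET."""
    new_id = server.reserve(base)
    return new_id, server.get(base)


def checkin(server, new_id, data):
    """CHECKIN = PUT followed by RELEASE."""
    server.put(new_id, data)
    return server.release(new_id)


srv = ToyServer("hello")
vid, text = checkout(srv, "1")
final = checkin(srv, vid, text + ", world")
print(final, srv.get(final), srv.parent[final])  # 2 hello, world 1
```

Because the four operations stay separate, a different server could make RESERVE a lock, a branch, or a no-op without changing how the client composes CHECKOUT and CHECKIN.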

This separation also allows us to meet requirement 18 (Session tracking),
using the negotiation of version identifiers to provide "tokens" for
outstanding operations. The version identifier of a new version can now be
negotiated at several different points in time:

   * When a resource is RESERVE'ed
   * When a resource is LOCK'ed
   * When a resource is PUT
   * When a resource is RELEASE'ed

At each of these points in time, the client can propose a version
identifier in its request and the server can return one in its reply.
Session tracking can be implemented by using special version identifiers
to track the outstanding sessions. The requirement on clients is that if
the server provides a version identifier in its reply to any of these
operations, the client must use that same identifier in subsequent
requests that act on that resource as part of the same session.

While a client is always free to request a version identifier, a server is
never constrained to use a client's suggestion. This session tracking
mechanism does not require more overhead than is already needed for other
version-aware operations.
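The session-tracking scheme can be illustrated with a minimal sketch, assuming a hypothetical server policy: a client-proposed identifier is accepted only if it is not already in use; otherwise the server assigns a temporary identifier of its own. Whatever identifier RESERVE returns then serves as the session token that later PUT and RELEASE requests must quote. The "tmp-" prefix and all names are illustrative.

```python
import secrets


class SessionError(Exception):
    """Raised when a request does not match an outstanding session."""


class SessionTrackingServer:
    """Hypothetical server that uses version identifiers as session tokens."""

    def __init__(self):
        self.used_ids = {"1.1", "1.2"}  # example ids already in the history
        self.sessions = {}              # outstanding version id -> client
        self.drafts = {}                # version id -> data received by PUT

    def reserve(self, client, proposed_id=None):
        # The client may propose an identifier; the server is never bound by it.
        if proposed_id is None or proposed_id in self.used_ids:
            version_id = "tmp-" + secrets.token_hex(4)
        else:
            version_id = proposed_id
        self.used_ids.add(version_id)
        self.sessions[version_id] = client
        return version_id  # the client must quote this value from now on

    def _check(self, client, version_id):
        if self.sessions.get(version_id) != client:
            raise SessionError(f"no open session {version_id!r} for {client}")

    def put(self, client, version_id, data):
        self._check(client, version_id)
        self.drafts[version_id] = data

    def release(self, client, version_id):
        self._check(client, version_id)
        del self.sessions[version_id]
        return version_id


srv = SessionTrackingServer()
vid = srv.reserve("alice", proposed_id="1.2")  # refused: 1.2 is taken
assert vid != "1.2" and vid.startswith("tmp-")
srv.put("alice", vid, b"new text")
srv.release("alice", vid)
mine = srv.reserve("bob", proposed_id="1.3")   # accepted
print(mine)                                    # 1.3
```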

Security Considerations

These requirements do not directly address security issues. We do not
believe that versioning has any special security requirements other than
authentication and session-level security. We therefore expect that for
basic versioning, no special facilities will be required over those already
under development in the WWW. Full configuration management systems built on
the basic versioning support may require additional specialized security
methods built on top of authenticated secure connections.

Acknowledgments

This document is a result of the vigorous and valuable discussion on the
Versioning on the Web <www-vers-wg-request@ics.uci.edu>, and the Distributed
Authoring <w3c-dist-auth-request@w3.org> mailing lists. All the
interactions on these lists have been helpful, as have several
conversations. David Fiander's initial requirements got us started and
clarified several points. Jim Whitehead provided useful criticism, some new
points, and impetus to get this thing out the door. Yaron Goland and
Christopher Seiwald provided extensive commentary and discussion.

The following list includes the above and others who have also helped either
with their postings, personal email or face-to-face discussions:

Dan Connolly, World Wide Web Consortium, connolly@w3.org
Ron Fein, Microsoft, ronfe@microsoft.com
David Fiander, Mortice Kern Systems, davidf@mks.com
Roy Fielding, U.C. Irvine, fielding@ics.uci.edu
Yaron Goland, Microsoft, yarong@microsoft.com
Dave Long, America Online, dave@sb.aol.com
Henrik Frystyk Nielsen, World Wide Web Consortium, frystyk@w3.org
Larry Masinter, Xerox PARC, masinter@parc.xerox.com
Murray Maloney, SoftQuad, murray@sq.com
Christopher Seiwald, Perforce Software, seiwald@perforce.com
Judith Slein, Xerox, slein@wrc.xeroc.com

References

[1] "Requirements on HTTP for Distributed Content Editing", E. J. Whitehead,
Work in Progress, IETF working draft draft-whitehead-http-distreq-00.txt,
September 1996.

[2] "VerSE: Towards Hypertext Versioning Styles", Anja Haake and David
Hicks, Proc. Hypertext'96, the Seventh ACM Conference on Hypertext, 1996,
pp. 224-234.

[3] "Structural and Cognitive Problems in Providing Version Control for
Hypertext", Kaspar Østerbye, Proceedings of the ACM Conference on Hypertext,
Milano, Italy, 1992, pp. 33-42.

[4] "Version Control in Hypermedia Databases", Technical report
TAMU-HRL-91-004, Hypertext Research Lab, Texas A&M University. 1991.

$Id: draft-durand-versreq-00.html,v 1.3 1996/11/06 15:51:55 David Exp $


_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________