v0.4 of Dist. Auth. Requirements

Based on feedback I received at the Cambridge meeting, here is an updated
version (v0.4) of the distributed authoring requirements document.
A summary of changes is:

  - Throughout, I changed "entity" to "resource" where appropriate.
    I had been using "resource" to mean "an object and all its versions,"
    and "entity" for just a particular version of an object.  It turns
    out that, according to the definition in the HTTP/1.1 spec., a
    resource maps to what I had been referring to as an entity,
    and an entity refers to the bytes being sent out over the wire,
    which I had been referring to as a transmission entity.

  - I added the ability to "list" relationships, which compliments the
    ability to create, delete, and query relationships.

  - I changed the name of read locks to be "no-modify" locks.  This is a
    cosmetic change -- the meaning of a "no-modify" lock is identical to
    the old read lock.

  - I added a new read lock, which gives only the owner of the lock
    permission to read the locked resource.

  - Extra text was added to the lock description which discusses the
    read lock, and also the interactions between locks.

  - The notification of intent to edit functionality has been removed,
    since it is being move the versioning requirements document.  This
    does not indicate this functionality is being dropped -- this is
    a move, not an absolute delete.

  - Discussion was added to the partial write section describing the
    possible scope of changes.

  - Text was added to the Copy discussion mentioning that it is the
    responsibility of the user to ensure that scripts work properly in
    their new location.

Missing from this draft is any discussion of authentication,
authorization, and privacy.  I think this draft needs to at least
reference requirements in other drafts, even if it does not
redefine them here.  I welcome your suggestions as to which
documents and sections are directly relevant to distributed authoring.

Many apologies for submitting this draft a week after it was promised.

I will be offline until Tuesday afternoon, and will be unable to
respond to comments until next Wednesday. This draft can be found in
HTML format at URL:

http://ww.ics.uci.edu/~ejw/authoring/draft-whitehead-http-distreq-00.html

- Jim

-------

HTTP Working Group                                 E. J. Whitehead, Jr.
INTERNET-DRAFT                                     U.C. Irvine
<draft-whitehead-http-distreq-00.txt>              September 1996

Expires March, 1997

Rev. 0.4, September 27, 1996

            Requirements on HTTP for Distributed Content Editing

Status of this Memo

This document is an Internet draft. Internet drafts are working documents of
the Internet Engineering Task Force (IETF), its areas and its working
groups. Note that other groups may also distribute working information as
Internet drafts.

Internet Drafts are draft documents valid for a maximum of six months and
can be updated, replaced or obsoleted by other documents at any time. It is
inappropriate to use Internet drafts as reference material or to cite them
as other than as "work in progress".

To learn the current status of any Internet draft please check the
"lid-abstracts.txt" listing contained in the Internet drafts shadow
directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au
(Pacific Rim), ds.internic.net (US East coast) or ftp.isi.edu (US West
coast). Further information about the IETF can be found at URL:
http://www.ietf.org/

Distribution of this document is unlimited. Please send comments to the WWW
Distributed Authoring and Versioning mailing list, <w3c-dist-auth@w3.org>,
which may be joined by sending a message with subject "subscribe" to
<w3c-dist-auth-request@w3.org>. Discussions are archived at URL:
http://www.w3.org/pub/WWW/Archives/Public/w3c-dist-auth/. The HTTP working
group at <http-wg@cuckoo.hpl.hp.com> also discusses the HTTP protocol.
Discussions of the HTTP working group are archived at URL:
http://www.ics.uci.edu/pub/ietf/http/. General discussions about HTTP and
the applications which use HTTP should take place on the <www-talk@w3.org>
mailing list.

Abstract

The HyperText Transfer Protocol, version 1.1 (HTTP/1.1), provides simple
support for applications which allow remote editing of typed data. In
practice, the existing capabilities of HTTP/1.1 have proven inadequate to
support efficient, scalable remote editing free of overwriting conflicts.
This document presents a list of features in the form of requirements which,
if implemented, would improve the efficiency of common remote editing
operations, provide a locking mechanism to prevent overwrite conflicts,
improve relationship management support between non-HTML data types, provide
a simple attribute-value metadata facility, and provide for the creation and
reading of container data types. These requirements are also supportive of
versioning capability.

1. Introduction

This document describes functionality which, if provided in the HyperText
Transfer Protocol (HTTP) [4], would support the interoperability of tools
which allow remote loading, editing and saving (publishing) of various media
types using HTTP. As much as possible, this functionality is described
without suggesting a proposed implementation, since there are many ways to
perform the functionality within the HTTP framework. It is also possible
that a single mechanism within HTTP could simultaneously satisfy several
requirements.

Much of the functionality described in this document stems from the
assumption that people performing distributed authoring only have access to
the objects they are editing via the HTTP protocol. This is in contrast to
the majority of current authoring practice, where there is access to the
underlying storage media, often with a shell or graphical user interface
mediating access to a filesystem. Authors need more than just remote control
over their individual documents: they need remote control over the namespace
in which those documents reside. Currently, authors control their namespace
by interacting directly with the underlying storage system, but when
performing distributed authoring this access is not available.

2. Requirements

In the requirement descriptions below, the requirement will be stated,
followed by its rationale. If any current distributed authoring tools
currently implement the requirement, this is also mentioned. It is assumed
that "server" means "a program which receives and responds to HTTP
requests," and that "distributed authoring tool" or "intranet enabled tool"
means "a program which can retrieve a source resource via HTTP, allow
editing of this resource, and then save/publish this resource to a server
using HTTP." A "client" is "a program which issues HTTP requests and accepts
responses."

  1. Source Retrieval. The source of any given resource should be
     retrievable via HTTP.

     There are many cases where the source resource stored on a server does
     not correspond to the entity transmitted in response to an HTTP GET.
     Current known cases are server side include directives, and Standard
     Generalized Markup Language (SGML) source entities which are converted
     on the fly to HyperText Markup Language (HTML) [2] output entities.
     There are many possible cases, such as automatic conversion of bitmap
     images into several variant bitmap media types (e.g. GIF, JPEG), and
     automatic conversion of an application's native media type into HTML.
     As an example of this last case, a word processor could store its
     native media type on a server which automatically converts it to HTML.
     A GET of this resource would retrieve the HTML. Retrieving the source
     of this resource would result in the transmission of the native word
     processor entity.

     This requirement should be met by a general mechanism which can handle
     both the "single-step" source processing described above, where the
     source is converted into the transmission entity via a single
     conversion step, as well as "multi-step" source processing, where there
     are one or more intermediary processing steps and outputs. An example
     of multi-step source processing is the relationship between an
     executable binary image, its object files, and its source language
     files. It should be noted that the relationship between source resource
     and transmission entity could be expressed using the functionality
     described below in "Relationships."

  2. Relationships. Via HTTP, it should be possible to create, query, list,
     and delete typed relationships between resources of any media type.

     A hypertext link is a relationship between resources which is browsable
     using a hypertext style point-and-click user interface. Relationships,
     whether they are browsable hypertext links, or simply a means of
     capturing a interrelation between entities, have many purposes.
     Relationships can support pushbutton printing of a multi-resource
     document in a prescribed order, jumping to the access control page for
     a resource, and quick browsing of related information, such as a table
     of contents, an index, a glossary, help pages, etc. While relationship
     support is provided by the HTML "LINK" element, this is limited only to
     HTML entities, and does not support bitmap image types, and other
     non-HTML media types.
     AOLpress from America Online [1] currently "allows pages to add toolbar
     buttons on the fly using the HTML 3.2 <LINK REL....> tag. For example,
     your page can add toolbar buttons that link to a home page, table of
     contents, index, glossary, copyright page, next page, previous page,
     help page, higher level page, or a bookmark in the document."

  3. Write Locks. It should be possible, via HTTP, to restrict modification
     of a resource to a specific person, or list of persons. It should be
     possible to set single or multi-person write locks with a single
     action.
  4. No-Modify Locks. It should be possible, via HTTP, to indicate to the
     HTTP server that the contents of a resource should not be modified
     until the read lock is released. It should be possible to assign a
     no-modify lock to a single person or a list of persons with a single
     action.
  5. Read Locks. It should be possible, via HTTP, to restrict the ability to
     read a resource to a specific person, or list of persons. It should be
     possible to set single or multi-person read locks with a single action.
  6. Partial-Resource Locking. It should be possible to take out a write or
     a read lock on subsections of a resource.
  7. Lock Query. It should be possible to query for whether a given resource
     has any active modification restrictions, and if so, who currently has
     modification permission.
  8. Independence of locks. It should be possible to lock a resource without
     re-reading the resource, and without committing to editing the
     resource.
  9. Multi-Resource Locking. It should be possible to take out a lock on
     multiple resources in the same action, and this locking operation must
     be atomic across these resources.

     At present, HTTP provides limited support for preventing two or more
     people from overwriting each other's modifications when they save to a
     given URL. Furthermore, there is no way for people to discover if
     someone else is currently making modifications to a resource. This is
     known as the "lost update problem," or the "overwrite problem." Since
     there can be significant cost associated with discovering and repairing
     lost modifications, preventing this problem is crucial for supporting
     distributed authoring. A "write" lock ensures that only one person (or
     list of persons) may modify an resource, preventing overwrites.
     Furthermore, locking support is also a key component of many versioning
     schemes, a desirable capability for distributed authoring.

     An author may wish to lock an entire web of resources even though they
     are editing just a single resource, to keep the other resources from
     changing. In this way, an author can ensure that if a local hypertext
     web is consistent in their distributed authoring tool, it will then be
     consistent when they write it to the server. Because of this, it should
     be possible to take out a lock without also causing transmission of the
     contents of a resource. Since it should not be assumed that because an
     resource is locked, that it will necessarily be modified, and since
     many people may wish to have simultaneous guarantees that a resource
     will not be modified, but still not want to modify the resource
     themselves, it is desirable to have a "no-modify" lock capability. A
     no-modify lock, by being less restrictive, provides better support than
     a write lock for providing a guarantee that a resource will not be
     modified. Since it is prudent to frequently save working drafts of a
     resource, yet undesirable in many cases for working drafts to be
     readable by persons other than the author, a "read" lock allows a
     resource to have only a limited set of people (perhaps only the author)
     who can read the current state of the resource. In summary, a no-modify
     lock states that the resource is guaranteed not to change for the
     duration of the lock. A read lock guarantees that only the owner of the
     lock may read the resource (but there is no guarantee it will not be
     modified). A write lock states that a resource is guaranteed not to
     change only if the owner of the lock does not change it, and only the
     owner of the lock may change it.

     It is often necessary to guarantee that a lock or unlock operation
     occurs at the same time across multiple resources, a feature which is
     supported by the multiple-resource locking requirement. This is useful
     for preventing a collision between two people trying to establish locks
     on the same set of resources, since with multi-resource locking, one of
     the two people will get a lock. If this same multiple-resource locking
     scenario was repeated by using atomic lock operations iterated across
     the resources, the result would be a splitting of the locks between the
     two people, based on resource ordering and race conditions.

     Partial resource locking provides support for collaborative editing
     applications, where multiple users may be editing the same resource
     simultaneously. Partial resource locking also allows multiple people to
     simultaneously work on a database type resource.

     When more than one types of lock is requested on a resource, it results
     in either multiple, complimentary locks, or lock collisions. An
     implementation of the locking functionality must specify how the locks
     interact, noting which combinations result in collisions, and which
     combinations are complimentary.

 10. Partial Write. After editing a resource, it should be possible, via
     HTTP, to only write the changes to a resource, rather than
     retransmitting the entire resource.

     During distributed editing which occurs over wide geographic
     separations and/or over low bandwidth connections, it would be
     extremely inefficient (and frustrating) to rewrite a large resource
     after minor changes, such as a one-character spelling correction.
     Ideally, support will be provided for transmitting "insert" (e.g., add
     this sentence in the middle of a document) and "delete" (e.g. remove
     this paragraph from the middle of a document) style updates. These
     changes could be as small as a byte, or arbitrarily large (perhaps even
     larger than the original resource), and could result in an arbitrarily
     large growth or shrinkage of the resource. Support for partial resource
     updates will make small edits more efficient, and allow distributed
     authoring tools to scale up for editing of large documents.

 11. Attributes. Via HTTP, it should be possible to create, modify, query,
     read and delete arbitrary attributes on resources of any media type.

     Attributes can be used to define fields such as author, title, subject,
     and organization, on resources of any media type. These attributes have
     many uses, such as supporting searches on attribute contents, and the
     creation of catalog entries as a placeholder for an resource which is
     not available in electronic form, or which will be available later.

 12. List URL Hierarchy Level. A listing of all resources, along with their
     media type, and last modified date, which are located at a specific URL
     [3] hierarchy level in an http URL scheme should be accessible via
     HTTP, so long as this operation is meaningful.

     In [3] it states that, "some URL schemes (such as the ftp, http, and
     file schemes) contain names that can be considered hierarchical."
     Especially for HTTP servers which directly map all or part of their URL
     name space into a filesystem, it is very useful to get a listing of all
     resources located at a particular hierarchy level. This functionality
     supports "Save As..." dialog boxes, which provide a listing of the
     resources at a current hierarchy level, and allow navigation through
     the hierarchy. It also supports the creation of graphical
     visualizations (typically as a network) of the hypertext structure
     among the resources at a hierarchy level, or set of levels. It also
     supports a tree visualization of the resources and their hierarchy
     levels.

     There are many instances where there is not a strong correlation
     between a URL hierarchy level and the notion of a container. One
     example is a server in which the URL hierarchy level maps to a
     computational process which performs some resolution on the name. In
     this case, the contents of the URL hierarchy level can vary depending
     on the input to the computation, and the number of resources accessible
     via the computation can be very large. It does not make sense to
     implement a directory feature for such a namespace. However, the
     utility of listing the contents of those URL hierarchy levels which do
     correspond to containers, such as the large number of HTTP servers
     which map their namespace to a filesystem, argue for the inclusion of
     this capability, despite not being meaningful in all cases. If listing
     the contents of a URL hierarchy level does not makes sense for a
     particular URL, then a "405 Method Not Allowed" status code could be
     issued.

     AOLpress from America Online currently supports "Save As..." dialog
     boxes, and graphical network visualization of a portion of a site's
     hypertext structure, which they term a "mini-web."
     FrontPage from Microsoft [5] also currently supports a graphical
     network visualization and additionally supports a tree visualization of
     a portion of a site's structure.

 13. Make URL Hierarchy Level. Via HTTP, it should be possible to create a
     new URL hierarchy level in an http URL scheme.

     The ability to create containers to hold related resources supports
     management of a name space by packaging its members into small, related
     clusters. The utility of this capability is demonstrated by the broad
     implementation of directories in recent operating systems. The ability
     to create a URL hierarchy level also supports the creation of "Save
     As..." dialog boxes with "New Level/Folder/Directory" capability,
     common in many applications.
     AOLpress from America Online, currently supports this capability
     through their "Save As..." dialog box, and their custom MKDIR method.

 14. Copy. Via HTTP, it should be possible to make a byte-for-byte duplicate
     of a resource without a client loading, then resaving the resource.
     This copy should leave an audit trail.

     There are many reasons why a resource might need to be duplicated, such
     as change of ownership, a precursor to major modifications, or to make
     a backup. In combination with delete functionality, copy can be used to
     implement rename and move capabilities, by performing a copy to a new
     name, and a delete of the old name. Due to network costs associated
     with loading and saving a resource, it is far preferable to have a
     server perform a resource copy than a client. If a copied resource
     records which resource it is a copy of, then it would be possible for a
     cache to avoid loading the copied resource if it already locally stores
     the original.
     Note that in the case of CGI scripts, or other executable content, the
     intent of the copy operation is to make a byte-for-byte copy of the
     source of the resource, rather than require the server to guarantee
     that the resource will execute in the same manner in the new location
     as it did in the old location. It is the responsibility of the user to
     ensure the resource behaves properly in its new location.

 15. Move/Rename. Via HTTP, it should be possible to change the URL of a
     resource without a client loading, then resaving the resource under a
     different name.

     It is often necessary to change the name of a resource, for example due
     to adoption of a new naming convention, or if a typing error was made
     entering the name originally. Due to network costs, it is undesirable
     to perform this operation by loading and resaving the resource under
     the new name, followed by a delete of the old resource. Similarly, a
     single rename operation is more efficient than a copy followed by a
     delete operation. Ideally an HTTP server should record the move
     operation, and issue a "301 Moved Permanently" status code for requests
     on the old URL. A move operation, if implemented with attribute
     support, should also preserve most attributes across a move. Note that
     moving a resource is considered the same function as renaming an
     resource.

3. Acknowledgments

My understanding of these issues has emerged as the result of much
thoughtful discussion, email, and assistance by many people, who deserve
recognition for their effort.

Martin Cagan, Continuus Software, Marty_Cagan@continuus.com
Dan Connolly, World Wide Web Consortium, connolly@w3.org
David Durand, Boston University, dgd@cs.bu.edu
Ron Fein, Microsoft, ronfe@microsoft.com
David Fiander, Mortice Kern Systems, davidf@mks.com
Roy Fielding, U.C. Irvine, fielding@ics.uci.edu
Yaron Goland, Microsoft, yarong@microsoft.com
Phill Hallam-Baker, MIT, hallam@ai.mit.edu
Dennis Hamilton, Xerox PARC, hamilton@parc.xerox.com
Andre van der Hoek, University of Colorado, Boulder,
andre@bigtime.cs.colorado.edu
Gail Kaiser, Columbia University, kaiser@cs.columbia.edu
Rohit Khare, World Wide Web Consortium, khare@w3.org
Dave Long, America Online, dave@sb.aol.com
Henrik Frystyk Nielsen, World Wide Web Consortium, frystyk@w3.org
Ora Lassila, Nokia Research Center, ora.lassila@research.nokia.com
Larry Masinter, Xerox PARC, masinter@parc.xerox.com
Murray Maloney, SoftQuad, murray@sq.com
Jim Miller, World Wide Web Consortium, jmiller@w3.org
Andrew Schulert, Microsoft, andyschu@microsoft.com
Christopher Seiwald, Perforce Software, seiwald@perforce.com
Judith Slein, Xerox, slein@wrc.xeroc.com
Richard Taylor, U.C. Irvine, taylor@ics.uci.edu
Robert Thau, MIT, rst@ai.mit.edu
Fabio Vitali, University of Bologna, Italy, fabio@dm.unibo.it

4. References

[1] America Online, "AOL Web Tools -- AOLpress 1.2 Features." WWW page.
http://www.aolpress.com/press/1.2features.html.

[2] T. Berners-Lee, D. Connolly. "HyperText Markup Language Specification -
2.0." RFC 1866, MIT/LCS, November 1995.

[3] T. Berners-Lee, L. Masinter, M. McCahill. "Uniform Resource Locators
(URL)." RFC 1738, CERN, Xerox PARC, University of Minnesota, December 1994.

[4] R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, and T. Berners-Lee.
"Hypertext Transfer Protocol -- HTTP/1.1." RFC XXXX, U.C. Irvine, DEC,
MIT/LCS, August 1996.

[5] Microsoft. "Microsoft FrontPage for Windows Data Sheet." WWW page.
http://www.microsoft.com/msoffice/frontpage/productinfo/brochure/default.htm.

Author's Address

E. James Whitehead, Jr.
Department of Information and Computer Science
University of California
Irvine, CA 92697-3425

Phone: 714-824-4121
Fax: 714-824-4056
EMail: ejw@ics.uci.edu

    

Received on Friday, 27 September 1996 18:14:19 UTC