WEBDAV Observations on Disconnected Operation, Distributed/Replicated Systems, and Caching
I would like to apologize for having taken so long to begin this series
of postings. The task of putting together my views on how I feel the
draft should be changed has taken longer than I expected. My notes are
at around 13 pages and growing. The purpose of this series of postings
is to stimulate discussion. The ultimate goal, of course, is to get
consensus around what the final changes will be. This first post covers
the areas of Disconnected Operation, Distributed/Replicated Systems, and
Caching.

High latency connections, from a 28.8 modem to disconnected operation,
are the norm on the Internet. From laptops on airplanes to home
computers, users are constantly put into situations where they do not
have high bandwidth connections to their servers. As such, I propose a
litmus test to help the group determine whether it has dealt properly
with the high latency and disconnected scenarios.
If a principal using DAV can disconnect from a network, perform useful
work, reconnect to the network and, using only DAV mechanisms,
synchronize their state with multiple independent systems on the
network, then we have properly dealt with the high latency/disconnected
scenarios.

Distributed Vs Replicated Documents

Distributed System - A system where a group of related resources is
potentially available on multiple servers.
Replicated System - A system where a resource is available at multiple
locations, potentially on different servers.
I propose that DAV be required to deal with distributed systems and
furthermore that DAV's design make no assumptions regarding the ability
of the servers within a distributed system to communicate with each
other. Thus, if a principal obtains the history of a versioned document,
there can be no guarantee that the history is complete. Rather, the best
that can be done is to define the history response format such that it
can indicate that additional history information may be available on
other servers. This can result in different servers having documents
with incomplete, inaccurate, or even conflicting histories, links,
attributes, etc. In short, any data that requires cooperation between
multiple servers should be seen as suspect.
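
To make the idea of a history response that admits its own incompleteness concrete, here is a minimal sketch. The HistoryResponse type, its field names, and the merge_histories helper are all hypothetical; nothing here comes from the draft. It only assumes that a server can list the versions it knows about and, optionally, hint at other servers that may know more.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HistoryResponse:
    """Hypothetical shape of one server's answer to a history request."""
    server: str
    resource: str
    known_versions: List[str]
    # Servers that may hold further history; a hint, not a guarantee.
    other_servers: List[str] = field(default_factory=list)


def merge_histories(responses: List[HistoryResponse]) -> Tuple[List[str], List[str]]:
    """Union the versions seen so far and list hinted servers not yet asked."""
    merged: List[str] = []
    hinted = set()
    consulted = set()
    for r in responses:
        consulted.add(r.server)
        hinted.update(r.other_servers)
        for v in r.known_versions:
            if v not in merged:
                merged.append(v)
    # Any server left in this list means the merged view may be incomplete.
    return merged, sorted(hinted - consulted)
```

A client would keep asking the servers in the second list until it is empty or it gives up. Even then the result is only as trustworthy as the hints, which is the point of treating cross-server data as suspect.
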
I would additionally propose that, unless a server has a mechanism that
allows it to reasonably guarantee that the cross-server data it is
presenting is consistent, all 200 responses should be replaced with
something like a 203 (Non-Authoritative Information) response. I say
"something like" because the use of 203 for this purpose actually
extends the definition of 203 to include the body, not just the headers.
As such, it may be appropriate to define a new response number.
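
The rule above amounts to a small decision on the server side. The sketch below is only an illustration: the function name and its two boolean parameters are hypothetical, and 203 is used as a stand-in for whatever response number the group might eventually settle on.

```python
OK = 200
# Stand-in only: the proposal notes that a new response number may be
# more appropriate than stretching the meaning of 203.
NON_AUTHORITATIVE = 203


def response_status(uses_cross_server_data: bool,
                    consistency_guaranteed: bool) -> int:
    """Pick the success status for a response under the proposed rule."""
    if uses_cross_server_data and not consistency_guaranteed:
        return NON_AUTHORITATIVE
    return OK
```
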
I further propose that DAV only deal with replicated systems that employ
a golden copy system. A golden copy is a copy of a resource that is
identified as the reference copy. All other copies are referred to as
shadow copies. The location of the golden copy of a resource may change
over time, but it is required that the involved servers be able to
handle transfer of the golden copy without any interaction with the
client. Any request that would alter the state of a resource will be
redirected to the golden copy. So if a principal tries to lock a shadow
copy, the request will be redirected to the golden copy.
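
A server holding a shadow copy could apply the redirect rule roughly as follows. This is a sketch under several assumptions: the set of state-changing method names, the golden_locations lookup table, and the choice of 307 as the redirect status are all illustrative, since the proposal does not name a status code or fix the method list.

```python
from typing import Dict, Optional, Tuple

# Assumed set of methods that alter resource state; the real list would
# depend on the methods the draft ends up defining.
STATE_CHANGING = {"PUT", "DELETE", "LOCK", "UNLOCK", "COPY", "MOVE"}

# Assumed status: 307 asks the client to repeat the same method at the
# new location. The proposal itself does not pick a code.
REDIRECT_STATUS = 307


def handle_on_shadow(method: str, path: str,
                     golden_locations: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request that arrived at a shadow copy."""
    golden: Optional[str] = golden_locations.get(path)
    if method.upper() in STATE_CHANGING and golden is not None:
        # State changes go to the golden copy, wherever it lives now.
        return REDIRECT_STATUS, {"Location": golden}
    # Reads can be answered from the shadow copy directly.
    return 200, {}
```

Because the lookup table belongs to the servers, moving the golden copy only means updating that table; the client just follows the redirect it is given.
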
Unless specified otherwise, the response from a DAV method is not
cacheable. Caches exist to store entities that will remain unchanged for
a reasonable time and DAV exists in order to change these entities.
However, that does not relieve the group from considering caching.

HTTP 1.1 caches will purge any response from memory if they see a method
executed on the resource that alters the state of the resource. While I
could not find an explicit statement to this end, I suspect that most
caches will treat the response to any unknown method as non-cacheable
and as grounds for purging their stored responses for that resource.
Thus DAV methods must assign
Request-URIs in such a way as to support this purge mechanism. So, for
example, the Request-URI of the COPY method must be the destination of
the copy, not the source. That way the cache will purge responses for
the destination, which will be changing, rather than the source, which
will not. The same logic applies to MOVE so long as the Content-Location
header, set to the source, is included in the response, thus causing the
source's records to also be purged.
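
The purge behaviour described above can be modelled with a toy cache. The SketchCache class below is hypothetical and deliberately simplified: it assumes that only GET and HEAD are treated as safe, that any other method (including ones the cache does not recognize) purges the entry for the Request-URI, and that a Content-Location response header purges the entry it names as well.

```python
from typing import Dict, Optional

SAFE_METHODS = {"GET", "HEAD"}  # assumption: everything else may alter state


class SketchCache:
    """Toy model of an intermediary cache, keyed by URI."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}

    def observe(self, method: str, request_uri: str,
                response_headers: Optional[Dict[str, str]] = None) -> None:
        """Purge entries affected by a non-safe or unknown method."""
        if method.upper() in SAFE_METHODS:
            return
        # The resource named by the Request-URI is assumed to have changed.
        self.store.pop(request_uri, None)
        # So is any resource named by Content-Location in the response.
        location = (response_headers or {}).get("Content-Location")
        if location:
            self.store.pop(location, None)


cache = SketchCache()
cache.store = {"/src": b"old", "/dst": b"old", "/other": b"x"}

# COPY with the destination as Request-URI: only the destination is purged.
cache.observe("COPY", "/dst")
after_copy = sorted(cache.store)

# MOVE with Content-Location set to the source: both ends are purged.
cache.store["/dst"] = b"new"
cache.observe("MOVE", "/dst", {"Content-Location": "/src"})
after_move = sorted(cache.store)
```

If COPY instead used the source as its Request-URI, this cache would throw away the unchanged source entry and keep serving a stale destination, which is the failure the proposal is trying to avoid.
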