- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sun, 11 Dec 2005 11:54:54 +0100
- To: WebDAV <w3c-dist-auth@w3.org>
Hi,
before going to work on this issue, I just re-read the summary I wrote
some months ago. I think it contains a correct analysis of the
situation, thus I will use it as a basis for the changes. People
interested in this issue may want to re-read the HTML version at
<http://greenbytes.de/tech/webdav/draft-reschke-webdav-namespace-vs-properties-latest.html>,
or the TXT version below.
Feedback appreciated,
Julian
----------------
Behaviour of WebDAV properties under namespace operations
Abstract
This document discusses the impact of WebDAV namespace operations on
the behaviour of live properties defined in HTTP and WebDAV.
1. Detecting changes in HTTP
HTTP ([RFC2616]) defines a set of response header fields that can be
used to detect changes, namely "ETag" (Section 14.19) and "Last-
Modified" (Section 14.29). User agents can use request header fields
to make method invocations conditional, such as
o "If-Modified-Since" (Section 14.25) and "If-None-Match" (Section
14.26) to GET the representation of the resource if and only if it
differs from what the user agent already obtained, and
o "If-Unmodified-Since" (Section 14.28) and "If-Match" (Section
14.24) to overwrite the representation using PUT if and only if it
didn't change since the last GET request (thus avoiding
overlapping updates).
Note that HTTP defines the behaviour of these headers in terms of
"variants" (i.e. the different representations that may be returned
for a single resource; see [RFC2616], Section 1.3). Furthermore, although HTTP
distinguishes between the term "URI" (identifier) and "resource" (the
object identified by the URI), the difference has little impact as
the HTTP specification does not define any namespace operations that
would change the mapping between URIs and resources. Thus, generic
clients will rely on consistent behaviour of "Last-Modified" and
"ETag" on a per-URI basis even in the presence of namespace
operations.
1.1 Example: GET only if changed
>> Request (getting the content initially)
GET /index.html HTTP/1.1
Host: example.org
>> Response
HTTP/1.1 200 OK
Content-Type: text/html; charset="utf-8"
Content-Length: xxxx
Last-Modified: Sun, 20 Mar 2005 12:45:26 GMT
...body...
The user agent stores the response headers along with the content.
When it needs to update the content (for instance the user initiates
a refresh of the browser window), it uses that information to make
the request conditional.
>> Request (refreshing the content)
GET /index.html HTTP/1.1
Host: example.org
If-Modified-Since: Sun, 20 Mar 2005 12:45:26 GMT
>> Response
HTTP/1.1 304 Not Modified
Thus, if the content did not change, the user agent avoids getting
the same content again.
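The server-side decision behind the 304 response can be sketched as
follows. This is a minimal illustration, not taken from any particular
server: the function name conditional_get_status and its arguments are
assumptions, and only the "If-Modified-Since" check is modelled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(last_modified, if_modified_since):
    """Return 304 if the stored date is not newer than the client's date."""
    if if_modified_since is None:
        return 200  # unconditional request
    try:
        client_date = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparsable date is ignored
    if client_date.tzinfo is None:
        # Treat a date without zone information as GMT.
        client_date = client_date.replace(tzinfo=timezone.utc)
    if last_modified <= client_date:
        return 304  # Not Modified: the cached copy is still current
    return 200


stored = datetime(2005, 3, 20, 12, 45, 26, tzinfo=timezone.utc)
status = conditional_get_status(stored, "Sun, 20 Mar 2005 12:45:26 GMT")
```

Note that the comparison only works if the stored date never moves
backwards for a given URL, which is the subject of the sections below.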
1.2 Example: PUT only if unchanged
>> Request (getting the content initially)
GET /index.html HTTP/1.1
Host: example.org
>> Response
HTTP/1.1 200 OK
Content-Type: text/html; charset="utf-8"
Content-Length: xxxx
Last-Modified: Sun, 20 Mar 2005 12:45:26 GMT
ETag: "1"
...body...
>> Request (writing back the content)
PUT /index.html HTTP/1.1
Host: example.org
If-Match: "1"
>> Response
HTTP/1.1 200 OK
However, had the content changed between the two requests, the
response would have been:
>> Response
HTTP/1.1 412 Precondition Failed
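A server might evaluate the "If-Match" precondition roughly as shown
below. This is a simplified sketch under stated assumptions: the name
put_status is hypothetical, the header is split on commas (which
ignores the rare case of commas inside an entity tag), and weak tags
are never treated as a match because "If-Match" requires the strong
comparison function.

```python
def put_status(current_etag, if_match):
    """Return 200 if the PUT may proceed, 412 if the precondition fails.

    current_etag is the entity tag of the existing representation,
    or None if the resource does not exist.
    """
    if if_match is None:
        return 200  # unconditional PUT
    if if_match.strip() == "*":
        # "*" matches any existing representation.
        return 200 if current_etag is not None else 412
    candidates = [tag.strip() for tag in if_match.split(",")]
    if (current_etag is not None
            and not current_etag.startswith("W/")
            and current_etag in candidates):
        return 200
    return 412


unchanged = put_status('"1"', '"1"')
changed = put_status('"2"', '"1"')
```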
1.3 Requirements for 'Last-Modified' and 'ETag'
Below is a list of requirements for the behaviour of 'Last-Modified'
and 'ETag':
R1: The "Last-Modified" date returned for a specific variant of a
resource must change whenever the content changes.
R2: Whenever it changes, the "Last-Modified" date must increase (move
forward in time).
R3: The entity tag ("ETag") returned for a specific variant of a
resource must change whenever the content changes.
2. Implications for WebDAV namespace operations
The requirements above seem to be straightforward to implement, but
things get tricky as soon as namespace operations such as COPY
([RFC2518], Section 8.8) and MOVE (Section 8.9) are introduced.
For example, consider two resources identified by "index.html" and
"index.html.bak" with last modified dates of "12:00:00 GMT" and
"11:50:00 GMT" respectively.
A client may have retrieved the content for "index.html", remembering
the first timestamp:
>> Request (getting the content initially)
GET /index.html HTTP/1.1
Host: example.org
>> Response
HTTP/1.1 200 OK
Content-Type: text/html; charset="utf-8"
Content-Length: xxxx
Last-Modified: Sun, 20 Mar 2005 12:00:00 GMT
...body...
Later, another user decides to restore the backup, using a WebDAV
MOVE request.
>> Request (restoring the backup)
MOVE /index.html.bak HTTP/1.1
Host: example.org
Destination: http://example.org/index.html
>> Response
HTTP/1.1 204 No Content
Finally, the first user agent decides to refresh the content for
"index.html". What value for "Last-Modified" should be returned?
o Moving the timestamp (setting it to "11:50:00 GMT") will cause it
to move back in time, causing the client to assume that the
content did not change.
o Not modifying the timestamp (leaving it "12:00:00 GMT") will cause
the client to assume the content did not change as well.
o Thus, to avoid lost refreshes, the server will have to assign a
new timestamp which differs from both timestamps and actually is
newer than both.
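One way to pick such a timestamp is sketched below. The function name
last_modified_after_move is hypothetical, and the one-second step is
an assumption based on the one-second resolution of HTTP dates.

```python
from datetime import datetime, timedelta, timezone


def last_modified_after_move(source_lm, destination_lm, now):
    """Pick a Last-Modified value that is strictly newer than both inputs."""
    # Smallest value that is newer than both previous timestamps.
    floor = max(source_lm, destination_lm) + timedelta(seconds=1)
    # Normally the current time is used; the floor only matters if the
    # MOVE happens within the same second as the newer timestamp.
    return max(now.replace(microsecond=0), floor)


utc = timezone.utc
backup = datetime(2005, 3, 20, 11, 50, 0, tzinfo=utc)
index = datetime(2005, 3, 20, 12, 0, 0, tzinfo=utc)
moved = last_modified_after_move(
    backup, index, now=datetime(2005, 3, 20, 12, 5, 0, tzinfo=utc))
```

A client that remembered "12:00:00 GMT" for "index.html" will then see
a newer date and refresh its copy.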
The situation for "ETag" is only slightly different; the entity tag
needs to be unique across all variants ever served for the same HTTP
URL (the only difference is that it doesn't have any inherent order
that the conditional request headers would check). Thus, if a server
can guarantee that no entity tag ever repeats for any URL within its
namespace, namespace operations do not require any post-processing
(otherwise, the same considerations as for "Last-Modified" apply).
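A server that controls its own store could obtain such entity tags
from a single counter shared by the whole namespace. The sketch below
is an illustration under that assumption; the class name
EntityTagAllocator and the plain dictionary used as a namespace are
hypothetical, and a real server would have to persist the counter.

```python
import itertools
import threading


class EntityTagAllocator:
    """Hands out entity tags that never repeat within one namespace."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def allocate(self):
        with self._lock:
            return '"%d"' % next(self._counter)


allocator = EntityTagAllocator()
# Every new or modified representation gets a fresh tag.
etags = {
    "/index.html": allocator.allocate(),
    "/index.html.bak": allocator.allocate(),
}
old_tag = etags["/index.html"]
# MOVE /index.html.bak to /index.html: the tag simply travels with the
# content, because it was never served for the destination URL before.
etags["/index.html"] = etags.pop("/index.html.bak")
```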
[[anchor4: Mention the impact of depth=infinity namespace operations
--reschke]]
3. 'Last-Modified' vs BIND
[draft-ietf-webdav-bind] defines a set of new namespace operations
(BIND, UNBIND, REBIND). It's easy to see that for REBIND, the same
considerations will apply as for MOVE, and that UNBIND will behave as
DELETE. But what about BIND?
BIND creates a new URL mapping for a given resource. A server
basically has two choices for implementing the "Last-Modified" for
resources that support multiple bindings:
1. Store the time stamp with the resource. In this case, "Last-
Modified" will be the same regardless of which URL a GET/HEAD/
PROPFIND request is applied to. However, the date that is
returned must satisfy the requirements defined in Section 1.3 for
each of the URLs mapped to it. In practice this means that using
BIND to map a URL that has been in use before may cause the
"Last-Modified" date to be incremented (for all URLs through
which the resource is accessible).
2. Alternatively, a server may choose to store the time stamp on a
per-URL basis. This, however, will have the effect that
different time stamps are returned although the underlying
resource is the same (per BIND's definition).
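The first choice can be sketched as follows. This is only an
illustration: the classes Resource and Namespace, the "retired" table
that remembers the last date visible at a URL before it was unmapped,
and the use of whole seconds as time stamps are all assumptions.

```python
class Resource:
    def __init__(self, last_modified):
        self.last_modified = last_modified  # whole seconds


class Namespace:
    def __init__(self):
        self.bindings = {}  # URL -> Resource
        self.retired = {}   # URL -> last date visible before unmapping

    def unbind(self, url):
        self.retired[url] = self.bindings.pop(url).last_modified

    def bind(self, url, resource, now):
        if url in self.bindings:
            self.unbind(url)
        previous = self.retired.pop(url, None)
        if previous is not None and resource.last_modified <= previous:
            # The URL was in use before: the date must move forward
            # for this URL, and therefore for the resource as a whole.
            resource.last_modified = max(now, previous + 1)
        self.bindings[url] = resource

    def last_modified(self, url):
        return self.bindings[url].last_modified


ns = Namespace()
ns.bind("/index.html", Resource(1200), now=1200)
backup = Resource(1150)
ns.bind("/index.html.bak", backup, now=1200)
ns.unbind("/index.html")
ns.bind("/index.html", backup, now=1300)
```

After the last call, both URLs report the same, incremented date,
because the time stamp lives on the shared resource.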
Note that unless a server implements namespace-wide unique entity
tags, the same situation will apply to entity tags as well.
4. Summary
Client implementors will have to expect that HTTP response headers
will vary for different URLs even though the underlying resource is
the same. They will also have to expect that namespace operations
such as MOVE, COPY, BIND or REBIND affect time stamps and entity tags
in possibly surprising ways. The exact behaviour cannot be predicted
from the WebDAV or BIND specifications, because these headers are
defined by HTTP, not by WebDAV or BIND.
Looking at the properties defined in Section 13 of [RFC2518], only
some of them are inherited from HTTP and thus will possibly behave as
described above:
+---------------------------------+---------------------------------+
| property                        | behaviour                       |
+---------------------------------+---------------------------------+
| creationdate                    | per resource                    |
| displayname                     | per resource (your mileage may  |
|                                 | vary for some broken            |
|                                 | implementations out there)      |
| getcontentlanguage              | potentially per URL as per HTTP |
| getcontentlength                | potentially per URL as per HTTP |
| getcontenttype                  | potentially per URL as per HTTP |
| getetag                         | potentially per URL as per HTTP |
| getlastmodified                 | potentially per URL as per HTTP |
| lockdiscovery                   | per resource                    |
| resourcetype                    | per resource                    |
| supportedlock                   | per resource                    |
| source                          | per resource                    |
+---------------------------------+---------------------------------+
5. References
[RFC2518] Goland, Y., Whitehead, E., Faizi, A., Carter, S., and D.
Jensen, "HTTP Extensions for Distributed Authoring --
WEBDAV", RFC 2518, February 1999.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
[draft-ietf-webdav-bind]
Clemm, G., Crawford, J., Reschke, J., and J. Whitehead,
"Binding Extensions to Web Distributed Authoring and
Versioning (WebDAV)", draft-ietf-webdav-bind-12 (work in
progress), July 2005, <http://greenbytes.de/tech/webdav/
draft-ietf-webdav-bind-12.html>.
Author's Address
Julian F. Reschke
greenbytes GmbH
Hafenweg 16
Muenster, NW 48155
Germany
Phone: +49 251 2807760
Fax: +49 251 2807761
Email: julian.reschke@greenbytes.de
URI: http://greenbytes.de/tech/webdav/
Appendix A. FAQ
A.1 Why is it so hard to always supply robust entity tags?
Example #1: a Java-based server maps filesystem objects to HTTP
resources, and is stuck with what java.io.File supports (which only
allows access to a very limited subset of the operating system's file
information, see
<http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html>). If the
server doesn't fully control the filesystem, and unless it's prepared
to store metadata out-of-band (outside the filesystem), it will have
to compute entity tags based on file information such as the last-
modified date and the length. The only robust alternative would be
to compute a hash of the actual file's contents, but usually this is
too expensive.
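The trade-off can be illustrated with the following sketch. It uses
Python's os.stat in place of java.io.File purely for illustration;
the function names are hypothetical, and the modification time is
deliberately truncated to whole seconds to mimic a store with coarse
time stamps.

```python
import hashlib
import os


def etag_from_stat(path):
    """Cheap entity tag based on modification time and length only."""
    info = os.stat(path)
    return '"%x-%x"' % (int(info.st_mtime), info.st_size)


def etag_from_content(path):
    """Robust entity tag based on a hash of the content (reads the file)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return '"%s"' % digest.hexdigest()
```

Two different contents of the same length written within the same
second yield the same value from etag_from_stat, whereas
etag_from_content distinguishes them at the cost of reading the whole
file.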
Example #2: a module implementing WebDAV is just an add-on to the
generic HTTP handler in a server (e.g., mod_dav inside the Apache
httpd server), and the server doesn't have any information beyond
what it obtains from the underlying store (in this case the filesystem).
Even if the server indeed has full access to the operating system's
information, it may still not be able to use the file's inode
information, for instance because it's a network drive.
A.2 What's the story about weak entity tags?
HTTP distinguishes between "weak" and "strong" entity tags (see
[RFC2616], Section 3.11). Only strong entity tags can be used in
authoring scenarios such as the one described in Section 1.2.
However, if an entity tag has been computed based on "last-modified"
information, it only becomes a "strong" entity tag after a certain
interval of non-activity on a resource. Thus, servers may return a
weak entity tag as the result of a PUT operation, and only later
"promote" it to a strong entity tag.
Requiring servers to always return strong entity tags in the first
place _will_ render Apache/mod_dav non-conformant.
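The "promotion" of an entity tag derived from "last-modified"
information can be sketched as below. This is a hypothetical
illustration, not the logic of any particular server: the function
name entity_tag, the tag format and the quiet_period parameter (the
interval of non-activity, in seconds) are all assumptions.

```python
def entity_tag(mtime, size, now, quiet_period=1):
    """Return a weak tag while the resource may still change unnoticed.

    mtime and now are whole seconds. Within quiet_period seconds of the
    last modification, a second change could leave mtime (and thus the
    tag) unchanged, so the tag is only advertised as weak.
    """
    tag = '"%x-%x"' % (mtime, size)
    if now - mtime < quiet_period:
        return "W/" + tag
    return tag


right_after_put = entity_tag(1000, 4, now=1000)
some_time_later = entity_tag(1000, 4, now=1005)
```

A client that receives the weak tag in the PUT response cannot use it
in "If-Match" and has to obtain the strong tag through a later request.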
Received on Sunday, 11 December 2005 10:56:31 UTC