- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sun, 11 Dec 2005 11:54:54 +0100
- To: WebDAV <w3c-dist-auth@w3.org>
Hi,

before going to work on this issue, I just re-read the summary I wrote some months ago. I think it contains a correct analysis of the situation, thus I will use it as a basis for the changes.

People interested in this issue may want to re-read the HTML version at <http://greenbytes.de/tech/webdav/draft-reschke-webdav-namespace-vs-properties-latest.html>, or the TXT version below.

Feedback appreciated,
Julian

----------------

Behaviour of WebDAV properties under namespace operations

Abstract

This document discusses the impact of WebDAV namespace operations on the behaviour of live properties defined in HTTP and WebDAV.

1. Detecting changes in HTTP

HTTP ([RFC2616]) defines a set of response header fields that can be used to detect changes, namely "ETag" (Section 14.19) and "Last-Modified" (Section 14.29).

User agents can use request header fields to make method invocations conditional, such as

   o  "If-Modified-Since" (Section 14.25) and "If-None-Match" (Section 14.26) to GET the representation of the resource if and only if it differs from what the user agent already obtained, and

   o  "If-Unmodified-Since" (Section 14.28) and "If-Match" (Section 14.24) to overwrite the representation using PUT if and only if it didn't change since the last GET request (thus avoiding overlapping updates).

Note that HTTP defines the behaviour of these headers in terms of "variants" (i.e. the different representations that may be returned for a single resource; see Section 1.3).

Furthermore, although HTTP distinguishes between the term "URI" (identifier) and "resource" (the object identified by the URI), the difference has little impact as the HTTP specification does not define any namespace operations that would change the mapping between URIs and resources. Thus, generic clients will rely on consistent behaviour of "Last-Modified" and "ETag" on a per-URI basis even in the presence of namespace operations.

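The conditional headers listed above can be illustrated with a small server-side check. This is only a sketch under simplifying assumptions: the function name evaluate_conditions and its parameters are made up for illustration, each header carries a single entity tag or "*", and only strong comparison is done. It is not taken from any particular server.

```python
from email.utils import parsedate_to_datetime


def evaluate_conditions(method, headers, etag, last_modified):
    """Return the status code a server might choose for a conditional request.

    Hypothetical helper: `headers` is a dict of request headers, `etag` and
    `last_modified` describe the current state of the requested URL.
    Assumes single-valued entity-tag headers and strong comparison only.
    """
    current = parsedate_to_datetime(last_modified)

    # Write guards: fail with 412 if the stored state no longer matches
    # what the client saw earlier.
    if "If-Match" in headers and headers["If-Match"] not in ("*", etag):
        return 412
    if "If-Unmodified-Since" in headers:
        if current > parsedate_to_datetime(headers["If-Unmodified-Since"]):
            return 412

    # Read guards: answer GET/HEAD with 304 if the client's copy is current.
    if method in ("GET", "HEAD"):
        if "If-None-Match" in headers:
            if headers["If-None-Match"] in ("*", etag):
                return 304
        elif "If-Modified-Since" in headers:
            if current <= parsedate_to_datetime(headers["If-Modified-Since"]):
                return 304

    return 200
```

Note that the date checks compare per URL: the function only ever sees the state currently mapped to the request URI, which is why a timestamp that moves backwards after a namespace operation goes unnoticed by "If-Modified-Since".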
1.1 Example: GET only if changed

>> Request (getting the content initially)

   GET /index.html HTTP/1.1
   Host: example.org

>> Response

   HTTP/1.1 200 OK
   Content-Type: text/html; charset="utf-8"
   Content-Length: xxxx
   Last-Modified: Sun, 20 Mar 2005 12:45:26 GMT

   ...body...

The user agent stores the response headers along with the content. When it needs to update the content (for instance when the user initiates a refresh of the browser window), it uses that information to make the request conditional.

>> Request (refreshing the content)

   GET /index.html HTTP/1.1
   Host: example.org
   If-Modified-Since: Sun, 20 Mar 2005 12:45:26 GMT

>> Response

   HTTP/1.1 304 Not Modified

Thus, if the content did not change, the user agent avoids getting the same content again.

1.2 Example: PUT only if unchanged

>> Request (getting the content initially)

   GET /index.html HTTP/1.1
   Host: example.org

>> Response

   HTTP/1.1 200 OK
   Content-Type: text/html; charset="utf-8"
   Content-Length: xxxx
   Last-Modified: Sun, 20 Mar 2005 12:45:26 GMT
   ETag: "1"

   ...body...

>> Request (writing back the content)

   PUT /index.html HTTP/1.1
   Host: example.org
   If-Match: "1"

>> Response

   HTTP/1.1 200 OK

However, had the content changed between the two requests, the response would have been:

>> Response

   HTTP/1.1 412 Precondition Failed

1.3 Requirements for 'Last-Modified' and 'ETag'

Below is a list of requirements for the behaviour of 'Last-Modified' and 'ETag':

   R1: The "Last-Modified" date returned for a specific variant of a resource must change whenever the content changes.

   R2: Whenever it changes, the "Last-Modified" date must increment.

   R3: The entity tag ("ETag") returned for a specific variant of a resource must change whenever the content changes.

2. Implications for WebDAV namespace operations

The requirements above seem to be straightforward to implement, but things get tricky as soon as namespace operations such as COPY ([RFC2518], Section 8.8) and MOVE (Section 8.9) are introduced.

For example, consider two resources identified by "index.html" and "index.html.bak" with last modified dates of "12:00:00 GMT" and "11:50:00 GMT" respectively. A client may have retrieved the content for "index.html", remembering the first timestamp:

>> Request (getting the content initially)

   GET /index.html HTTP/1.1
   Host: example.org

>> Response

   HTTP/1.1 200 OK
   Content-Type: text/html; charset="utf-8"
   Content-Length: xxxx
   Last-Modified: Sun, 20 Mar 2005 12:00:00 GMT

   ...body...

Later, another user decides to restore the backup, using a WebDAV MOVE request.

>> Request (restoring the backup)

   MOVE /index.html.bak HTTP/1.1
   Host: example.org
   Destination: http://example.org/index.html

>> Response

   HTTP/1.1 204 No Content

Finally, the first user agent decides to refresh the content for "index.html". What value for "Last-Modified" should be returned?

   o  Moving the timestamp along with the resource (setting it to "11:50:00 GMT") will cause it to move back in time, causing the client to assume that the content did not change.

   o  Not modifying the timestamp (leaving it at "12:00:00 GMT") will cause the client to assume the content did not change as well.

   o  Thus, to avoid lost refreshes, the server will have to assign a new timestamp which differs from both timestamps and actually is newer than both.

The situation for "ETag" is only slightly different; the entity tag needs to be unique across all variants ever served for the same HTTP URL (the only difference is that it doesn't have any inherent order that the conditional request headers would check). Thus, if a server can guarantee that no entity tag ever repeats for any URL within its namespace, namespace operations do not require any post-processing (otherwise, the same considerations as for "Last-Modified" apply).

[[anchor4: Mention the impact of depth=infinity namespace operations --reschke]]

3. 'Last-Modified' vs BIND

[draft-ietf-webdav-bind] defines a set of new namespace operations (BIND, UNBIND, REBIND).
It's easy to see that for REBIND, the same considerations will apply as for MOVE, and that UNBIND will behave as DELETE. But what about BIND?

BIND creates a new URL mapping for a given resource. A server basically has two choices for implementing "Last-Modified" for resources that support multiple bindings:

   1. Store the time stamp with the resource. In this case, "Last-Modified" will be the same regardless of which URL a GET/HEAD/PROPFIND request is applied to. However, the date that is returned must satisfy the requirements defined in Section 1.3 for each of the URLs mapped to it. In practice this means that using BIND to map a URL that has been in use before may cause the "Last-Modified" date to be incremented (for all URLs through which the resource is accessible).

   2. Alternatively, a server may choose to store the time stamp on a per-URL basis. This, however, will have the effect that different time stamps are returned although the underlying resource is the same (per BIND's definition).

Note that unless a server implements namespace-wide unique entity tags, the same situation will apply to entity tags as well.

4. Summary

Client implementors will have to expect that HTTP response headers may vary for different URLs even though the underlying resource is the same. They will also have to expect that namespace operations such as MOVE, COPY, BIND or REBIND affect time stamps and entity tags in possibly surprising ways. The exact effect cannot be predicted, because the behaviour of these headers is defined by HTTP, not by WebDAV or BIND.

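To make the timestamp rule from Sections 2 and 3 concrete, here is a minimal sketch of how a server might pick a new "Last-Modified" value when a URL that was in use before gets mapped to a different resource (MOVE, REBIND, or BIND). The function and parameter names are hypothetical, and the one-second step is an assumption based on the one-second resolution of HTTP dates.

```python
from datetime import datetime, timedelta, timezone

# HTTP dates have one-second resolution, so "newer" means at least 1s later.
ONE_SECOND = timedelta(seconds=1)


def last_modified_after_remap(previous_at_url, carried_by_resource, now):
    """Pick a Last-Modified value for a URL that now maps to another resource.

    previous_at_url:     timestamp last served for the URL (None if unused)
    carried_by_resource: timestamp the resource had before the operation
    now:                 current server time

    The result is never older than `now` (truncated to whole seconds) and is
    strictly newer than both earlier timestamps.
    """
    candidates = [now.replace(microsecond=0)]
    for seen in (previous_at_url, carried_by_resource):
        if seen is not None:
            candidates.append(seen + ONE_SECOND)
    return max(candidates)
```

For the "index.html" example, with 12:00:00 previously served for the URL and 11:50:00 carried by the backup, a MOVE at 12:30:00 yields 12:30:00, while a MOVE within the same second as the last served timestamp yields 12:00:01.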
Looking at the properties defined in Section 13 of [RFC2518], only some of them are inherited from HTTP and thus will possibly behave as described above:

+--------------------+------------------------------------------+
| property           | behaviour                                |
+--------------------+------------------------------------------+
| creationdate       | per resource                             |
| displayname        | per resource (your mileage may vary for  |
|                    | some broken implementations out there)   |
| getcontentlanguage | potentially per URL as per HTTP          |
| getcontentlength   | potentially per URL as per HTTP          |
| getcontenttype     | potentially per URL as per HTTP          |
| getetag            | potentially per URL as per HTTP          |
| getlastmodified    | potentially per URL as per HTTP          |
| lockdiscovery      | per resource                             |
| resourcetype       | per resource                             |
| supportedlock      | per resource                             |
| source             | per resource                             |
+--------------------+------------------------------------------+

5. References

[RFC2518] Goland, Y., Whitehead, E., Faizi, A., Carter, S., and D. Jensen, "HTTP Extensions for Distributed Authoring -- WEBDAV", RFC 2518, February 1999.

[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[draft-ietf-webdav-bind] Clemm, G., Crawford, J., Reschke, J., and J. Whitehead, "Binding Extensions to Web Distributed Authoring and Versioning (WebDAV)", draft-ietf-webdav-bind-12 (work in progress), July 2005, <http://greenbytes.de/tech/webdav/draft-ietf-webdav-bind-12.html>.

Author's Address

   Julian F. Reschke
   greenbytes GmbH
   Hafenweg 16
   Muenster, NW 48155
   Germany

   Phone: +49 251 2807760
   Fax: +49 251 2807761
   Email: julian.reschke@greenbytes.de
   URI: http://greenbytes.de/tech/webdav/

Appendix A. FAQ

A.1 Why is it so hard to always supply robust entity tags?

Example #1: a Java-based server maps filesystem objects to HTTP resources, and is stuck with what java.io.File supports (which only allows access to a very limited subset of the operating system's file information, see <http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html>). If the server doesn't fully control the filesystem, and unless it's prepared to store metadata out-of-band (outside the filesystem), it will have to compute entity tags based on file information such as the last-modified date and the length. The only robust alternative would be to compute a hash of the actual file's contents, but usually this is too expensive.

Example #2: a module implementing WebDAV is just an add-on to the generic HTTP handler in a server (e.g., mod_dav inside Apache httpd server), and the server doesn't have any information except what it obtains from the underlying store (in this case the filesystem). Even if the server indeed has full access to the operating system's information, it may still not be able to use the file's inode information, for instance because it's a network drive.

A.2 What's the story about weak entity tags?

HTTP distinguishes between "weak" and "strong" entity tags (see [RFC2616], Section 3.11). Only strong entity tags can be used in authoring scenarios such as the one described in Section 1.2. However, if an entity tag has been computed based on "last-modified" information, it only becomes a "strong" entity tag after a certain interval of non-activity on a resource. Thus, servers may return a weak entity tag as the result of a PUT operation, and only later "promote" it to a strong entity tag. Requiring servers to always return strong entity tags in the first place _will_ render Apache/mod_dav non-conformant.
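
The "promotion" described in A.2 can be sketched as follows. This is an illustration only, not the code of any actual server: the function name, the tag layout (length and modification time in hex) and the settle_seconds threshold are assumptions chosen for the example.

```python
def entity_tag(length, last_modified, now, settle_seconds=1.0):
    """Derive an entity tag from file length and modification time.

    length:         size of the file in bytes
    last_modified:  modification time in seconds since the epoch
    now:            current time in seconds since the epoch
    settle_seconds: assumed interval of non-activity after which the tag
                    is treated as strong

    While the file was modified less than `settle_seconds` ago, a second
    write could still land on the same timestamp and length, so the tag is
    marked weak ("W/" prefix). Afterwards the same opaque value is returned
    as a strong tag.
    """
    opaque = '"%x-%x"' % (length, int(last_modified))
    if now - last_modified < settle_seconds:
        return "W/" + opaque
    return opaque
```

Right after a PUT the server would answer with the weak form; a later GET or PROPFIND on the untouched resource would return the strong form of the same value.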
Received on Sunday, 11 December 2005 10:56:31 UTC