Re: [Bug 85] clarification of live property behaviour vs namespace ops needed

Hi,

before going to work on this issue, I just re-read the summary I wrote 
some months ago. I think it contains a correct analysis of the 
situation, thus I will use it as a basis for the changes. People 
interested in this issue may want to re-read the HTML version at 
<http://greenbytes.de/tech/webdav/draft-reschke-webdav-namespace-vs-properties-latest.html>, 
or the TXT version below.

Feedback appreciated,

Julian

----------------

        Behaviour of WebDAV properties under namespace operations


Abstract

    This document discusses the impact of WebDAV namespace operations on
    the behaviour of live properties defined in HTTP and WebDAV.

1.  Detecting changes in HTTP

    HTTP ([RFC2616]) defines a set of response header fields that can be
    used to detect changes, namely "ETag" (Section 14.19) and "Last-
    Modified" (Section 14.29).  User agents can use request header fields
    to make method invocations conditional, such as

    o  "If-Modified-Since" (Section 14.25) and "If-None-Match" (Section
       14.26) to GET the representation of the resource if and only if it
       differs from what the user agent already obtained, and

    o  "If-Unmodified-Since" (Section 14.28) and "If-Match" (Section
       14.24) to overwrite the representation using PUT if and only if it
       didn't change since the last GET request (thus avoiding
       overlapping updates).

    Note that HTTP defines the behaviour of these headers in terms of
    "variants" (i.e. the different representations that may be returned
    for a single resource; see Section 1.3).  Furthermore, although HTTP
    distinguishes between the term "URI" (identifier) and "resource" (the
    object identified by the URI), the difference has little impact as
    the HTTP specification does not define any namespace operations that
    would change the mapping between URIs and resources.  Thus, generic
    clients will rely on consistent behaviour of "Last-Modified" and
    "ETag" on a per-URI basis even in the presence of namespace
    operations.

1.1  Example: GET only if changed

    >> Request (getting the content initially)

    GET /index.html HTTP/1.1
    Host: example.org

    >> Response

    HTTP/1.1 200 OK
    Content-Type: text/html; charset="utf-8"
    Content-Length: xxxx
    Last-Modified: Sun, 20 Mar 2005 12:45:26 GMT

    ...body...

    The user agent stores the response headers along with the content.
    When it needs to update the content (for instance the user initiates
    a refresh of the browser window), it uses that information to make
    the request conditional.

    >> Request (refreshing the content)

    GET /index.html HTTP/1.1
    Host: example.org
    If-Modified-Since: Sun, 20 Mar 2005 12:45:26 GMT

    >> Response

    HTTP/1.1 304 Not Modified

    Thus, if the content did not change, the user agent avoids getting
    the same content again.
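
    The server-side decision behind this exchange can be sketched as
    follows.  This is only an illustration in Python, not code from
    any particular server; the function name and its parameters are
    assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str]) -> int:
    """Return 200 if the body must be sent, 304 if the cached copy is current.

    last_modified is the timezone-aware timestamp the server holds for the URL;
    if_modified_since is the raw request header value, or None if absent.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparsable date: treat the request as unconditional
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```

    Note that a "Last-Modified" value that moves back in time yields
    304 here as well, which is why the requirements in Section 1.3
    matter.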

1.2  Example: PUT only if unchanged

    >> Request (getting the content initially)

    GET /index.html HTTP/1.1
    Host: example.org

    >> Response

    HTTP/1.1 200 OK
    Content-Type: text/html; charset="utf-8"
    Content-Length: xxxx
    Last-Modified: Sun, 20 Mar 2005 12:45:26 GMT
    ETag: "1"

    ...body...

    >> Request (writing back the content)

    PUT /index.html HTTP/1.1
    Host: example.org
    If-Match: "1"

    >> Response

    HTTP/1.1 200 OK

    However, had the content changed between the two requests, the
    response would have been:

    >> Response

    HTTP/1.1 412 Precondition Failed
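
    A minimal sketch of the "If-Match" check a server might perform
    before accepting the PUT is shown below.  The function name is
    hypothetical, and splitting the header value on commas is a
    simplification (entity tags may themselves contain commas).

```python
from typing import Optional


def put_precondition_status(current_etag: Optional[str],
                            if_match: Optional[str]) -> int:
    """Return 200 if the PUT may proceed, 412 if the precondition fails.

    current_etag is the entity tag of the current representation, or None
    if the URL is not mapped to any resource.
    """
    if if_match is None:
        return 200  # unconditional PUT
    if current_etag is None:
        return 412  # nothing there to match against
    candidates = [tag.strip() for tag in if_match.split(",")]
    if "*" in candidates:
        return 200  # "*" matches any existing representation
    # If-Match uses strong comparison: a weak entity tag never matches.
    if current_etag.startswith("W/"):
        return 412
    return 200 if current_etag in candidates else 412
```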


1.3  Requirements for 'Last-Modified' and 'ETag'

    Below is a list of requirements for the behaviour of 'Last-Modified'
    and 'ETag':

    R1: The "Last-Modified" date returned for a specific variant of a
        resource must change whenever the content changes.

    R2: Whenever it changes, the "Last-Modified" date must increment.

    R3: The entity tag ("ETag") returned for a specific variant of a
        resource must change whenever the content changes.


2.  Implications for WebDAV namespace operations

    The requirements above seem to be straightforward to implement, but
    things get tricky as soon as namespace operations such as COPY
    ([RFC2518], Section 8.8) and MOVE (Section 8.9) are introduced.

    For example, consider two resources identified by "index.html" and
    "index.html.bak" with last modified dates of "12:00:00 GMT" and
    "11:50:00 GMT" respectively.

    A client may have retrieved the content for "index.html", remembering
    the first timestamp:

    >> Request (getting the content initially)

    GET /index.html HTTP/1.1
    Host: example.org

    >> Response

    HTTP/1.1 200 OK
    Content-Type: text/html; charset="utf-8"
    Content-Length: xxxx
    Last-Modified: Sun, 20 Mar 2005 12:00:00 GMT

    ...body...

    Later, another user decides to restore the backup, using a WebDAV
    MOVE request.

    >> Request (restoring the backup)

    MOVE /index.html.bak HTTP/1.1
    Host: example.org
    Destination: http://example.org/index.html

    >> Response

    HTTP/1.1 204 No Content

    Finally, the first user agent decides to refresh the content for
    "index.html".  What value for "Last-Modified" should be returned?

    o  Moving the timestamp (setting it to "11:50:00 GMT") will cause it
       to move back in time, causing the client to assume that the
       content did not change.

    o  Not modifying the timestamp (leaving it "12:00:00 GMT") will cause
       the client to assume the content did not change as well.

    o  Thus, to avoid lost refreshes, the server will have to assign a
       new timestamp which differs from both timestamps and actually is
       newer than both.
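
    One way to pick such a timestamp is sketched below: use the
    current time, and fall back to "newest known timestamp plus one
    second" if the clock has not yet advanced past it.  The function
    name and parameters are assumptions made for this illustration.

```python
from datetime import datetime, timedelta, timezone


def last_modified_after_move(source_lm: datetime,
                             destination_lm: datetime,
                             now: datetime) -> datetime:
    """Pick a Last-Modified value for the destination URL after a MOVE.

    The result is strictly newer than both the timestamp of the moved
    resource and the timestamp previously served at the destination URL.
    """
    candidate = now.replace(microsecond=0)  # HTTP dates have 1s resolution
    newest_known = max(source_lm, destination_lm)
    if candidate <= newest_known:
        candidate = newest_known.replace(microsecond=0) + timedelta(seconds=1)
    return candidate
```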

    The situation for "ETag" is only slightly different; the entity tag
    needs to be unique across all variants ever served for the same HTTP
    URL (the only difference is that it doesn't have any inherent order
    that the conditional request headers would check).  Thus, if a server
    can guarantee that no entity tag ever repeats for any URL within its
    namespace, namespace operations do not require any post-processing
    (otherwise, the same considerations as for "Last-Modified" apply).
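
    As an illustration of the "never repeats" approach, the sketch
    below draws entity tags from a single counter shared by the whole
    namespace, so a tag that travels with a moved resource cannot
    collide with anything served at the destination before.  The
    class and method names are hypothetical, and a real server would
    need to persist the counter.

```python
import itertools
from typing import Dict


class EtagStore:
    """Toy namespace in which every content change gets a fresh entity tag."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)   # shared by the whole namespace
        self._etags: Dict[str, str] = {}     # URL path -> current entity tag

    def put(self, path: str) -> str:
        """Record a content change at path and return the new entity tag."""
        self._etags[path] = f'"{next(self._counter)}"'
        return self._etags[path]

    def move(self, source: str, destination: str) -> str:
        """Move the resource; its entity tag travels with it unchanged."""
        self._etags[destination] = self._etags.pop(source)
        return self._etags[destination]
```

    Because the tag moved to the destination was never issued for
    that URL before, no post-processing is needed after the MOVE.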

    [[anchor4: Mention the impact of depth=infinity namespace operations
    --reschke]]

3.  'Last-Modified' vs BIND

    [draft-ietf-webdav-bind] defines a set of new namespace operations
    (BIND, UNBIND, REBIND).  It's easy to see that for REBIND, the same
    considerations will apply as for MOVE, and that UNBIND will behave as
    DELETE.  But what about BIND?

    BIND creates a new URL mapping for a given resource.  A server
    basically has two choices for implementing the "Last-Modified" for
    resources that support multiple bindings:

    1.  Store the time stamp with the resource.  In this case, "Last-
        Modified" will be the same regardless of which URL a GET/HEAD/
        PROPFIND request is applied to.  However, the date that is
        returned must satisfy the requirements defined in Section 1.3 for
        each of the URLs mapped to it.  In practice this means that using
        BIND to map a URL that has been in use before may cause the
        "Last-Modified" date to be incremented (for all URLs through
        which the resource is accessible).

    2.  Alternatively, a server may choose to store the time stamp on a
        per-URL basis.  This, however, will have the effect that
        different time stamps are returned although the underlying
        resource is the same (per BIND's definition).

    Note that unless a server implements namespace-wide unique entity
    tags, the same situation will apply to entity tags as well.
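
    The first choice can be sketched as follows, using integer
    timestamps for brevity.  The store remembers the newest value
    ever served at each URL and increments the resource's time stamp
    when a BIND would otherwise violate Section 1.3 for that URL.
    All names are assumptions made for this illustration, not part of
    the BIND specification.

```python
from typing import Dict


class BindStore:
    """Toy store keeping Last-Modified with the resource (choice 1)."""

    def __init__(self) -> None:
        self.last_modified: Dict[str, int] = {}  # resource id -> time stamp
        self.bindings: Dict[str, str] = {}       # URL -> resource id
        self.served: Dict[str, int] = {}         # URL -> newest value served

    def create(self, url: str, rid: str, now: int) -> None:
        """Create resource rid with time stamp now and map url to it."""
        self.last_modified[rid] = now
        self.bindings[url] = rid

    def get_last_modified(self, url: str) -> int:
        """Serve Last-Modified for url and remember what was served."""
        value = self.last_modified[self.bindings[url]]
        self.served[url] = max(self.served.get(url, value), value)
        return value

    def bind(self, url: str, rid: str, now: int) -> None:
        """Map url to rid, bumping the time stamp if url was in use before."""
        previous = self.served.get(url)
        if previous is not None and self.last_modified[rid] <= previous:
            self.last_modified[rid] = max(now, previous + 1)
        self.bindings[url] = rid
```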

4.  Summary

    Client implementors will have to expect that HTTP response headers
    may vary for different URLs even though the underlying resource is
    the same.  They will also have to expect that namespace operations
    such as MOVE, COPY, BIND or REBIND affect time stamps and entity
    tags in possibly surprising ways.  The exact effect cannot be
    predicted, because these headers are defined by HTTP, not by WebDAV
    or BIND.

    Looking at the properties defined in Section 13 of [RFC2518], only
    some of them are inherited from HTTP and thus will possibly behave as
    described above:

    +---------------------------------+---------------------------------+
    | property                        | behaviour                       |
    +---------------------------------+---------------------------------+
    | creationdate                    | per resource                    |
    | displayname                     | per resource (your mileage may  |
    |                                 | vary for some broken            |
    |                                 | implementations out there)      |
    | getcontentlanguage              | potentially per URL as per HTTP |
    | getcontentlength                | potentially per URL as per HTTP |
    | getcontenttype                  | potentially per URL as per HTTP |
    | getetag                         | potentially per URL as per HTTP |
    | getlastmodified                 | potentially per URL as per HTTP |
    | lockdiscovery                   | per resource                    |
    | resourcetype                    | per resource                    |
    | supportedlock                   | per resource                    |
    | source                          | per resource                    |
    +---------------------------------+---------------------------------+


5.  References

    [RFC2518]  Goland, Y., Whitehead, E., Faizi, A., Carter, S., and D.
               Jensen, "HTTP Extensions for Distributed Authoring --
               WEBDAV", RFC 2518, February 1999.

    [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
               Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
               Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

    [draft-ietf-webdav-bind]
               Clemm, G., Crawford, J., Reschke, J., and J. Whitehead,
               "Binding Extensions to Web Distributed Authoring and
               Versioning (WebDAV)", draft-ietf-webdav-bind-12 (work in
               progress), July 2005, <http://greenbytes.de/tech/webdav/
               draft-ietf-webdav-bind-12.html>.


Author's Address

    Julian F. Reschke
    greenbytes GmbH
    Hafenweg 16
    Muenster, NW  48155
    Germany

    Phone: +49 251 2807760
    Fax:   +49 251 2807761
    Email: julian.reschke@greenbytes.de
    URI:   http://greenbytes.de/tech/webdav/

Appendix A.  FAQ

A.1  Why is it so hard to always supply robust entity tags?

    Example #1: a Java-based server maps filesystem objects to HTTP
    resources, and is stuck with what java.io.File supports (which only
    allows access to a very limited subset of the operating system's file
    information, see
    <http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html>).  If the
    server doesn't fully control the filesystem, and unless it's prepared
    to store metadata out-of-band (outside the filesystem), it will have
    to compute entity tags based on file information such as the last-
    modified date and the length.  The only robust alternative would be
    to compute a hash of the actual file's contents, but usually this is
    too expensive.
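
    The trade-off can be illustrated with the following sketch, which
    uses Python's os.stat in place of java.io.File.  The function
    names are hypothetical.  The metadata-based tag is cheap but
    cannot see a change that preserves both the modification time and
    the length; the hash-based tag is robust but has to read the
    whole file.

```python
import hashlib
import os


def metadata_etag(path: str) -> str:
    """Cheap entity tag built from the last-modified time and the length."""
    st = os.stat(path)
    return f'"{st.st_mtime_ns:x}-{st.st_size:x}"'


def content_etag(path: str) -> str:
    """Robust entity tag built from a hash of the file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return f'"{digest.hexdigest()}"'
```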

    Example #2: a module implementing WebDAV is just an add-on to the
    generic HTTP handler in a server (e.g., mod_dav inside the Apache
    httpd server), and the server doesn't have any information except
    what it obtains from the underlying store (in this case the
    filesystem).  Even if the server does have full access to the
    operating system's information, it may still not be able to use the
    file's inode information, for instance because it's a network drive.

A.2  What's the story about weak entity tags?

    HTTP distinguishes between "weak" and "strong" entity tags (see
    [RFC2616], Section 3.11).  Only strong entity tags can be used in
    authoring scenarios such as the one described in Section 1.2.
    However, if an entity tag has been computed based on "last-modified"
    information, it only becomes a "strong" entity tag after a certain
    interval of non-activity on a resource.  Thus, servers may return a
    weak entity tag as the result of a PUT operation, and only later
    "promote" it to a strong entity tag.

    Requiring servers to always return strong entity tags in the first
    place _will_ render Apache/mod_dav non-conformant.
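
    The promotion from weak to strong described in this section can
    be sketched as follows.  The function name, the tag layout and
    the one-second settling interval are assumptions chosen for
    illustration, not a description of any particular server's code.

```python
def entity_tag(mtime: float, size: int, now: float,
               settle_seconds: float = 1.0) -> str:
    """Build an entity tag from the last-modified time and the length.

    While the file was modified less than settle_seconds ago, another
    write could still land within the same timestamp, so the tag is
    marked weak; afterwards the same opaque value is returned as strong.
    """
    opaque = f'"{int(mtime):x}-{size:x}"'
    if now - mtime < settle_seconds:
        return "W/" + opaque
    return opaque
```

    A client that received the weak form in the PUT response cannot
    use it with "If-Match" and has to fetch the entity tag again
    after the interval has passed.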

Received on Sunday, 11 December 2005 10:56:31 UTC