Re: Comments on draft-ietf-deltav-versioning-08

From: Geoffrey M. Clemm (geoffrey.clemm@rational.com)
Date: Tue, Sep 19 2000

    Date: Tue, 19 Sep 2000 23:41:21 -0400 (EDT)
    Message-Id: <200009200341.XAA16472@tantalum.atria.com>
    From: "Geoffrey M. Clemm" <geoffrey.clemm@rational.com>
    To: ietf-dav-versioning@w3.org
    Subject: Re: Comments on draft-ietf-deltav-versioning-08
    
    
       From: Ross Wetmore <rwetmore@verticalsky.com>
    
         As a preface to the following comments, it should be understood that I
       am still in the process of digesting the concepts as formalized by the
       relevant RFCs. While I have all the data and have been lurking on the
       discussion list for a time or have skimmed many of the past discussions
       for interpretations, I have not necessarily recognized all the structure
       or linkages and their implications. Please accept my apologies wherever
       I appear to have overlooked the obvious, and I ask for your patience if
       I am retracing too much covered ground.
    
    We're specifically looking for "fresh eyes", so no apologies necessary!
    
         There are a lot of questions, as opposed to bald-faced corrections and
       it is perhaps better if one of the principals gets a first crack, or
       chance to use the editorial knife to prune any discussions within the
       full group, so I am bouncing this through a limited audience. Please feel
       free to apply whatever collective editorial discretion you wish in
       responding or forwarding parts of this onwards.
    
    It all looks worth passing on to me.
    
       Previous Context (general comment)
    
         I very much appreciate and could not have managed reading the document
       without the meticulous back referencing of earlier definitions, material
       or foundations. The example or context following and explaining each
       assertion was also key. However, I often found myself rereading past
       sections to locate a remembered tag, reference or definition, and
       perhaps iterating several times through previous documents before I had
       the necessary context to understand a point being made.
         The following should not be construed as criticisms or requests, but
       merely possible enhancements that might have helped reduce or increase
       the efficiency of my context switching load.
    
         A quick summary would be useful of the background and current elements
       or key points to follow in the introductory remarks to each section, or
       perhaps an appendix reference to lists classified as former, updated and
       new or advanced with appropriate set of collected references or section
       numbers. The intent would be to provide known points to localize and
       organize some of the referencing load, and help differentiate the base
       from the current extensions when much of it is still undifferentiated
       information to the illiterate newbie.
    
    The intent was for RFC-2616 (HTTP-1.1) and RFC-2518 (WebDAV) to be
    required reading, and then the sections in this document are marked
    indicating whether they contain "new stuff" or "effects of extensions
    on old stuff".  Can you give an example of some places where it was
    unclear what was base material, and what was extensions?
    
       Section 2
    
         There are two topics that are base concepts, but which I believe have
       new or stronger implications in the context of the current extensions.
       Both concepts arise in specific comments in a few sections, but seem to
       be largely ignored, or assumed. This might be a point to at least
       introduce them or provide a generic statement.
    
    We tried as much as possible to define versioning as "orthogonal"
    extensions to what is currently in HTTP and WebDAV.  So there is an
    implicit "if we don't talk about a construct, we haven't changed or
    extended its semantics".
    
         The first is caching. What are the implications and mechanisms for
       resolving caching issues in the case of the server and active client,
       but also in the case of propagating changes to other clients arising
       from extensions to enhance parallel development? A brief summary of any
       prerequisites or assumptions, plus some indication of what, if any,
       additional constraints or actions might be needed by the following
       extensions would be useful. If there is a good discussion of this
       elsewhere, then perhaps a simple context definition and reference would
       be sufficient to clarify requirements or known limitations for the
       following extensions.
    
    The only interactions that we identified are listed in the protocol
    (i.e. the Vary header and the Cache-Control header requirements).
    
         The second is locking. Some operations mention lock failure as a pre-
       or post-condition error. Most elements and operations do not identify
       themselves as being lockable or not, let alone the scope of any lock.
    
    We tried to be very careful to not modify any of the WebDAV locking
    semantics, but did identify any new methods that need to respect a
    write lock on the affected resource.  So a server could choose to implement
    versioning with or without locking.
    
       A generic description of locking semantics, any optimization
       techniques for combining lock requests with operations for single
       trip turnaround, and generic lock errors might be useful here, with
       specific additions or deviations reflected in the later sections.
    
    We prefer not to repeat semantics that we do not modify, since this
    creates unfortunate linkages to the protocol that defines those semantics.
    In particular, we'd prefer not to have to rev the versioning protocol
    just because some aspect of the locking protocol changed.
    
         A third topic that I believe is critical to many advanced operations
       is that of providing a mechanism for combining individual operations or
       property updates into a single atomic request, at least from the view of
       any client. Most real versioning, or content management requests consist
       of a number of the base operations described here and elsewhere. In many
       cases it is impossible, from a practical or timely standpoint at least,
       to deal with multiple unserialized compound actions, or failure modes of
       compound actions in a networked environment. At least some indication of
       the generic solution to this within the current proposed standard, if
       not specific support, would be very useful. I note there is support for
       multi-status responses. What about multi-part requests? Are there any
       thoughts on this? Previous discussions? Or is it a can of worms that has
       been carefully set aside :-?
    
    This was discussed in the early days of the WebDAV protocol (I remember
    a BATCH or some such method being suggested).  So you can find these
    discussions, but basically, yes it is a can of worms that has been
    carefully set aside.
    
         Note I carefully did not throw performance into the last paragraph
       under the premise that premature optimization is the root of all evil.
       But it would be a useful side effect.
    
    You should be able to get much of the optimization you need from
    the HTTP-1.1 ability to keep a connection alive.
    
       Section 3.1, "unknown" discussion
    
         I agree with many of the discussion points. I too found "unknown" to be
       more of a generic element characteristic than a request for appropriate
       server behaviour. Here are a couple of possibilities for consideration.
         Using terminology that at least partly suggests an association with a
       verb or action, rather than a pure adjective, might help. An example
       might be "not-known".
         To better indicate that this is a server behaviour or response rather
       than a type classification use a more specific term like "supported". 
       Thus "not_supported" or "nosupport" might be reasonable alternatives.
    
    I read this as suggestions for a better name for this attribute,
    as opposed to changing its semantics (is that correct?).  So far
    we've got:
    
    unknown        if-unknown
    not-known      if-not-known
    not-supported  if-not-supported
    no-support     if-no-support
    
    Anyone want to vote for their favorite (or add to the list)?
    
       =====
    
       Section 4.1
    
         There are efforts such as the Dublin Core work, to identify standard
       properties. Should WebDAV properties be selected to conform with such
       systems wherever possible to maximize recognition? 
    
    Any particular suggestion for how we might make them conform better?
    
         Which is the precise concept desired here for creator-displayname:
       author or owner, or (content or object) creator?
    
    It's intended to be resource (object) creator, but in the case of
    a versioning system, where each new interesting state is stored as
    a separate resource (a version), the distinction becomes fuzzy.
    
       =====
    
       Section 5.3
    
         I tend to associate "name" with a human friendly or maybe client
       label. For bureaucratic or server imposed labels, which are often
       numeric or non-phonetic I use id or identifier.
         Is "version-id" more appropriate here? Is there intent in choosing the
       label to carry any such additional (subliminal) characteristics?
    
    The purpose of "version-name" was for it to be the "human-memorable" name
    that some systems try to provide.  So "name" was the intent (you already
    have an "id" in the URL that identifies the version).
    
       =====
    
       Section 6.1
    
         I do not understand the difference to the target between a MOVE and a
       COPY of a non-collection resource, i.e. why delete (overwrite:T) is ok
       for a MOVE, and how the described semantics might differ for a COPY from
       a delete and recreation of a new target, or why this is seemingly not
       allowed. There appear to be some subtle implied constraints on the
       implementation that are not explicitly defined anywhere?
    
    I've added some Postconditions to COPY and MOVE to clarify this.  In
    particular, a COPY of a version selector creates a new resource (with
    a new version history if it is put under version control), while a
    MOVE of a version selector just renames the existing version selector.
    
         Why is an operation on a collection not defined to be the
       corresponding operation on each member of the collection plus any
       consistency adjustments to the collection itself?
    
    That is what "Overwrite: update" effectively does, but there is another
    interesting semantic (which is defined in RFC-2518) which first removes
    the destination before making the MOVE/COPY.  In particular, a user
    might want a member to be in the copy only if it was a member in the
    original (which is not the case for Overwrite:update).
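
    As a rough illustration of the difference between the two semantics
    (a toy model, not protocol text): collections are modelled here as
    plain Python dicts from member name to content, nesting is ignored,
    and the function names copy_update and copy_replace are hypothetical.

```python
def copy_update(source, destination):
    """Toy model of "Overwrite: update": copy member by member, keeping extras."""
    result = dict(destination)   # members found only in the destination survive
    result.update(source)        # members of the source are copied over
    return result


def copy_replace(source, destination):
    """Toy model of the RFC 2518 semantic: remove the destination, then copy."""
    return dict(source)          # nothing from the old destination survives


source = {"a.txt": "new a", "b.txt": "new b"}
destination = {"a.txt": "old a", "c.txt": "old c"}

updated = copy_update(source, destination)
replaced = copy_replace(source, destination)

print(sorted(updated))   # ['a.txt', 'b.txt', 'c.txt']
print(sorted(replaced))  # ['a.txt', 'b.txt']
```

    In this model "c.txt" remains a member after the update-style copy
    even though it was never a member of the original, which is the case
    the remove-first semantic avoids.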
    
         As long as update does not imply content or properties merging of a
       non-collection resource, it seems that the practical effect for update
       is only on collections - is this statement true?
    
    When the history of a resource is being tracked by a versioning
    system, updating the contents of a resource is very different from
    replacing a resource.
    
       =====
    
       Section 7.1
    
         I could not find a reference to a "Vary header" to complete the
       definition.
    
    I'll change this to say "HTTP-1.1 Vary header" to clarify this.
    
       =====
    
       Section 8
    
         Is paragraph 2 true even if the resource selected is mutable?
    
    Yes.  A mutable revision can only be updated with a CHECKIN, not
    directly by a PUT or PROPPATCH.
    
         Paragraph 2 implies that one cannot have a version selector point to a
       working resource? Is this true?
    
    Yes.  A version selector doesn't actually point to anything ... but
    it does have a "DAV:target", which indicates which version has
    the same content and dead properties as the version selector.
    
    The semantics of a version selector have (hopefully :-) been clarified
    in the 8.1 draft (based on the recent email thread).
    
       What if any would be the corresponding
       redirector or link element for a (server-side) working resource?
    
    There is no redirector or link element defined in the versioning
    protocol.  Think of the version selector as a separate resource whose
    content and dead properties happen to be the same as some version in
    its history (i.e. its current DAV:target).
    
         This may be more appropriate to 10.2 which states that a CHECKOUT MAY
       replace a version selector with a working resource. But I believe it
       may affect several sections of which this is the first.
    
         I can envisage two scenarios in which a version selector points to
       some version of an object and several workspaces indirect through this
       selector to share a particular view of the world.
    
    Each workspace has its own set of version selectors (so they don't
    redirect through a common version selector).
    
       In one case, typical
       of most development, each workspace expects to see a consistent version
       of the history. Any workspace can perform checkout/edit/checkin to
       update the version selector to point to a new version, but only checked
       in versions will be globally visible.
    
    Yes.
    
       In the second scenario, the object
       may be akin to a change package in which several distinct processes are
       collaborating to update the state of the world for some activity in
       progress. When the activity is complete the final version of the object
       will be checked in, but the updates are performed in a shared context.
    
    Note that in this case, it would be advisable to use locks to keep the
    distinct processes from stepping on each other's toes.
    
         How would each of the two examples disambiguate themselves in a single
       implementation under the current extensions, or does the "MAY" in 10.2
       make this impossible? 
    
    This has been fixed in draft 8.1.  In particular, the client specifies
    whether it wants the "in-place" or "out-of-place" checkout behavior.
    
       How might they be implemented simultaneously, i.e.
       is there a mechanism for a "well-known" version selector to point to a
       current working resource and its current version at the same time with,
       for example, a client property used to select which target is currently
       desired?
    
    The only way for two version selectors to share information
    is when one of them has created a new version, and the other
    SET-TARGET's or MERGE's to see it.  
    
       If not, is there a builtin mechanism for sharing a newly
       created URL amongst collaborating processes when the working resource is
       returned at a different location?
    
    Such a collaboration is likely to be only supported in the
    context of workspace support, which gives the checked out
    resources user meaningful names (i.e. via "in-place" checkouts).
    
       How do 16.8 restrictions on version
       selectors affect the solution to having both collaborators and end-users
       sharing a workspace version through the update process?
    
    Which restrictions did you have in mind here?
    
       =====
    
       8.1
    
         This section mentions 4xx status codes and preconditions, but almost
       none are assigned in any later sections, and no explanation is given on
       how this should be carried out. If this is still under discussion, then
       a placeholder comment might be useful (even if only as a reminder of
       work to be done).
    
    This is just a reference to the 4xx status codes defined in HTTP-1.1
    (RFC-2616).  The versioning protocol does not introduce any new 4xx
    status codes.
    
    
       =====
    
       8.3
    
         I am not sure I am interpreting the pre- and post-conditions
       correctly. 
    
         The pre-condition implies that I can select a version using a
       target-selector label which is in the version history but is not the
       current version of a version selector. But the response will always be
       the target version of the version selector.
    
         This doesn't seem to be the intended, or a worthwhile result.
    
    Yes, the postcondition is wrong.  I'll fix it.
    
       =====
    
       9.1
    
         Is the post condition meant to include such things as live properties
       of the server object e.g. last-accessed time? "State of any resource"
       seems an overly broad definition to be practically useful.
    
    Good point.  I'll replace this with "content and dead properties".
    
    
       =====
    
       10.2
         What, if any, is the mechanism to provide a checkout comment?
    
       10.3
         What, if any, is the mechanism to provide a checkin comment?
    
    You could store them in the DAV:comment property of the version,
    using whatever convention you wanted to distinguish the checkout from
    the checkin comment (assuming you wanted to distinguish them).
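
    A minimal sketch of one such convention, assuming the server lets a
    client set DAV:comment with an ordinary RFC 2518 PROPPATCH.  The
    "checkout: " / "checkin: " prefix and the helper name
    comment_proppatch_body are invented for illustration; the protocol
    does not define them.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"
ET.register_namespace("D", DAV)


def comment_proppatch_body(kind, text):
    """Build a PROPPATCH request body that sets DAV:comment.

    The "kind: text" prefix is an invented convention for telling checkout
    comments from checkin comments; it is not part of any specification.
    """
    if kind not in ("checkout", "checkin"):
        raise ValueError("kind must be 'checkout' or 'checkin'")
    root = ET.Element(f"{{{DAV}}}propertyupdate")
    set_el = ET.SubElement(root, f"{{{DAV}}}set")
    prop = ET.SubElement(set_el, f"{{{DAV}}}prop")
    comment = ET.SubElement(prop, f"{{{DAV}}}comment")
    comment.text = f"{kind}: {text}"
    return ET.tostring(root, encoding="unicode")


body = comment_proppatch_body("checkin", "fixed the postcondition in 8.3")
print(body)
```

    The resulting string would be sent as the body of a PROPPATCH on the
    version's URL; sending it is left out so the sketch stays
    self-contained.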
    
         Is "checkin and label" intended to be two distinct operations? 
    
    Two distinct and very different operations.
    
       =====
    
       10.6
    
         How would an atomic "Move label" operation be implemented? Is this
       what "set" is intended for?
    
    Yes.
    
       There is no corresponding pre-condition
       "cannot move label".
    
    Pre-conditions identify something specific about the state of the
    resource that would preclude the operation.  We could have a general
    "must be able to perform operation" precondition, but I don't think
    that is of any use to a client or server.
    
       ====
    
         The last three are instances where a standard way to package sets of
       operations and property updates as an atomic request would be useful.
       Most tools present these actions to the user as a single request, and
       overlapping non-atomic updates could become very confusing.
    
    This falls into the "can of worms" category that you mentioned earlier.
    
       =====
    
       13.5
    
         This section left me very confused. Up until this point my concept 
       of a Collection was some sort of base class or container for a set of
       versioning elements.
    
    See RFC 2518 for the definition of a collection.  The semantics are
    clarified in the proposed "Bindings" protocol extension (which is still
    in the internet-draft stage).
    
    Basically, a collection is a resource that identifies a set of
    other resources ("immediate members of that collection") by a name
    that is syntactically restricted to be a URL segment.
    
       A workspace was a collection which had a particular
       mix of these elements and a baseline was a particular snapshot of the
       current state of the versioned subset of a workspace.
    
    Sounds good.  Note that a baseline is not itself a collection.
    
       If I were to put a
       workspace under version control, each saved workspace version would be a
       baseline and any working resources would be lost (or would be required
       to be checked in) on checkin/checkpoint of the workspace.
    
    We would call this putting the workspace under "baseline control".
    It certainly is reasonable to think of a baseline of a workspace
    as being a "deep version" of that workspace, but we don't use that
    terminology to make sure there is no confusion between a "version"
    of a collection (just its immediate versioned members) and a
    "deep version" of a collection (all of its versioned members).
    
       An activity is
       a collection with version depth as well as breadth ...
    
    An activity is not a collection, but saying it has "version
    depth and breadth" is a reasonable metaphor (albeit somewhat
    poetic :-).
    
         But the concept of a collection version containing only binding to
       histories, although I understand the propagation of change argument at
       its basic level, leaves me without an identifiable example of what a
       versioned collection would be or would be used for.
    
    The basic problem is what information is necessary to allow you to
    reconstruct the versioned members of a collection from a baseline of
    that collection.  One constraint is that a SET-TARGET on a version
    selector must not require creating new versions of all parents of that
    version selector (the propagation of change argument).  Another
    constraint is that you not be forced to check out every member of a
    collection in order to move it.
    
    These constraints are satisfied by having a collection version be a
    collection whose immediate members are version histories.
    
    I'll add something like this to the protocol document.
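
    The two constraints above can be pictured with a toy data model.
    This is only a sketch: the class names, the reconstruct helper and
    the identifiers such as "vh1/v3" are hypothetical, and nested
    collections are left out.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionVersion:
    # Immediate members: binding name -> version history identifier.
    members: dict = field(default_factory=dict)


@dataclass
class Baseline:
    # Selected version per version history: history id -> version id.
    selected: dict = field(default_factory=dict)


def reconstruct(collection_version, baseline):
    """Return binding name -> selected version id for one collection."""
    return {
        name: baseline.selected[history]
        for name, history in collection_version.members.items()
    }


cv = CollectionVersion(members={"index.html": "vh1", "logo.gif": "vh2"})
b1 = Baseline(selected={"vh1": "vh1/v3", "vh2": "vh2/v1"})

# Selecting a different version of vh1 (as a SET-TARGET would) shows up
# only in a new baseline; the collection version is unchanged.
b2 = Baseline(selected={**b1.selected, "vh1": "vh1/v4"})

# Renaming a member changes only the collection version; the member's
# version history is untouched, so nothing has to be checked out.
cv_renamed = CollectionVersion(members={"home.html": "vh1", "logo.gif": "vh2"})

print(reconstruct(cv, b1))
print(reconstruct(cv, b2))
print(reconstruct(cv_renamed, b2))
```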
    
       At the moment, it is
       nothing more than a set of labels or pathnames with no idea of
       associated content.
    
    By associated content, do you mean which version of those version
    histories are selected?  Remember that a baseline contains a version
    of every member of a collection, so the baseline provides you with
    this information.
    
       But neither is it a framework or directory structure
       as some of the pathnames refer to unspecified non-collection elements.
    
    The member of the collection version gives a name to a version
    history, and the version of that version history is selected by
    the baseline.
    
       Is there some aspect of this that I have missed that would clarify
       understanding? What is the physical manifestation of checking out a
       collection version or is a collection really an abstract element?
    
    This might be clearer in the 8.1 draft.  In particular, checking out a
    collection version selector just changes the state of that collection
    to be "checked out" (i.e. an in-place checkout).
    
       How might a bare collection be used?  If it is just an abstract
       concept, then does it belong in the standard and should there be
       versioning operations defined on it?
    
    What do you mean by a "bare collection"?
    
         This appears to conflict with 14.6.1 and 17.3.
    
    What is the conflict you had in mind?
    
       =====
    
       14.6.1 (also 17.3)
    
         In this section, collections have baselines. But Baselines are defined
       as consisting only of versions, and collections refer to histories, not
       versions. My confusion with 13.5 is now compounded.
    
    The key distinction here is between what are the members of a
    collection version selector and what are the members of a collection
    version.
    
    The members of a collection version selector are versionable resources
    and other version selectors.  This provides you with a standard namespace
    to traverse.  The members of a collection version are version histories.
    This provides you with an efficient mechanism for capturing the state
    of a collection as a set of revisions (of both the collections and the
    non-collection members of that collection).
    
         Should collection have been limited to workspace here? Should baseline
       in 13.2 have its definition expanded?
    
    I believe not.  What would be the reason?
    
       =====
    
       17.4
    
         The last paragraph before Marshalling states that a request version is
       ignored unless a merge destination exists. Alternate semantics would
       have the merge default to a copy in the case of a null destination. What
       is the rationale for the given choice?
    
    What would be the relative name of that copy (with respect to the 
    workspace)?  If an activity was being merged, the version has no name.
    If a baseline was being merged, the version has a name but what if
    there already is a version selector by that name in the workspace,
    but that is associated with a different version history?
    What if the last thing you did was to delete that version history
    from your collection (so you wouldn't be happy if the MERGE
    just brought it right back)?
    
    Basically, you need user input to decide what to do here, which
    means a client needs to iterate through the DAV:ignored-set 
    one at a time, so that the user can decide what to do.
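
    A sketch of what that client-side loop might look like.  The wire
    format of DAV:ignored-set is not shown here, so the set is assumed
    to be already parsed into a list of version URLs; the function name,
    the "copy"/"skip" choices and the example URLs are all hypothetical.

```python
def resolve_ignored(ignored_versions, ask_user):
    """Ask about each ignored version in turn and collect the answers.

    ignored_versions: iterable of version URLs (plain strings here).
    ask_user: callable taking one URL and returning "copy" or "skip".
    """
    decisions = {}
    for version_url in ignored_versions:
        choice = ask_user(version_url)
        if choice not in ("copy", "skip"):
            raise ValueError(f"unexpected choice {choice!r} for {version_url}")
        decisions[version_url] = choice
    return decisions


# Stand-in for real user input: skip a version whose history the user
# had just deleted from the collection, copy everything else.
recently_deleted = {"/hist/vh7/v2"}
decisions = resolve_ignored(
    ["/hist/vh5/v1", "/hist/vh7/v2"],
    lambda url: "skip" if url in recently_deleted else "copy",
)
print(decisions)
```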
    
         It seems at least reasonable to return a list of unmerged versions
       with a reason or is this what the DAV:ignored-set postcondition does?
    
    Yes, that's what it is for.
    
       Where is DAV:ignored-set defined (is 18.5 its definition as opposed to 
       earlier ordering of definitions in sections like xml-elements)? 
    
    The contents of the DAV:ignored-set are defined in the postconditions
    of the MERGE request.
    
       =====
    
    
    Thanks for the great review, Ross!  Please follow up if anything
    is still unclear.  I'll try to get an 8.2 draft out soon, with 
    the changes based on your review.
    
    Cheers,
    Geoff