Re: Comments on draft-ietf-deltav-versioning-08

From: Geoffrey M. Clemm (geoffrey.clemm@rational.com)
Date: Tue, Sep 19 2000
Date: Tue, 19 Sep 2000 23:41:21 -0400 (EDT)
Message-Id: <200009200341.XAA16472@tantalum.atria.com>
From: "Geoffrey M. Clemm" <geoffrey.clemm@rational.com>
To: ietf-dav-versioning@w3.org
Subject: Re: Comments on draft-ietf-deltav-versioning-08


   From: Ross Wetmore <rwetmore@verticalsky.com>

     As a preface to the following comments, it should be understood that I
   am still in the process of digesting the concepts as formalized by the
   relevant RFCs. While I have all the data, have been lurking on the
   discussion list for a time, and have skimmed many of the past discussions
   for interpretations, I have not necessarily recognized all the structure
   or linkages and their implications. Please accept my apologies wherever
   I appear to have overlooked the obvious, and I ask for your patience if
   I am retracing too much covered ground.

We're specifically looking for "fresh eyes", so no apologies necessary!

     There are a lot of questions, as opposed to bald-faced corrections, and
   it is perhaps better if one of the principals gets first crack at them,
   with a chance to use the editorial knife to prune them before any
   discussion within the full group, so I am bouncing this through a limited
   audience. Please feel free to apply whatever collective editorial
   discretion you wish in responding or forwarding parts of this onwards.

It all looks worth passing on to me.

   Previous Context (general comment)

     I very much appreciate and could not have managed reading the document
   without the meticulous back referencing of earlier definitions, material
   or foundations. The example or context following and explaining each
   assertion was also key. However, I often found myself rereading past
   sections to locate a remembered tag, reference or definition, and
   perhaps iterating several times through previous documents before I had
   the necessary context to understand a point being made.
     The following should not be construed as criticisms or requests, but
   merely as possible enhancements that might have helped reduce my
   context-switching load, or at least make it more efficient.

     A quick summary of the background and of the current elements or key
   points to follow would be useful in the introductory remarks to each
   section; alternatively, an appendix could list the concepts classified
   as former, updated, and new or advanced, with an appropriate set of
   collected references or section numbers. The intent would be to provide
   known points to localize and organize some of the referencing load, and
   to help differentiate the base from the current extensions when much of
   it is still undifferentiated information to the illiterate newbie.

The intent was for RFC-2616 (HTTP-1.1) and RFC-2518 (WebDAV) to be
required reading, and then the sections in this document are marked
to indicate whether they contain "new stuff" or "effects of extensions
on old stuff".  Can you give an example of some places where it was
unclear what was base material, and what was extensions?

   Section 2

     There are two topics that are base concepts, but which I believe have
   new or stronger implications in the context of the current extensions.
   Both concepts arise in specific comments in a few sections, but seem to
   be largely ignored, or assumed. This might be a point to at least
   introduce them or provide a generic statement.

We tried as much as possible to define versioning as "orthogonal"
extensions to what is currently in HTTP and WebDAV.  So there is an
implicit "if we don't talk about a construct, we haven't changed or
extended its semantics".

     The first is caching. What are the implications and mechanisms for
   resolving caching issues in the case of the server and active client,
   but also in the case of propagating changes to other clients arising
   from extensions to enhance parallel development? A brief summary of any
   prerequisites or assumptions, plus some indication of what, if any,
   additional constraints or actions might be needed by the following
   extensions would be useful. If there is a good discussion of this
   elsewhere, then perhaps a simple context definition and reference would
   be sufficient to clarify requirements or known limitations for the
   following extensions.

The only interactions that we identified are listed in the protocol
(i.e. the Vary header and the Cache-Control header requirements).

     The second is locking. Some operations mention lock failure as a pre-
   or post-condition error. Most elements and operations do not identify
   themselves as being lockable or not, let alone the scope of any lock.

We tried to be very careful to not modify any of the WebDAV locking
semantics, but we did identify the new methods that need to respect a
write lock on the affected resource.  So a server could choose to implement
versioning with or without locking.
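To make "respecting a write lock" concrete, here is a minimal Python sketch of the guard a server might apply before a new method such as CHECKOUT changes a resource. The `Resource` class and `check_write_lock` function are hypothetical, and the request's If header is simplified (an assumption) to a set of already-parsed lock tokens. The only rules taken from RFC 2518 are that a write-locked resource may be changed only by a request that submits the lock token, and that otherwise the server answers 423 (Locked).

```python
from dataclasses import dataclass
from typing import Optional, Set

LOCKED = 423  # RFC 2518 status code: the resource is locked


@dataclass
class Resource:
    url: str
    # Hypothetical: token of the write lock on this resource, or None.
    write_lock_token: Optional[str] = None


def check_write_lock(resource: Resource, submitted_tokens: Set[str]) -> Optional[int]:
    """Return an error status if a state-changing request must be refused."""
    if resource.write_lock_token is None:
        return None  # unlocked: the method proceeds as it would without locking
    if resource.write_lock_token in submitted_tokens:
        return None  # the lock holder submitted its token
    return LOCKED


def checkout(resource: Resource, submitted_tokens: Set[str]) -> int:
    # A new versioning method applies the same guard that PUT would.
    error = check_write_lock(resource, submitted_tokens)
    if error is not None:
        return error
    return 200  # the checkout itself is elided in this sketch
```

A server that does not support locking simply never sets `write_lock_token`, and the guard is a no-op.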

   A generic description of locking semantics, any optimization
   techniques for combining lock requests with operations for single
   trip turnaround, and generic lock errors might be useful here, with
   specific additions or deviations reflected in the later sections.

We prefer not to repeat semantics that we do not modify, since this
creates unfortunate linkages to the protocol that defines those semantics.
In particular, we'd prefer not to have to rev the versioning protocol
just because some aspect of the locking protocol changed.

     A third topic that I believe is critical to many advanced operations
   is that of providing a mechanism for combining individual operations or
   property updates into a single atomic request, at least from the view of
   any client. Most real versioning or content management requests consist
   of a number of the base operations described here and elsewhere. In many
   cases it is impossible, from a practical or timely standpoint at least,
   to deal with multiple unserialized compound actions, or failure modes of
   compound actions in a networked environment. At least some indication of
   the generic solution to this within the current proposed standard, if
   not specific support, would be very useful. I note there is support for
   multi-status responses. What about multi-part requests? Are there any
   thoughts on this? Previous discussions? Or is it a can of worms that has
   been carefully set aside :-?

This was discussed in the early days of the WebDAV protocol (I remember
a BATCH or some such method being suggested).  So you can find these
discussions, but basically, yes it is a can of worms that has been
carefully set aside.

     Note I carefully did not throw performance into the last paragraph
   under the premise that premature optimization is the root of all evil.
   But it would be a useful side effect.

You should be able to get much of the optimization you need from
the HTTP-1.1 ability to keep a connection alive.
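For example, a client can send a sequence of requests over a single HTTP/1.1 connection instead of opening a new TCP connection for each one. The minimal Python sketch below uses only the standard library; the local demo server and the `fetch_all` helper are hypothetical and exist only to show the connection being reused. It uses GET for brevity, but the same pattern applies to PROPFIND, CHECKOUT, and so on. Note that this only saves connection setup cost: it does not make the requests atomic.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List


class RecordingHandler(BaseHTTPRequestHandler):
    """Tiny local server that records the client port of each request."""

    protocol_version = "HTTP/1.1"  # enables persistent connections
    seen_ports: List[int] = []

    def do_GET(self) -> None:
        RecordingHandler.seen_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the demo quiet


def fetch_all(host: str, port: int, paths: List[str]) -> List[int]:
    """Send several requests over ONE connection; return their statuses."""
    conn = http.client.HTTPConnection(host, port)
    statuses = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()  # drain the body before reusing the connection
            statuses.append(resp.status)
    finally:
        conn.close()
    return statuses


def demo() -> List[int]:
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_all("127.0.0.1", server.server_address[1], ["/a", "/b", "/c"])
    finally:
        server.shutdown()
        server.server_close()
```

All three requests arrive from the same client port, i.e. over the same connection.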

   Section 3.1, "unknown" discussion

     I agree with many of the discussion points. I too found "unknown" to be
   more of a generic element characteristic than a request for appropriate
   server behaviour. Here are a couple of possibilities for consideration.
     Using terminology that at least partly suggests an association with a
   verb or action, rather than a pure adjective, might help. An example
   might be "not-known".
     To better indicate that this is a server behaviour or response rather
   than a type classification use a more specific term like "supported". 
   Thus "not_supported" or "nosupport" might be reasonable alternatives.

I read this as suggestions for a better name for this attribute,
as opposed to changing its semantics (is that correct?).  So far
we've got:

unknown        if-unknown
not-known      if-not-known
not-supported  if-not-supported
no-support     if-no-support

Anyone want to vote for their favorite (or add to the list)?

   =====

   Section 4.1

     There are efforts such as the Dublin Core work, to identify standard
   properties. Should WebDAV properties be selected to conform with such
   systems wherever possible to maximize recognition? 

Any particular suggestion for how we might make them conform better?

     Which precise concept is intended here for creator-displayname:
   author, owner, or (content or object) creator?

It's intended to be resource (object) creator, but in the case of
a versioning system, where each new interesting state is stored as
a separate resource (a version), the distinction becomes fuzzy.

   =====

   Section 5.3

     I tend to associate "name" with a human-friendly or maybe client
   label. For bureaucratic or server-imposed labels, which are often
   numeric or non-phonetic, I use "id" or "identifier".
     Is "version-id" more appropriate here? Is there any intent for the
   choice of label to carry such additional (subliminal) characteristics?

The purpose of "version-name" was for it to be the "human-memorable" name
that some systems try to provide.  So "name" was the intent (you already
have an "id" in the URL that identifies the version).

   =====

   Section 6.1

     I do not understand the difference to the target between a MOVE and a
   COPY of a non-collection resource, i.e. why delete (overwrite:T) is ok
   for a MOVE, and how the described semantics might differ for a COPY from
   a delete and recreation of a new target, or why this is seemingly not
   allowed. There appear to be some subtle implied constraints on the
   implementation that are not explicitly defined anywhere?

I've added some Postconditions to COPY and MOVE to clarify this.  In
particular, a COPY of a version selector creates a new resource (with
a new version history if it is put under version control), while a
MOVE of a version selector just renames the existing version selector.
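A toy model of that distinction, as a minimal Python sketch: the namespace is a plain dictionary from URL to resource, and the `history_id` field is a hypothetical stand-in for the resource's association with a version history. COPY produces an unversioned resource that gets a new history only when it is itself put under version control; MOVE rebinds the same object under a new URL.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional

_next_history_id = count(1)


@dataclass
class Resource:
    content: str
    # None: not under version control. Otherwise: hypothetical id of the
    # version history this version selector is associated with.
    history_id: Optional[int] = None


def version_control(res: Resource) -> None:
    """Putting a resource under version control starts a new version history."""
    if res.history_id is None:
        res.history_id = next(_next_history_id)


def copy(ns: Dict[str, Resource], src: str, dst: str) -> None:
    # COPY creates a NEW, unversioned resource with the same content.
    ns[dst] = Resource(ns[src].content)


def move(ns: Dict[str, Resource], src: str, dst: str) -> None:
    # MOVE renames: the very same version selector, same version history.
    ns[dst] = ns.pop(src)
```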

     Why is an operation on a collection not defined to be the
   corresponding operation on each member of the collection plus any
   consistency adjustments to the collection itself?

That is what "Overwrite: update" effectively does, but there is another
interesting semantic (which is defined in RFC-2518) which first removes
the destination before making the MOVE/COPY.  In particular, a user
might want a member to be in the copy only if it was a member in the
original (which is not the case for Overwrite:update).

     As long as update does not imply content or properties merging of a
   non-collection resource, it seems that the practical effect for update
   is only on collections - is this statement true?

When the history of a resource is being tracked by a versioning
system, updating the contents of a resource is very different from
replacing a resource.
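The difference shows up clearly in a toy model where each member carries its history of contents. In this hypothetical Python sketch, "Overwrite: T" deletes the destination first, so a same-named destination member loses its history and the result has exactly the source's members. The "Overwrite: update" function is an assumed model of the intended behavior, in which the source content becomes a new entry in the existing member's history and destination-only members survive.

```python
from typing import Dict, List

# Simplified model: each member name maps to its history of contents,
# oldest first (the last entry is the current content).
Collection = Dict[str, List[str]]


def current(coll: Collection) -> Dict[str, str]:
    return {name: hist[-1] for name, hist in coll.items()}


def copy_overwrite_true(src: Collection, dst: Collection) -> Collection:
    # "Overwrite: T" (RFC 2518): the destination is deleted first, so the
    # result has exactly the source's members, each with a fresh history.
    return {name: [hist[-1]] for name, hist in src.items()}


def copy_overwrite_update(src: Collection, dst: Collection) -> Collection:
    # Assumed model of "Overwrite: update": a same-named destination member
    # gets the source content as a new entry in ITS OWN history, and
    # members found only in the destination are kept.
    result = {name: list(hist) for name, hist in dst.items()}
    for name, hist in src.items():
        result.setdefault(name, []).append(hist[-1])
    return result
```

Both results have the same current content for "b", but only the update keeps the earlier states of "b" and the destination-only member "c".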

   =====

   Section 7.1

     I could not find a reference to a "Vary header" to complete the
   definition.

I'll change this to say "HTTP-1.1 Vary header" to clarify this.

   =====

   Section 8

     Is paragraph 2 true even if the resource selected is mutable?

Yes.  A mutable revision can only be updated with a CHECKIN, not
directly by a PUT or PROPPATCH.

     Paragraph 2 implies that one cannot have a version selector point to a
   working resource? Is this true?

Yes.  A version selector doesn't actually point to anything ... but
it does have a "DAV:target", which indicates which version has
the same content and dead properties as the version selector.

The semantics of a version selector have (hopefully :-) been clarified
in the 8.1 draft (based on the recent email thread).
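One way to picture this, as a hypothetical Python sketch: versions are frozen (no PUT or PROPPATCH can change them), and a version selector is a separate object holding the same content and dead properties as whichever version its `target` field (standing in for DAV:target) identifies. The class and method names are illustrative only, and check-out/check-in are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Props = Tuple[Tuple[str, str], ...]  # dead properties as (name, value) pairs


@dataclass(frozen=True)
class Version:
    # Frozen: a version's content and dead properties never change
    # through PUT or PROPPATCH.
    content: str
    dead_props: Props = ()


class VersionSelector:
    """A separate resource that mirrors its target version; it is not a link."""

    def __init__(self, history: List[Version], target: Version) -> None:
        self.history = history
        self.set_target(target)

    def set_target(self, version: Version) -> None:
        if not any(v is version for v in self.history):
            raise ValueError("the target must be a version in this history")
        self.target = version  # models DAV:target
        # The selector holds its own content and dead properties, equal to
        # those of its target.
        self.content = version.content
        self.dead_props = version.dead_props
```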

   What if any would be the corresponding
   redirector or link element for a (server-side) working resource?

There is no redirector or link element defined in the versioning
protocol.  Think of the version selector as a separate resource whose
content and dead properties happen to be the same as some version in
its history (i.e. its current DAV:target).

     This may be more appropriate to 10.2 which states that a CHECKOUT MAY
   replace a version selector with a working resource. But I believe it
   may affect several sections of which this is the first.

     I can envisage two scenarios in which a version selector points to
   some version of an object and several workspaces indirect through this
   selector to share a particular view of the world.

Each workspace has its own set of version selectors (so they don't
redirect through a common version selector).

   In one case, typical
   of most development, each workspace expects to see a consistent version
   of the history. Any workspace can perform checkout/edit/checkin to
   update the version selector to point to a new version, but only checked
   in versions will be globally visible.

Yes.

   In the second scenario, the object
   may be akin to a change package in which several distinct processes are
   collaborating to update the state of the world for some activity in
   progress. When the activity is complete the final version of the object
   will be checked in, but the updates are performed in a shared context.

Note that in this case, it would be advisable to use locks to keep the
distinct processes from stepping on each other's toes.

     How would each of the two examples disambiguate themselves in a single
   implementation under the current extensions, or does the "MAY" in 10.2
   make this impossible? 

This has been fixed in draft 8.1.  In particular, the client specifies
whether it wants the "in-place" or "out-of-place" checkout behavior.
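A minimal sketch of the two behaviors, in Python. The `in_place` flag is a stand-in for however the 8.1 draft lets the client ask for one behavior or the other, and the `/working/N` URL layout is purely hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict

_working_ids = count(1)


@dataclass
class Resource:
    content: str
    checked_out: bool = False


def checkout(ns: Dict[str, Resource], url: str, in_place: bool) -> str:
    """Check out the version selector at `url`; return the URL to edit."""
    selector = ns[url]
    if in_place:
        # In-place: the version selector itself becomes checked out, so
        # collaborators keep using the same, user-meaningful URL.
        selector.checked_out = True
        return url
    # Out-of-place: a working resource appears at a new, server-chosen URL
    # that must be passed along to anyone who wants to share it.
    working_url = f"/working/{next(_working_ids)}"  # hypothetical layout
    ns[working_url] = Resource(selector.content, checked_out=True)
    return working_url
```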

   How might they be implemented simultaneously, i.e.
   is there a mechanism for a "well-known" version selector to point to a
   current working resource and its current version at the same time with,
   for example, a client property used to select which target is currently
   desired?

The only way for two version selectors to share information
is when one of them has created a new version, and the other
does a SET-TARGET or MERGE to see it.
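A toy Python model of that sharing pattern, with two workspaces each holding their own selector for the same version history. The class and method names are hypothetical, and the check-out step is omitted: a check-in through one selector adds a version to the shared history but retargets only that selector, and the other workspace sees the new version only after an explicit SET-TARGET.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VersionHistory:
    versions: List[str] = field(default_factory=list)  # contents, oldest first


@dataclass
class Selector:
    history: VersionHistory
    target: int  # index of the selected version in the history

    @property
    def content(self) -> str:
        return self.history.versions[self.target]

    def checkin(self, new_content: str) -> None:
        # Adds a version to the SHARED history, but retargets only
        # this selector.
        self.history.versions.append(new_content)
        self.target = len(self.history.versions) - 1

    def set_target(self, index: int) -> None:
        self.target = index
```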

   If not, is there a builtin mechanism for sharing a newly
   created URL amongst collaborating processes when the working resource is
   returned at a different location?

Such a collaboration is likely to be only supported in the
context of workspace support, which gives the checked out
resources user-meaningful names (i.e. via "in-place" checkouts).

   How do 16.8 restrictions on version
   selectors affect the solution to having both collaborators and end-users
   sharing a workspace version through the update process?

Which restrictions did you have in mind here?

   =====

   8.1

     This section mentions 4xx status codes and preconditions, but almost
   none are assigned in any later sections, and no explanation is given on
   how this should be carried out. If this is still under discussion, then
   a placeholder comment might be useful (even if only as a reminder of
   work to be done).

This is just a reference to the 4xx status codes defined in HTTP-1.1
(RFC-2616).  The versioning protocol does not introduce any new 4xx
status codes.


   =====

   8.3

     I am not sure I am interpreting the pre- and post-conditions
   correctly. 

     The pre-condition implies that I can select a version using a
   target-selector label which is in the version history but is not the
   current version of a version selector. But the response will always be
   the target version of the version selector.

     This doesn't seem to be the intended, or a worthwhile result.

Yes, the postcondition is wrong.  I'll fix it.
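Assuming the intent is that a request carrying a target-selector label returns the labeled version, the corrected behavior could be sketched in Python as follows. The `labels` mapping from label to version URL and the function name are hypothetical, and a missing label is treated as the precondition failure.

```python
from typing import Dict, Optional


def select_version(
    target: str,
    labels: Dict[str, str],
    target_selector: Optional[str] = None,
) -> str:
    """Pick the version that a GET on a version selector should return.

    target:          the selector's current DAV:target (a version URL)
    labels:          hypothetical map from label to version URL within the
                     selector's version history
    target_selector: label supplied by the client, if any
    """
    if target_selector is None:
        return target
    if target_selector not in labels:
        # Precondition failure: the label must identify a version in the history.
        raise LookupError(f"no version labeled {target_selector!r}")
    # The LABELED version, not the current target.
    return labels[target_selector]
```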

   =====

   9.1

     Is the post condition meant to include such things as live properties
   of the server object e.g. last-accessed time? "State of any resource"
   seems an overly broad definition to be practically useful.

Good point.  I'll replace this with "content and dead properties".
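A sketch of what that narrower postcondition compares, in Python: only the content and the dead properties. The set of live property names below is a hypothetical stand-in for whatever the server maintains itself, and it includes the last-accessed time mentioned above.

```python
from typing import Dict, Set, Tuple

# Hypothetical set of live properties maintained by the server itself;
# these are excluded from the "content and dead properties" comparison.
LIVE_PROPERTIES: Set[str] = {"getlastmodified", "getetag", "last-accessed"}


def comparable_state(content: bytes, props: Dict[str, str]) -> Tuple:
    """Content plus dead properties only, as a hashable snapshot."""
    dead = {k: v for k, v in props.items() if k not in LIVE_PROPERTIES}
    return (content, tuple(sorted(dead.items())))
```

A change in last-accessed time leaves the snapshot equal, while a change to a dead property does not.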


   =====

   10.2
     What, if any, is the mechanism to provide a checkout comment?

   10.3
     What, if any, is the mechanism to provide a checkin comment?

You could store them in the DAV:comment property of the version,
using whatever convention you wanted to distinguish the checkout from
the checkin comment (assuming you wanted to distinguish them).
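For instance, a client could build the PROPPATCH body (RFC 2518 DAV:propertyupdate syntax) with Python's standard library as below. The "Checkout:"/"Checkin:" prefixes are one hypothetical convention, and which resource the PROPPATCH should target (the checked-out resource before CHECKIN, or the version afterwards) depends on the draft's rules for DAV:comment and is left open here.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"
ET.register_namespace("D", DAV)


def comment_proppatch(checkout_comment: str, checkin_comment: str) -> bytes:
    """Build a PROPPATCH body that stores both comments in DAV:comment.

    The "Checkout:"/"Checkin:" prefixes are a hypothetical convention for
    telling the two comments apart.
    """
    update = ET.Element(f"{{{DAV}}}propertyupdate")
    set_elem = ET.SubElement(update, f"{{{DAV}}}set")
    prop = ET.SubElement(set_elem, f"{{{DAV}}}prop")
    comment = ET.SubElement(prop, f"{{{DAV}}}comment")
    comment.text = f"Checkout: {checkout_comment}\nCheckin: {checkin_comment}"
    return ET.tostring(update, encoding="utf-8", xml_declaration=True)
```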

     Is "checkin and label" intended to be two distinct operations? 

Two distinct and very different operations.

   =====

   10.6

     How would an atomic "Move label" operation be implemented? Is this
   what "set" is intended for?

Yes.

   There is no corresponding pre-condition
   "cannot move label".

Pre-conditions identify something specific about the state of the
resource that would preclude the operation.  We could have a general
"must be able to perform operation" precondition, but I don't think
that is of any use to a client or server.
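To illustrate why "set" gives an atomic move, here is a hypothetical Python model of the labels in one version history: a label maps to exactly one version, so "set" is a single reassignment that detaches the label from its old version and attaches it to the new one, while an "add" of a label already in use fails. The class and method names are assumptions for illustration, not the draft's marshalling.

```python
import threading
from typing import Dict


class VersionHistoryLabels:
    """Labels within one version history: each label names one version."""

    def __init__(self) -> None:
        self._labels: Dict[str, str] = {}  # label -> version URL
        self._lock = threading.Lock()

    def add_label(self, label: str, version: str) -> None:
        # "add" refuses to take a label away from another version.
        with self._lock:
            if label in self._labels:
                raise ValueError(f"label {label!r} is already in use")
            self._labels[label] = version

    def set_label(self, label: str, version: str) -> None:
        # "set" moves the label in one step: no reader ever sees it on two
        # versions, or on none.
        with self._lock:
            self._labels[label] = version

    def version_for(self, label: str) -> str:
        with self._lock:
            return self._labels[label]
```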

   =====

     The last three are instances where a standard way to package sets of
   operations and property updates as an atomic request would be useful.
   Most tools present these actions to the user as a single request, and
   overlapping non-atomic updates could become very confusing.

This falls into the "can of worms" category that you mentioned earlier.

   =====

   13.5

     This section left me very confused. Up until this point my concept 
   of a Collection was some sort of base class or container for a set of
   versioning elements.

See RFC 2518 for the definition of a collection.  The semantics are
clarified in the proposed "Bindings" protocol extension (which is still
in the internet-draft stage).

Basically, a collection is a resource that identifies a set of
other resources ("immediate members of that collection") by a name
that is syntactically restricted to be a URL segment.

   A workspace was a collection which had a particular
   mix of these elements and a baseline was a particular snapshot of the
   current state of the versioned subset of a workspace.

Sounds good.  Note that a baseline is not itself a collection.

   If I were to put a
   workspace under version control, each saved workspace version would be a
   baseline and any working resources would be lost (or would be required
   to be checked in) on checkin/checkpoint of the workspace.

We would call this putting the workspace under "baseline control".
It certainly is reasonable to think of a baseline of a workspace
as being a "deep version" of that workspace, but we don't use that
terminology to make sure there is no confusion between a "version"
of a collection (just its immediate versioned members) and a
"deep version" of a collection (all of its versioned members).

   An activity is
   a collection with version depth as well as breadth ...

An activity is not a collection, but saying it has "version
depth and breadth" is a reasonable metaphor (albeit somewhat
poetic :-).

     But the concept of a collection version containing only bindings to
   histories, although I understand the propagation of change argument at
   its basic level, leaves me without an identifiable example of what a
   versioned collection would be or would be used for.

The basic problem is what information is necessary to allow you to
reconstruct the versioned members of a collection from a baseline of
that collection.  One constraint is that a SET-TARGET on a version
selector must not require creating new versions of all parents of that
version selector (the propagation of change argument).  Another
constraint is that you not be forced to check out every member of a
collection in order to move it.

These constraints are satisfied by having a collection version be a
collection whose immediate members are version histories.

I'll add something like this to the protocol document.

   At the moment, it is
   nothing more than a set of labels or pathnames with no idea of
   associated content.

By associated content, do you mean which version of those version
histories are selected?  Remember that a baseline contains a version
of every member of a collection, so the baseline provides you with
this information.

   But neither is it a framework or directory structure
   as some of the pathnames refer to unspecified non-collection elements.

Each member of the collection version gives a name to a version
history, and the baseline selects which version of that version
history is used.
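The reconstruction can be sketched as a small Python function, assuming (hypothetically) that version histories and versions are identified by plain strings: a collection version maps member names to version histories, and the baseline maps each version history to the version it selects.

```python
from typing import Dict

# Hypothetical identifiers: version histories and versions are plain strings.
CollectionVersion = Dict[str, str]  # member name -> version history id
Baseline = Dict[str, str]           # version history id -> selected version


def reconstruct(root_history: str,
                collection_versions: Dict[str, CollectionVersion],
                baseline: Baseline,
                path: str = "") -> Dict[str, str]:
    """Rebuild path -> version for everything reachable from root_history."""
    result: Dict[str, str] = {}
    version = baseline[root_history]  # the baseline picks the version
    result[path or "/"] = version
    # Only collection versions have members; each member names a history.
    for name, member_history in collection_versions.get(version, {}).items():
        result.update(reconstruct(member_history, collection_versions,
                                  baseline, f"{path}/{name}"))
    return result
```

Because the collection versions name histories rather than versions, selecting a new version of a file only changes the baseline entry for that file's history; no new version of any parent collection is needed, which is the propagation-of-change argument above.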

   Is there some aspect of this that I have missed that would clarify
   understanding? What is the physical manifestation of checking out a
   collection version or is a collection really an abstract element?

This might be clearer in the 8.1 draft.  In particular, checking out a
collection version selector just changes the state of that collection
to be "checked out" (i.e. an in-place checkout).

   How might a bare collection be used?  If it is just an abstract
   concept, then does it belong in the standard and should there be
   versioning operations defined on it?

What do you mean by a "bare collection"?

     This appears to conflict with 14.6.1 and 17.3.

What is the conflict you had in mind?

   =====

   14.6.1 (also 17.3)

     In this section, collections have baselines. But Baselines are defined
   as consisting only of versions, and collections refer to histories, not
   versions. My confusion with 13.5 is now compounded.

The key distinction here is between what are the members of a
collection version selector and what are the members of a collection
version.

The members of a collection version selector are versionable resources
and other version selectors.  This provides you with a standard namespace
to traverse.  The members of a collection version are version histories.
This provides you with an efficient mechanism for capturing the state
of a collection as a set of revisions (of both the collections and the
non-collection members of that collection).

     Should collection have been limited to workspace here? Should baseline
   in 13.2 have its definition expanded?

I believe not.  What would be the reason?

   =====

   17.4

     The last paragraph before Marshalling states that a request version is
   ignored unless a merge destination exists. Alternate semantics would
   have the merge default to a copy in the case of a null destination. What
   is the rationale for the given choice?

What would be the relative name of that copy (with respect to the 
workspace)?  If an activity was being merged, the version has no name.
If a baseline was being merged, the version has a name but what if
there already is a version selector by that name in the workspace,
but that is associated with a different version history?
What if the last thing you did was to delete that version history
from your collection (so you wouldn't be happy if the MERGE
just brought it right back)?

Basically, you need user input to decide what to do here, which
means a client needs to iterate through the DAV:ignored-set 
one at a time, so that the user can decide what to do.

     It seems at least reasonable to return a list of unmerged versions
   with a reason or is this what the DAV:ignored-set postcondition does?

Yes, that's what it is for.
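A simplified Python model of that behavior, under the assumption that a merge destination is the workspace's selector for the same version history: versions with a destination are merged (content merging elided), and the rest are reported in the ignored list, standing in for DAV:ignored-set, rather than copied in under a guessed name. All names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MergeResult:
    merged: List[str] = field(default_factory=list)
    ignored: List[str] = field(default_factory=list)  # models DAV:ignored-set


def merge(workspace: Dict[str, str], versions: Dict[str, str]) -> MergeResult:
    """Merge versions into a workspace.

    workspace: version history id -> version currently selected there
    versions:  version history id -> version being merged in
    """
    result = MergeResult()
    for history, version in versions.items():
        if history in workspace:
            # A merge destination exists: update the workspace's selector
            # for this history (actual content merging is elided).
            workspace[history] = version
            result.merged.append(version)
        else:
            # No destination: report it so the client can ask the user.
            result.ignored.append(version)
    return result
```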

   Where is DAV:ignored-set defined (is 18.5 its definition as opposed to 
   earlier ordering of definitions in sections like xml-elements)? 

The contents of the DAV:ignored-set are defined in the postconditions
of the MERGE request.

   =====


Thanks for the great review, Ross!  Please follow-up if anything
is still unclear.  I'll try to get an 8.2 draft out soon, with 
the changes based on your review.

Cheers,
Geoff