Date: Tue, 19 Sep 2000 23:41:21 -0400 (EDT)
Message-Id: <200009200341.XAA16472@tantalum.atria.com>
From: "Geoffrey M. Clemm" <geoffrey.clemm@rational.com>
To: ietf-dav-versioning@w3.org
Subject: Re: Comments on draft-ietf-deltav-versioning-08
From: Ross Wetmore <rwetmore@verticalsky.com>
As a preface to the following comments, it should be understood that I
am still in the process of digesting the concepts as formalized by the
relevant RFCs. While I have all the data and have been lurking on the
discussion list for a time or have skimmed many of the past discussions
for interpretations, I have not necessarily recognized all the structure
or linkages and their implications. Please accept my apologies wherever
I appear to have overlooked the obvious, and I ask for your patience if
I am retracing too much covered ground.
We're specifically looking for "fresh eyes", so no apologies necessary!
There are a lot of questions, as opposed to bald-faced corrections and
it is perhaps better if one of the principals gets a first crack, or
chance to use the editorial knife to prune any discussions within the
full group, so I am bouncing this through a limited audience. Please feel
free to apply whatever collective editorial discretion you wish in
responding or forwarding parts of this onwards.
It all looks worth passing on to me.
Previous Context (general comment)
I very much appreciate and could not have managed reading the document
without the meticulous back referencing of earlier definitions, material
or foundations. The example or context following and explaining each
assertion was also key. However, I often found myself rereading past
sections to locate a remembered tag, reference or definition, and
perhaps iterating several times through previous documents before I had
the necessary context to understand a point being made.
The following should not be construed as criticisms or requests, but
merely possible enhancements that might have helped reduce my
context-switching load or let me handle it more efficiently.
A quick summary would be useful of the background and current elements
or key points to follow in the introductory remarks to each section, or
perhaps an appendix reference to lists classified as former, updated and
new or advanced with appropriate set of collected references or section
numbers. The intent would be to provide known points to localize and
organize some of the referencing load, and help differentiate the base
from the current extensions when much of it is still undifferentiated
information to the illiterate newbie.
The intent was for RFC-2616 (HTTP-1.1) and RFC-2518 (WebDAV) to be
required reading, and then the sections in this document are marked
indicating whether they contain "new stuff" or "effects of extensions
on old stuff". Can you give an example of some places where it was
unclear what was base material, and what was extensions?
Section 2
There are two topics that are base concepts, but which I believe have
new or stronger implications in the context of the current extensions.
Both concepts arise in specific comments in a few sections, but seem to
be largely ignored, or assumed. This might be a point to at least
introduce them or provide a generic statement.
We tried as much as possible to define versioning as "orthogonal"
extensions to what is currently in HTTP and WebDAV. So there is an
implicit "if we don't talk about a construct, we haven't changed or
extended its semantics".
The first is caching. What are the implications and mechanisms for
resolving caching issues in the case of the server and active client,
but also in the case of propagating changes to other clients arising
from extensions to enhance parallel development? A brief summary of any
prerequisites or assumptions, plus some indication of what, if any,
additional constraints or actions might be needed by the following
extensions would be useful. If there is a good discussion of this
elsewhere, then perhaps a simple context definition and reference would
be sufficient to clarify requirements or known limitations for the
following extensions.
The only interactions that we identified are listed in the protocol
(i.e. the Vary header and the Cache-Control header requirements).
The second is locking. Some operations mention lock failure as a pre-
or post-condition error. Most elements and operations do not identify
themselves as being lockable or not, let alone the scope of any lock.
We tried to be very careful to not modify any of the WebDAV locking
semantics, but did identify any new methods that need to respect a
write lock on the affected resource. So a server could choose to implement
versioning with or without locking.
A generic description of locking semantics, any optimization
techniques for combining lock requests with operations for single
trip turnaround, and generic lock errors might be useful here, with
specific additions or deviations reflected in the later sections.
We prefer not to repeat semantics that we do not modify, since this
creates unfortunate linkages to the protocol that defines those semantics.
In particular, we'd prefer not to have to rev the versioning protocol
just because some aspect of the locking protocol changed.
A third topic that I believe is critical to many advanced operations
is that of providing a mechanism for combining individual operations or
property updates into a single atomic request, at least from the view of
any client. Most real versioning, or content management requests consist
of a number of the base operations described here and elsewhere. In many
cases it is impossible, from a practical or timely standpoint at least,
to deal with multiple unserialized compound actions, or failure modes of
compound actions in a networked environment. At least some indication of
the generic solution to this within the current proposed standard, if
not specific support, would be very useful. I note there is support for
multi-status responses. What about multi-part requests? Are there any
thoughts on this? Previous discussions? Or is it a can of worms that has
been carefully set aside :-?
This was discussed in the early days of the WebDAV protocol (I remember
a BATCH or some such method being suggested). So you can find these
discussions, but basically, yes it is a can of worms that has been
carefully set aside.
Note I carefully did not throw performance into the last paragraph
under the premise that premature optimization is the root of all evil.
But it would be a useful side effect.
You should be able to get much of the optimization you need from
the HTTP-1.1 ability to keep a connection alive.
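As a rough illustration of the keep-alive point, here is a minimal Python
sketch that sends a CHECKOUT / PUT / CHECKIN sequence over one persistent
HTTP/1.1 connection. The local EchoHandler server is only a stand-in so the
example can run on its own; it does not implement any versioning semantics,
and the path /doc.html and the helper names are assumptions made for the
example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Stand-in server: answers any of the three methods with a short text."""

    protocol_version = "HTTP/1.1"  # allows the connection to stay open

    def _reply(self):
        # Drain the request body (if any) so the next request parses cleanly.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        # Report the client's TCP port so connection reuse is observable.
        text = f"{self.command} {self.path} port {self.client_address[1]}"
        body = text.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    do_CHECKOUT = do_PUT = do_CHECKIN = _reply

    def log_message(self, *args):
        pass  # keep the example quiet


def run_sequence(host, port, steps):
    """Send each (method, path, body) step over the same connection."""
    conn = http.client.HTTPConnection(host, port)
    replies = []
    try:
        for method, path, body in steps:
            conn.request(method, path, body=body)
            replies.append(conn.getresponse().read().decode())
    finally:
        conn.close()
    return replies


def demo():
    """Start the stand-in server, run the three requests, shut down."""
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        steps = [
            ("CHECKOUT", "/doc.html", None),
            ("PUT", "/doc.html", b"new text"),
            ("CHECKIN", "/doc.html", None),
        ]
        return run_sequence("127.0.0.1", server.server_port, steps)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for line in demo():
        print(line)
```

All three replies report the same client port, which shows that the requests
shared one connection. Note that this only saves connection setup cost; the
three requests are still separate operations and are not atomic.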
Section 3.1, "unknown" discussion
I agree with many of the discussion points. I too found "unknown" to be
more of a generic element characteristic than a request for appropriate
server behaviour. Here are a couple of possibilities for consideration.
Using terminology that at least partly suggests an association with a
verb or action, rather than a pure adjective, might help. An example
might be "not-known".
To better indicate that this is a server behaviour or response rather
than a type classification use a more specific term like "supported".
Thus "not_supported" or "nosupport" might be reasonable alternatives.
I read this as suggestions for a better name for this attribute,
as opposed to changing its semantics (is that correct?). So far
we've got:
unknown if-unknown
not-known if-not-known
not-supported if-not-supported
no-support if-no-support
Anyone want to vote for their favorite (or add to the list)?
=====
Section 4.1
There are efforts such as the Dublin Core work, to identify standard
properties. Should WebDAV properties be selected to conform with such
systems wherever possible to maximize recognition?
Any particular suggestion for how we might make them conform better?
Which is the precise concept desired here for creator-displayname:
author or owner, (content or object) creator?
It's intended to be resource (object) creator, but in the case of
a versioning system, where each new interesting state is stored as
a separate resource (a version), the distinction becomes fuzzy.
=====
Section 5.3
I tend to associate "name" with a human friendly or maybe client
label. For bureaucratic or server imposed labels, which are often
numeric or non-phonetic I use id or identifier.
Is "version-id" more appropriate here? Is there intent in choosing the
label to carry any such additional (subliminal) characteristics?
The purpose of "version-name" was for it to be the "human-memorable" name
that some systems try to provide. So "name" was the intent (you already
have an "id" in the URL that identifies the version).
=====
Section 6.1
I do not understand the difference to the target between a MOVE and a
COPY of a non-collection resource, i.e. why delete (overwrite:T) is ok
for a MOVE, and how the described semantics might differ for a COPY from
a delete and recreation of a new target, or why this is seemingly not
allowed. There appear to be some subtle implied constraints on the
implementation that are not explicitly defined anywhere?
I've added some Postconditions to COPY and MOVE to clarify this. In
particular, a COPY of a version selector creates a new resource (with
a new version history if it is put under version control), while a
MOVE of a version selector just renames the existing version selector.
Why is an operation on a collection not defined to be the
corresponding operation on each member of the collection plus any
consistency adjustments to the collection itself?
That is what "Overwrite: update" effectively does, but there is another
interesting semantic (which is defined in RFC-2518) which first removes
the destination before making the MOVE/COPY. In particular, a user
might want a member to be in the copy only if it was a member in the
original (which is not the case for Overwrite:update).
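The difference between the two overwrite behaviors can be sketched with
plain Python dictionaries. This is only a toy model: a collection is assumed
to be a flat mapping from member name to content, and the function names
copy_replace and copy_update are made up for the illustration.

```python
def copy_replace(source, destination):
    """Overwrite as in RFC 2518: the destination is removed first, so only
    members of the source are present afterwards."""
    destination.clear()
    destination.update(source)
    return destination


def copy_update(source, destination):
    """The update style of overwrite: source members are copied over the
    destination, and members that exist only at the destination stay."""
    for name, content in source.items():
        destination[name] = content
    return destination


if __name__ == "__main__":
    source = {"a.html": "new a", "b.html": "new b"}
    print(copy_replace(source, {"a.html": "old a", "stale.html": "old"}))
    print(copy_update(source, {"a.html": "old a", "stale.html": "old"}))
```

In the second call stale.html survives the copy, which is exactly the case
where a member is in the copy without having been a member of the original.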
As long as update does not imply content or properties merging of a
non-collection resource, it seems that the practical effect for update
is only on collections - is this statement true?
When the history of a resource is being tracked by a versioning
system, updating the contents of a resource is very different from
replacing a resource.
=====
Section 7.1
I could not find a reference to a "Vary header" to complete the
definition.
I'll change this to say "HTTP-1.1 Vary header" to clarify this.
=====
Section 8
Is paragraph 2 true even if the resource selected is mutable?
Yes. A mutable revision can only be updated with a CHECKIN, not
directly by a PUT or PROPPATCH.
Paragraph 2 implies that one cannot have a version selector point to a
working resource? Is this true?
Yes. A version selector doesn't actually point to anything ... but
it does have a "DAV:target", which indicates which version has
the same content and dead properties as the version selector.
The semantics of a version selector have (hopefully :-) been clarified
in the 8.1 draft (based on the recent email thread).
What if any would be the corresponding
redirector or link element for a (server-side) working resource?
There is no redirector or link element defined in the versioning
protocol. Think of the version selector as a separate resource whose
content and dead properties happen to be the same as some version in
its history (i.e. its current DAV:target).
This may be more appropriate to 10.2 which states that a CHECKOUT MAY
replace a version selector with a working resource. But I believe it
may affect several sections of which this is the first.
I can envisage two scenarios in which a version selector points to
some version of an object and several workspaces indirect through this
selector to share a particular view of the world.
Each workspace has its own set of version selectors (so they don't
redirect through a common version selector).
In one case, typical
of most development, each workspace expects to see a consistent version
of the history. Any workspace can perform checkout/edit/checkin to
update the version selector to point to a new version, but only checked
in versions will be globally visible.
Yes.
In the second scenario, the object
may be akin to a change package in which several distinct processes are
collaborating to update the state of the world for some activity in
progress. When the activity is complete the final version of the object
will be checked in, but the updates are performed in a shared context.
Note that in this case, it would be advisable to use locks to keep the
distinct processes from stepping on each other's toes.
How would each of the two examples disambiguate themselves in a single
implementation under the current extensions, or does the "MAY" in 10.2
make this impossible?
This has been fixed in draft 8.1. In particular, the client specifies
whether it wants the "in-place" or "out-of-place" checkout behavior.
How might they be implemented simultaneously, i.e.
is there a mechanism for a "well-known" version selector to point to a
current working resource and its current version at the same time with,
for example, a client property used to select which target is currently
desired?
The only way for two version selectors to share information
is when one of them has created a new version, and the other
SET-TARGET's or MERGE's to see it.
If not, is there a builtin mechanism for sharing a newly
created URL amongst collaborating processes when the working resource is
returned at a different location?
Such a collaboration is likely to be only supported in the
context of workspace support, which gives the checked out
resources user meaningful names (i.e. via "in-place" checkouts).
How do 16.8 restrictions on version
selectors affect the solution to having both collaborators and end-users
sharing a workspace version through the update process?
Which restrictions did you have in mind here?
=====
8.1
This section mentions 4xx status codes and preconditions, but almost
none are assigned in any later sections, and no explanation is given on
how this should be carried out. If this is still under discussion, then
a placeholder comment might be useful (even if only as a reminder of
work to be done).
This is just a reference to the 4xx status codes defined in HTTP-1.1
(RFC-2616). The versioning protocol does not introduce any new 4xx
status codes.
=====
8.3
I am not sure I am interpreting the pre- and post-conditions
correctly.
The pre-condition implies that I can select a version using a
target-selector label which is in the version history but is not the
current version of a version selector. But the response will always be
the target version of the version selector.
This doesn't seem to be the intended, or a worthwhile result.
Yes, the postcondition is wrong. I'll fix it.
=====
9.1
Is the post condition meant to include such things as live properties
of the server object e.g. last-accessed time? "State of any resource"
seems an overly broad definition to be practically useful.
Good point. I'll replace this with "content and dead properties".
=====
10.2
What, if any, is the mechanism to provide a checkout comment?
10.3
What, if any, is the mechanism to provide a checkin comment?
You could store them in the DAV:comment property of the version,
using whatever convention you wanted to distinguish the checkout from
the checkin comment (assuming you wanted to distinguish them).
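One possible convention, sketched in Python: build an RFC 2518 PROPPATCH
body that sets DAV:comment, with a bracketed prefix marking the comment as a
checkout or checkin comment. The prefix convention and the function name are
assumptions made for this example, not something the protocol defines. The
sketch only builds the request body; which resource the PROPPATCH should be
sent to is left open here.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"


def comment_proppatch_body(kind, text):
    """Return a PROPPATCH body that sets DAV:comment.

    The "[checkout]" / "[checkin]" prefix is a made-up convention used to
    tell the two kinds of comment apart.
    """
    if kind not in ("checkout", "checkin"):
        raise ValueError("kind must be 'checkout' or 'checkin'")
    ET.register_namespace("D", DAV)
    update = ET.Element(f"{{{DAV}}}propertyupdate")
    set_el = ET.SubElement(update, f"{{{DAV}}}set")
    prop = ET.SubElement(set_el, f"{{{DAV}}}prop")
    comment = ET.SubElement(prop, f"{{{DAV}}}comment")
    comment.text = f"[{kind}] {text}"
    return ET.tostring(update, encoding="unicode")


if __name__ == "__main__":
    print(comment_proppatch_body("checkin", "Fixed the postcondition"))
```

A client that wanted to keep both comments could just as well store them on
separate lines of the same property value; the prefix is simply the smallest
convention that makes them distinguishable.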
Is "checkin and label" intended to be two distinct operations?
Two distinct and very different operations.
=====
10.6
How would an atomic "Move label" operation be implemented? Is this
what "set" is intended for?
Yes.
There is no corresponding pre-condition
"cannot move label".
Pre-conditions identify something specific about the state of the
resource that would preclude the operation. We could have a general
"must be able to perform operation" precondition, but I don't think
that is of any use to a client or server.
=====
The last three are instances where a standard way to package sets of
operations and property updates as an atomic request would be useful.
Most tools present these actions to the user as a single request, and
overlapping non-atomic updates could become very confusing.
This falls into the "can of worms" category that you mentioned earlier.
=====
13.5
This section left me very confused. Up until this point my concept
of a Collection was some sort of base class or container for a set of
versioning elements.
See RFC 2518 for the definition of a collection. The semantics are
clarified in the proposed "Bindings" protocol extension (which is still
in the internet-draft stage).
Basically, a collection is a resource that identifies a set of
other resources ("immediate members of that collection") by a name
that is syntactically restricted to be a URL segment.
A workspace was a collection which had a particular
mix of these elements and a baseline was a particular snapshot of the
current state of the versioned subset of a workspace.
Sounds good. Note that a baseline is not itself a collection.
If I were to put a
workspace under version control, each saved workspace version would be a
baseline and any working resources would be lost (or would be required
to be checked in) on checkin/checkpoint of the workspace.
We would call this putting the workspace under "baseline control".
It certainly is reasonable to think of a baseline of a workspace
as being a "deep version" of that workspace, but we don't use that
terminology to make sure there is no confusion between a "version"
of a collection (just its immediate versioned members) and a
"deep version" of a collection (all of its versioned members).
An activity is
a collection with version depth as well as breadth ...
An activity is not a collection, but saying it has "version
depth and breadth" is a reasonable metaphor (albeit somewhat
poetic :-).
But the concept of a collection version containing only binding to
histories, although I understand the propagation of change argument at
its basic level, leaves me without an identifiable example of what a
versioned collection would be or would be used for.
The basic problem is what information is necessary to allow you to
reconstruct the versioned members of a collection from a baseline of
that collection. One constraint is that a SET-TARGET on a version
selector must not require creating new versions of all parents of that
version selector (the propagation of change argument). Another
constraint is that you not be forced to check out every member of a
collection in order to move it.
These constraints are satisfied by having a collection version be a
collection whose immediate members are version histories.
I'll add something like this to the protocol document.
At the moment, it is
nothing more than a set of labels or pathnames with no idea of
associated content.
By associated content, do you mean which version of those version
histories are selected? Remember that a baseline contains a version
of every member of a collection, so the baseline provides you with
this information.
But neither is it a framework or directory structure
as some of the pathnames refer to unspecified non-collection elements.
The member of the collection version gives a name to a versioned
history, and the version of that versioned history is selected by
the baseline.
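A small data-model sketch of this in Python may help. It assumes a
collection version can be modeled as a mapping from member name to version
history, and a baseline as a mapping from version history to the selected
version; the identifiers such as vh1 and vh1/v3 are invented for the example.

```python
def reconstruct_members(collection_version, baseline):
    """Combine the two pieces of information described above.

    collection_version: member name -> version history id
    baseline:           version history id -> selected version id
    returns:            member name -> selected version id
    """
    members = {}
    for name, history in collection_version.items():
        if history not in baseline:
            raise KeyError(f"baseline selects no version of {history}")
        members[name] = baseline[history]
    return members


if __name__ == "__main__":
    collection_version = {"index.html": "vh1", "logo.png": "vh2"}
    baseline = {"vh1": "vh1/v3", "vh2": "vh2/v1"}
    print(reconstruct_members(collection_version, baseline))

    # Renaming a member only changes the collection version; the versions
    # selected by the baseline are untouched.
    renamed = {"home.html": "vh1", "logo.png": "vh2"}
    print(reconstruct_members(renamed, baseline))
```

In this model, selecting a different version of a member changes only the
baseline mapping, and renaming a member changes only the collection version,
which matches the two constraints given earlier.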
Is there some aspect of this that I have missed that would clarify
understanding? What is the physical manifestation of checking out a
collection version or is a collection really an abstract element?
This might be clearer in the 8.1 draft. In particular, checking out a
collection version selector just changes the state of that collection
to be "checked out" (i.e. an in-place checkout).
How might a bare collection be used? If it is just an abstract
concept, then does it belong in the standard and should there be
versioning operations defined on it?
What do you mean by a "bare collection"?
This appears to conflict with 14.6.1 and 17.3.
What is the conflict you had in mind?
=====
14.6.1 (also 17.3)
In this section, collections have baselines. But Baselines are defined
as consisting only of versions, and collections refer to histories, not
versions. My confusion with 13.5 is now compounded.
The key distinction here is between what are the members of a
collection version selector and what are the members of a collection
version.
The members of a collection version selector are versionable resources
and other version selectors. This provides you with a standard namespace
to traverse. The members of a collection version are version histories.
This provides you with an efficient mechanism for capturing the state
of a collection as a set of revisions (of both the collections and the
non-collection members of that collection).
Should collection have been limited to workspace here? Should baseline
in 13.2 have its definition expanded?
I believe not. What would be the reason?
=====
17.4
The last paragraph before Marshalling states that a request version is
ignored unless a merge destination exists. Alternate semantics would
have the merge default to a copy in the case of a null destination. What
is the rationale for the given choice?
What would be the relative name of that copy (with respect to the
workspace)? If an activity was being merged, the version has no name.
If a baseline was being merged, the version has a name but what if
there already is a version selector by that name in the workspace,
but that is associated with a different version history?
What if the last thing you did was to delete that version history
from your collection (so you wouldn't be happy if the MERGE
just brought it right back).
Basically, you need user input to decide what to do here, which
means a client needs to iterate through the DAV:ignored-set
one at a time, so that the user can decide what to do.
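The client-side loop might look like the following Python sketch. The XML
shape is an assumption: it supposes that DAV:ignored-set contains one
DAV:href per ignored version, which should be checked against the MERGE
postconditions in the draft. The decide callback stands in for whatever
user interaction the client provides, and the URLs are invented.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the set; the real marshalling is defined by the draft.
SAMPLE = """<D:ignored-set xmlns:D="DAV:">
  <D:href>/history/12/ver/3</D:href>
  <D:href>/history/40/ver/1</D:href>
</D:ignored-set>"""


def ignored_versions(response_xml):
    """Return the URLs listed in every DAV:ignored-set of the response."""
    root = ET.fromstring(response_xml)
    hrefs = []
    for ignored in root.iter("{DAV:}ignored-set"):
        for href in ignored.iter("{DAV:}href"):
            hrefs.append((href.text or "").strip())
    return hrefs


def resolve_ignored(response_xml, decide):
    """Ask decide(url) about each ignored version, one at a time."""
    return {url: decide(url) for url in ignored_versions(response_xml)}


if __name__ == "__main__":
    # A trivial policy standing in for a prompt shown to the user.
    choices = resolve_ignored(SAMPLE, lambda url: "skip")
    for url, choice in choices.items():
        print(url, "->", choice)
```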
It seems at least reasonable to return a list of unmerged versions
with a reason or is this what the DAV:ignored-set postcondition does?
Yes, that's what it is for.
Where is DAV:ignored-set defined (is 18.5 its definition as opposed to
earlier ordering of definitions in sections like xml-elements)?
The contents of the DAV:ignored-set is defined in the postconditions
of the MERGE request.
=====
Thanks for the great review, Ross! Please follow up if anything
is still unclear. I'll try to get an 8.2 draft out soon, with
the changes based on your review.
Cheers,
Geoff