Re: The 7 Deadly Sins of Versioning from Andre van der Hoek on 1998-06-08 (w3c-dist-auth@w3.org from April to June 1998)

From: Andre van der Hoek <andre@bigtime.cs.colorado.edu>
Date: Sun, 07 Jun 1998 23:27:14 -0600
To: ejw@ics.uci.edu
cc: Fabio Vitali <fabio@cs.unibo.it>, WEBDAV WG <w3c-dist-auth@w3.org>, David Durand <dgd@cs.bu.edu>
Message-Id: <199806080527.XAA00399@bigtime.cs.colorado.edu>

All,

here are some comments from a "pure" versioning perspective....

> > 1. Enforced linear versioning
> 
> I agree with you -- retrofitting non-linear versioning onto a linear
> versioning system may not be possible, and will likely be inelegant if
> possible, forcing unnecessary tradeoffs.
 
I agree as well, it is the mechanism that is important, not the appearance.
But, I do recommend reading the "Linear Versioning" paper by Chris Seiwald
in SCM-6.  It tells us about an approach in which linear versioning is 
embedded in a CM system, and variants are represented as new name spaces.
The paper illustrates how this simplifies the user perspective quite a bit.
This mechanism is worth considering for WebDAV: the model is 
tremendously simple, and actually maps very nicely onto WebDAV.
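
To make the name-space idea concrete, here is a rough Python sketch.  It is
not taken from Seiwald's paper or from any WebDAV draft; the class
LinearStore, its methods, and the paths are invented for illustration.
Each path has one strictly linear history, and a variant is simply the
same content submitted under a new path that remembers where it came from.

```python
# Hypothetical sketch: linear histories per path, variants as new name spaces.
class LinearStore:
    def __init__(self):
        self.revisions = {}   # path -> list of contents, index = revision - 1
        self.origin = {}      # variant path -> (source path, source revision)

    def submit(self, path, content):
        """Append a revision to the single line of descent for this path."""
        self.revisions.setdefault(path, []).append(content)
        return len(self.revisions[path])

    def get(self, path, rev=None):
        """Return the latest revision, or revision number rev (1-based)."""
        history = self.revisions[path]
        return history[-1] if rev is None else history[rev - 1]

    def branch(self, src, dst):
        """Create a variant as revision 1 of a new path, recording its origin."""
        if dst in self.revisions:
            raise ValueError("target name space already exists")
        self.origin[dst] = (src, len(self.revisions[src]))
        return self.submit(dst, self.get(src))


store = LinearStore()
store.submit("/main/index.html", b"v1")
store.submit("/main/index.html", b"v2")
store.branch("/main/index.html", "/variant-a/index.html")
store.submit("/variant-a/index.html", b"v2 plus variant work")
```

No path ever has more than one successor per revision, which is what keeps
the user's view simple; the branching information lives entirely in the
origin table.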

> > 2. Serial editing
> 
> I agree that parallel editing is useful in many contexts, especially
> software development.  However, parallel development support requires merge
> support, and many media types are inadequately served by merge tools.  For
> example, how many image merge tools are in existence which help merge two
> people's changes to the same image?  So, while parallel editing support
> should be provided, it must also be possible to limit editing to serial.
 
I equate parallel editing with variant handling; the same mechanism can be
used.  The problem with variant handling is that there are many ways to
resolve or reserve branches, and this is exactly the problem that WebDAV 
is going to have to address.  Traditional versioning systems usually adopt
one policy for variant handling (pre-assign variant number, reserve and
implicitly lock, and many others), but WebDAV will not be able to do this.
To accommodate all types of data, WebDAV has to provide a set of policies.
In particular, suppose someone does come up with a merging tool for
GIF or JPEG.  All of a sudden WebDAV would have to change?  No, the policies
should be included beforehand and be well thought out.  Thus, I agree with
both Fabio and David as well as Jim: at times one wants serial editing, but
for serious CM type work we do need parallel editing, and even there we
probably need several mechanisms (some references: TRED, SCM-7, J.J. Hunt;
Continuus model; 3-dimensional versioning, SCM-5, J. Estublier).
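
One way to read "a set of policies" is as a pluggable choice per resource
or media type.  A hedged Python sketch follows; SerialPolicy,
ParallelPolicy, Resource, and the POLICY_BY_TYPE table are hypothetical
names made up for this example, not WebDAV methods or properties.

```python
class SerialPolicy:
    """At most one outstanding checkout per resource (reserve and lock)."""
    def may_check_out(self, outstanding):
        return len(outstanding) == 0


class ParallelPolicy:
    """Any number of outstanding checkouts; each check-in becomes a variant."""
    def may_check_out(self, outstanding):
        return True


class Resource:
    def __init__(self, policy):
        self.policy = policy
        self.outstanding = set()   # users who currently hold a checkout

    def check_out(self, user):
        if not self.policy.may_check_out(self.outstanding):
            holders = ", ".join(sorted(self.outstanding))
            raise PermissionError("resource is reserved by " + holders)
        self.outstanding.add(user)

    def check_in(self, user):
        self.outstanding.remove(user)


# Assumed mapping: no image merge tool exists yet, so images are serial.
POLICY_BY_TYPE = {"image/gif": SerialPolicy(), "text/plain": ParallelPolicy()}

image = Resource(POLICY_BY_TYPE["image/gif"])
image.check_out("fabio")
try:
    image.check_out("david")
    second_checkout_allowed = True
except PermissionError:
    second_checkout_allowed = False
```

If a GIF merge tool shows up later, only the table entry changes
(POLICY_BY_TYPE["image/gif"] = ParallelPolicy()); the check-out interface
stays the same.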

Btw, I don't think WebDAV should be concerned with merging, WebDAV should
only be concerned with the data model and allowing a "merged with" link,
or "depends on X, Y, and Z" link type.  Merging itself should be outside
the scope, it is a method or operation on the data in the data model. 
Thus, merge tools or methods should all be external.  I would hate to see
a "merge" method in HTTP...., it will never work for all data types.
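
As an illustration of keeping merge outside the data model, the following
Python sketch stores only versions and typed links between them.  The
names (History, the "merged-with" link type, external_merge) are
assumptions for the example; in particular, external_merge stands in for
whatever type-specific tool a client runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    vid: str
    content: bytes
    links: list = field(default_factory=list)  # (link type, target version id)


class History:
    """Stores versions and typed links; knows nothing about how to merge."""
    def __init__(self):
        self.versions = {}

    def add(self, vid, content, links=()):
        for _, target in links:
            if target not in self.versions:
                raise KeyError("unknown link target: " + target)
        self.versions[vid] = Version(vid, content, list(links))

    def targets(self, vid, link_type):
        """Return the version ids that vid points at with the given link type."""
        return [t for kind, t in self.versions[vid].links if kind == link_type]


def external_merge(ours, theirs):
    # Placeholder for a client-side, type-specific merge tool.
    return ours + b"\n" + theirs


history = History()
history.add("1.1", b"left edit")
history.add("1.2", b"right edit")
merged = external_merge(history.versions["1.1"].content,
                        history.versions["1.2"].content)
history.add("1.3", merged,
            links=[("merged-with", "1.1"), ("merged-with", "1.2")])
```

The store only records that 1.3 was merged with 1.1 and 1.2; how the bytes
were combined is invisible to it.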

> > 3. Integrating versioning and structure information
> 
> I'm not sure I followed this point.  The requirement in WebDAV is to design
> a versioning protocol which works for all media types, not just
> XML/HTML/SGML based media types.  As a result, by meeting this requirement I
> believe we also avoid integrating versioning and structure information.
> However, WebDAV versioning also has to work for completely unstructured
> information as well -- WebDAV versioning must work for an arbitrary instance
> of text/plain or application/octet-stream.
 
I think this is a non-issue from the WebDAV point of view; anything
could be versioned (i.e., I agree with Jim, and I think also with Fabio 
and David).

> > 4. Using binary formats
> 
> This is a point on which I waver.  While I definitely favor use of a
> text-based format (the initial WebDAV Distributed Authoring protocol
> specification is testament to this), for uses like DRP, sending differences
> over the wire in compressed XML vs. a tailored format for efficient
> differences will be less efficient.  As for applications where efficiency is
> the main driver, I wonder whether it will be sufficient.
 
There is a subtle difference here between two things:

   - the difference between the old version and new version
   - compressing data

Both provide a speedup that can be quite considerable, and both might be
needed for everything to really work (consider that 1 meg image where I
change one pixel, compressing helps, but boy, I would rather send the
diff!).  Usually a combo would be best.  However, sending diffs does 
run into point 2: generic merge tools.  Thus, I suggest WebDAV should
provide two means of updating a repository: sending a plain new version
(compressed or not) and allowing a diff to be shipped as well (of course,
assuming the server knows the type).  Issues remain, but both will need
to be addressed.
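
The one-pixel case can be tried out with a few lines of Python.  This is
only a sketch: make_delta is a deliberately naive prefix/suffix delta
invented here, not BDE or any real differencing engine, and the "image" is
just 1 MiB of synthetic bytes.

```python
import zlib


def make_delta(old, new):
    """Naive delta: skip the common prefix and suffix, keep the changed middle.

    Returns (offset, number of old bytes replaced, replacement bytes).
    """
    limit = min(len(old), len(new))
    start = 0
    while start < limit and old[start] == new[start]:
        start += 1
    end = 0
    while end < limit - start and old[-1 - end] == new[-1 - end]:
        end += 1
    return start, len(old) - start - end, new[start:len(new) - end]


def apply_delta(old, delta):
    """Rebuild the new version from the old one plus a delta."""
    start, removed, replacement = delta
    return old[:start] + replacement + old[start + removed:]


old = bytes(range(256)) * 4096      # 1 MiB stand-in for an image
changed = bytearray(old)
changed[500_000] ^= 0xFF            # "change one pixel"
new = bytes(changed)

delta = make_delta(old, new)
compressed = zlib.compress(new)
print("full:", len(new), "compressed:", len(compressed),
      "delta payload:", len(delta[2]))
```

Here the delta payload is the single changed byte plus an offset and a
length, while the compressed full version, although much smaller than the
original, is still far larger than that.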

> But, I do agree with you that developing the diff representation in XML does
> provide a good opportunity for separation of concerns.  But, it does raise
> other ones, like how best to encode an arbitrary chunk of binary difference
> data in XML.  Or, what happens if the difference is in a different charset
> from the original (e.g. applying a UTF-16 diff to a UTF-8 original).
 
A comparative study on diff algorithms appears in SCM-6, J.J. Hunt, 
Tichy, and someone from Bell Labs whose name I forgot here for a sec
(hey, it's late at night).  At the moment, there is a clear winner.
It is made available by DuraSoft, and is called BDE (binary difference
engine), a separate engine for differencing.  I can't speak for the
authors of the tools, but I would be pretty psyched if it was my tool
that was chosen as the standard for all web diffing, so they might provide
it for free?  Just something to think about.

> > 5. Hard-wiring policy
> 
> Avoiding hard-wired policy is a bit of a motherhood statement.  One of the
> difficult issues for WebDAV is the fact that many repositories support some
> existing policy, and it would be easier to integrate them with WebDAV if
> some hook (like a CHECKOUT/CHECKIN method pair) were provided.  I think a
> flexible approach to provide these hooks, but also provide operators which
> allow direct manipulation of the version history graph.  This way WebDAV
> provides a built-in policy, but has the flexibility to be interoperable with
> systems which don't use the built-in policy.
 
I think the hard-wired policy refers not only to checkin/checkout, but more
to variant handling, parallel editing, locking necessities, etc.  I agree
that hooks should be there, but I also would like WebDAV to give me quite
a few degrees of freedom in building some versioning/auditing system on 
top.  If just the hook of checkin/checkout is provided, that is pretty darn
little to work with.  In versioning there are, unfortunately, quite a few
policy choices, even at the low level.  Whatever choices WebDAV makes, I
think very good arguments will have to be provided as to why certain 
policies are supported and others ignored.  Good paper to read on policies:
Versioning Models by Westfechtel and Conradi (to appear in ACM Computing
Surveys).

Oh, one more note.  Notice that the versioning as discussed by Fabio and
David does hardwire one policy: it completely ignores the change-set 
approach that is now becoming increasingly popular in CM.  As opposed to
managing versions, change-sets manage the changes as the entities and 
create a particular system on demand by taking a baseline and a set of 
changes that are merged in.  Just so you know, this policy is ruled out
from the start by the approach of storing versions.  Some workarounds
are possible (see Weber, SCM-7, change-sets and change-packages).
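
For readers who have not met the change-set model, here is a toy Python
sketch of "baseline plus selected changes".  It is a heavy simplification
under stated assumptions: compose and the sample names are made up, each
change replaces or deletes whole files, and conflicts between change-sets
are not detected.

```python
def compose(baseline, change_sets, selected):
    """Build one system on demand: baseline plus the selected change-sets.

    baseline:    dict of path -> content
    change_sets: dict of change-set name -> dict of path -> content,
                 where a content of None marks a deletion
    selected:    change-set names, applied in the given order
    """
    system = dict(baseline)              # never modify the baseline itself
    for name in selected:
        for path, content in change_sets[name].items():
            if content is None:
                system.pop(path, None)
            else:
                system[path] = content
    return system


baseline = {"a.c": "int a;", "b.c": "int b;"}
change_sets = {
    "fix-101": {"a.c": "int a = 0;"},
    "feature-x": {"x.c": "int x;", "b.c": None},
}

only_fix = compose(baseline, change_sets, ["fix-101"])
fix_and_feature = compose(baseline, change_sets, ["fix-101", "feature-x"])
```

The stored entities are the changes; a "version" of the whole system only
exists once somebody picks a baseline and a selection.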

> > 6. Confusing bytes and characters
> 
> While I agree with this point (characters and octets are different, since in
> multi-octet charsets like UTF-16 and UCS-4, a single character can be
> multiple octets), I disagree with your mechanism for how to handle arbitrary
> binary diff information.  A combination of XML and MIME multipart requires
> the use of two parsers to understand a difference stream.  This will always
> be inferior to a difference which requires only one parser (e.g., a non-XML
> binary diff format), and developers will strongly lobby for the one parser
> approach.  Plus, the two parser approach will be slower, and harder to
> optimize.
> 
> It seems to me you've provided a pretty compelling argument against the use
> of XML as a difference format.
 
Too webbie for me, but from what I can make of it, a generic binary diff
format could still be used, right?  <XML BINARY DIFF BEGIN> and <XML BINARY
DIFF END> with some baseline version number and off the difference engine
goes.....?
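
A sketch of what such an envelope might look like, in Python.  The element
name binary-diff and its baseline and encoding attributes are invented for
this example and come from no WebDAV or VTML draft; the diff bytes
themselves are treated as opaque and simply base64-encoded.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_diff(baseline_version, diff_bytes):
    """Wrap opaque diff bytes in a small XML envelope (hypothetical format)."""
    root = ET.Element("binary-diff",
                      {"baseline": baseline_version, "encoding": "base64"})
    root.text = base64.b64encode(diff_bytes).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def unwrap_diff(xml_text):
    """Return (baseline version, diff bytes) from an envelope."""
    root = ET.fromstring(xml_text)
    return root.get("baseline"), base64.b64decode(root.text or "")


envelope = wrap_diff("1.4", b"\x00\xff\x10 opaque engine output")
baseline_version, payload = unwrap_diff(envelope)
```

The payload travels as ordinary element text, so no MIME multipart parser
is involved, at the price of base64's roughly one-third size overhead.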

> > 7. Not providing a computable evolution of addresses
> 
> I think the first paragraph shows how text-centric this observation is:
> 
> > An important application of versioning systems is to be able to determine
> > the position, content and identity of all subparts of a resource in all
> > its versions in which they exist in some form. Ideally it should be
> > possible to retrieve and identify any single character of a document in
> > all the versions in which it existed.
> 
> How would you track changes to an image, or to a digital movie?  One problem
> is evolving addresses in a media-type independent way.  Another problem is
> that on the Web, the smallest addressable part is a resource.  There are no
> existing standards for sub-resource scope addresses which apply across all
> media types.
 
I completely disagree with Fabio and David here.  The purpose of versioning
is to provide a generic mechanism to store and retrieve revisions and variants
(versions) of arbitrary data.  The structure of the data, the contents of the
data, and all other aspects of the data should not matter.  Whatever is 
versioned is opaque to the versioning mechanism.  Thus, the versioning
should be content-independent.  Thus, indeed as Jim says, the statement 
Fabio and David make is very very text-centric, and should in my opinion
not be adopted for WebDAV.  I want the V in WebDAV to provide me with a
mechanism to store and version my byte-streams, and what I do with that is
my business, not the versioning system's.  I think point 7 makes Fabio and
David fall into their own pitfall numero 3: they let the versioning system
know about the structure.  Thus, I think point 7 should be struck from
the pitfalls altogether.

> > Conclusions: Ignoring VTML
> 
> > We are not brainwashed advocates for VTML (in any of its versions).
> 
> I may make you both T-shirts with this quote... :-)
 
Yes you should ;-)

> I continue to assert that VTML is a technology which holds promise as a
> difference format for potential use in WebDAV.  Before specifying its use,
> I'd like to see how it can accommodate a difference between two images (say,
> two JPEG files), or in the general case, how it can accommodate binary
> differences.

The way I view VTML is as a particular policy, it is more flexible than
some other policies out there, but also does not address some other things
other policies do.  Thus, VTML is just another system with a policy that
should be able to use WebDAV and should be able to be built on top of
WebDAV.


In any case, as I said, I am a versioning guy, not a Web guy, but I hope
this helps to put some perspective on the issues that are there.....

=== Andre ===

PS: Of course, I appreciate the work by Fabio and David, don't get me 
    wrong if this e-mail has a negative tone to it.  But, coming from
    a versioning and CM perspective, I obviously have different opinions
    and expectations from WebDAV and I wanted to highlight those here.
Received on Monday, 8 June 1998 01:30:24 UTC