Re: The 7 Deadly Sins of Versioning

Simple summary of a basic point:
   Sure, we're text-oriented in the comments we sent -- that's our
specialization. However, no Web-based authoring protocol can afford to
ignore textual data. I'm not sure that we make very many special requests
that could _never_ be useful for non-textual data, but if we do that's not
an argument against them. It is merely an argument that DAV shouldn't
require the use of text-oriented facilities for non-textual data.


At 1:27 AM -0400 6/8/98, Andre van der Hoek wrote:
>All,
>
>here some comments from a "pure" versioning perspective....

dear to my heart, though Pure versioning != Software Engineering.

>> > 1. Enforced linear Versioning

>I agree as well, it is the mechanism that is important, not the appearance.
>But, I do recommend reading the "Linear Versioning" paper by Chris Seiwald
>in SCM-6.  It tells us about an approach in which linear versioning is
>embedded in a CM system, and variants are represented as new name spaces.
>The paper illustrates how this simplifies the user perspective quite a bit.
>It is a thought to consider this mechanism for WebDAV, the model is
>tremendously simple, and actually maps very nicely onto WebDAV.

The only thing that worries me is losing the ability to express branching
histories and compare/merge/analyze across them.

>> > 2. Serial editing
>>
>> I agree that parallel editing is useful in many contexts, especially
>> software development.  However, parallel development support requires merge
>> support, and many media types are inadequately served by merge tools.  For
>> example, how many image merge tools are in existence which help merge two
>> people's changes to the same image?  So, while parallel editing support
>> should be provided, it must also be possible to limit editing to serial.
>
>I equate parallel editing with variant handling, the same mechanism can be
>used.  The problem with variant handling is that there are many ways to
>resolve or reserve branches, and this is exactly the problem that WebDAV
>is going to have to address.  Traditional versioning systems usually adopt
>one policy for variant handling (pre-assign variant number, reserve and
>implicitly lock, and many others), but WebDAV will not be able to do this.
>To accommodate all types of data, WebDAV has to provide a set of policies.

I think not. It needs to pick some mechanisms, hopefully ones that block
as few policies as possible (ideally none!).

>In particular, now suppose someone does come up with a merging tool for
>GIF or JPEG.  All of a sudden WebDAV would have to change?  No, the policies
>should be included beforehand and be well-thought out.  Thus, I agree with
>both Fabio and David as well as Jim, at times one wants serial editing, but
>for serious CM type work, we do need parallel editing, and even there, we
>probably need several mechanisms (some references: TRED, SCM-7, J.J. Hunt;
>Continuus model; 3-dimensional versioning, SCM-5, J. Estublier).

exactly. I'm not sure that universal merge mechanisms are a good idea. I
believe them to be possible, but it's clear that for each mechanism there
will be combinations of data types and typical operations for which that
mechanism will not be optimal (or even sensible).

>Btw, I don't think WebDAV should be concerned with merging, WebDAV should
>only be concerned with the data model and allowing a "merged with" link,
>or "depends on X, Y, and Z" link type.  Merging itself should be outside
>the scope, it is a method or operation on the data in the data model.
>Thus, merge tools or methods should all be external.  I would hate to see
>a "merge" method in HTTP...., it will never work for all data types.

Personally, I think merge is so important that we can't ignore it -- but I
tend to agree that we can't solve the problem ideally for all data types:
we may need an extensible mechanism, and some standard minimal methods
suitable for common text and binary types.


>> > 3. Integrating versioning and structure information
>I think this is a non-issue from the WebDAV standpoint of view, anything
>could be versioned (i.e., I agree with Jim, and I think also with Fabio
>and David).

right. Integrating versioning into data formats is not extensible or sensible.

>
>> > 4. Using binary formats
>>
>> This is a point on which I waver.  While I definitely favor use of a
>> text-based format (the initial WebDAV Distributed Authoring protocol
>> specification is testament to this), for uses like DRP, sending differences
>> over the wire in compressed XML vs. a tailored format for efficient
>> differences will be less efficient.  As for applications where efficiency is
>> the main driver, I wonder whether it will be sufficient.
>
>There is a subtle difference here between two things:
>
>   - the difference between the old version and new version
>   - compressing data
>
>Both provide a speedup that can be quite considerable, and both might be
>needed for everything to really work (consider that 1 meg image where I
>change one pixel, compressing helps, but boy, I would rather send the
>diff!).  Usually a combo would be best.  However, sending diff's does
>run into point 2: generic merge tools.  Thus, I suggest WebDAV should
>provide two means of updating a repository: sending a plain new version
>(compressed or not) and allowing a diff to be shipped as well (of course,
>assuming the server knows the type).  Issues remain, but both will need
>to be addressed.

We were talking only within your first point. Assuming that we need a diff
format, we are arguing that we are better off with a text-based format than
an optimal but opaque binary format (with its problems of
non-debuggability, byte-ordering, etc.).
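
A minimal Python sketch of the trade-off discussed above, for illustration
only. The one-line-per-changed-byte ASCII diff format and the function names
are hypothetical, not anything proposed for WebDAV, and the sketch assumes
both versions have the same length. It compares a readable text diff for a
one-byte change in a 1 MiB resource with the size of the compressed new
version.

```python
import zlib


def text_diff(old: bytes, new: bytes) -> str:
    """Describe changed bytes as ASCII lines of the form '<offset> <hex byte>'.

    Hypothetical format for illustration; only in-place changes are handled.
    """
    if len(old) != len(new):
        raise ValueError("this sketch only handles in-place byte changes")
    return "".join(
        f"{i} {b:02x}\n" for i, (a, b) in enumerate(zip(old, new)) if a != b
    )


def apply_text_diff(old: bytes, diff: str) -> bytes:
    """Apply a diff produced by text_diff to the old version."""
    out = bytearray(old)
    for line in diff.splitlines():
        offset, value = line.split()
        out[int(offset)] = int(value, 16)
    return bytes(out)


old = bytes(range(256)) * 4096   # 1 MiB stand-in for an image
changed = bytearray(old)
changed[500_000] ^= 0xFF         # "change one pixel"
new = bytes(changed)

diff = text_diff(old, new)       # human-readable and byte-order neutral
compressed = zlib.compress(new)  # sending the whole new version, compressed

print(len(diff), len(compressed))
```

The diff here is a single short line that can be read and debugged by eye,
while the compressed full version is still far larger, even for this highly
compressible stand-in data.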

>
>> But, I do agree with you that developing the diff representation in XML does
>> provide a good opportunity for separation of concerns.  But, it does raise
>> other ones, like how best to encode an arbitrary chunk of binary difference
>> data in XML.  Or, what happens if the difference is in a different charset
>> from the original (e.g. applying a UTF-16 diff to a UTF-8 original).
>
>A comparative study on diff algorithms appears in SCM-6, J.J. Hunt,
>Tichy, and someone from Bell Labs whose name I forgot here for a sec
>(hey, it's late at night).  At the moment, there is a clear winner.
>It is made available by DuraSoft, and is called BDE (binary difference
>engine), a separate engine for differencing.  I can't speak for the
>authors of the tools, but I would be pretty psyched if it was my tool
>that was chosen as standard for all web diffing and they might provide
>it for free?  Just something to think about.

In an authoring context, we must also consider that diffs are not always
the output of some automatic process... They should be easily creatable by
editors so that the actual changes made can be represented, and not just
some heuristic guess at a set of changes equivalent to the user's actions.

>> > 5. Hard-wiring policy
>>
>> Avoiding hard-wired policy is a bit of a motherhood statement.  One of the
>> difficult issues for WebDAV is the fact that many repositories support some
>> existing policy, and it would be easier to integrate them with WebDAV if
>> some hook (like a CHECKOUT/CHECKIN method pair) were provided.  I think a
>> flexible approach is to provide these hooks, but also provide operators which
>> allow direct manipulation of the version history graph.  This way WebDAV
>> provides a built-in policy, but has the flexibility to be interoperable with
>> systems which don't use the built-in policy.
>
>I think the hardwire policy not only refers to checkin/checkout, but more
>to variant handling, parallel editing, locking necessities, etc.  I agree
>that hooks should be there, but I also would like WebDAV to give me quite
>a bit of degrees of freedom in building some versioning/auditing system on
>top.  If just the hook of checkin/checkout is provided, that is pretty darn
>little to work on.  In versioning there is, unfortunately, quite a bit of
>policy choices, even at the low level.  Whatever choices WebDAV makes, I
>think very good arguments will have to be provided as to why certain
>policies are ignored or not.  Good paper to read on policies: Versioning
>Models by Westfechtel and Conradi (to appear in ACM Survey's).



>Oh, one more note.  Notice that the versioning as discussed by Fabio and
>David does hardwire one policy: it completely ignores the change-set
>approach that is now becoming increasingly popular in CM.  As opposed to
>managing versions, change-sets manage the changes as the entities and
>create a particular system on demand by taking a baseline and set of
>changes that are merged in.  Just so you know, this policy is ruled out
>from the start by the approach of storing versions.  Some workarounds
>are possible (see, Weber, SCM-7, change-sets and change-packages).


My thesis research is a change-oriented approach to collaborative
authoring, and I certainly hope not to forestall development of such
systems. I see the provision of DIFF and merge mechanisms (with flexible
policies) as the easiest way to support such systems without having to
convince the whole rest of the world to abandon their version-centric
perspective.
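
To make the change-set idea concrete, here is a small Python sketch under
stated assumptions: the `Change` class, the stable line IDs and the `build`
function are all hypothetical names invented for illustration, and each
change only replaces or deletes one line. A particular version is created on
demand from a baseline plus whichever named changes are selected.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Change:
    """One named change to a line identified by a stable ID.

    text=None means "delete the line". Hypothetical model for illustration.
    """
    name: str
    line_id: str
    text: Optional[str]


def build(baseline: Dict[str, str], changes: List[Change],
          selected: Set[str]) -> List[str]:
    """Create a version on demand: the baseline plus the selected changes."""
    lines = dict(baseline)  # copy; dicts keep insertion order
    for change in changes:
        if change.name not in selected:
            continue
        if change.text is None:
            lines.pop(change.line_id, None)
        else:
            lines[change.line_id] = change.text
    return list(lines.values())


baseline = {"l1": "Title", "l2": "First draft.", "l3": "Old footer"}
changes = [
    Change("revise-body", "l2", "First draft, revised."),
    Change("drop-footer", "l3", None),
]

only_body = build(baseline, changes, {"revise-body"})
both = build(baseline, changes, {"revise-body", "drop-footer"})
```

The changes are the managed entities here; no individual version is stored,
and any combination of the named changes yields a system on request.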

>> > 6. Confusing bytes and characters

>Too webbie for me, but from what I can make of it, a generic binary diff
>format could still be used right?  <XML BINARY DIFF BEGIN> and <XML BINARY
>DIFF END> with some baseline version number and off the difference engine
>goes.....?

No, that's the problem. XML can't include arbitrary binary data within a
document, since the in-document data must conform to a valid character
encoding, and certain escaping conventions: not all binary sequences are
reasonable in such data. There are also some problems with potential
transcoding that we should endeavor to avoid. In my variation of VTML, I
did not use XML syntax, but rather ASCII, with carefully simplified
escaping rules so that arbitrary binary data could be handled. Last time I
mentioned that on this list I was brutally flamed for propagating more
formats when I could be using XML.

Jim may be right, but since so many free XML parsers are already available,
I tend to think that satisfying the XML parser component is pretty trivial.
So I stick by our suggestion that a 2-part format can work, especially
since MIME and XML are likely to be part of Web-based authoring systems
anyway.
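
A rough Python sketch of what such a 2-part format could look like, using
only the standard library. The `<diff>` and `<insert>` element names, their
attributes, and the `chunk1` Content-ID are made up for illustration; nothing
here is a proposed WebDAV syntax. The first step shows a stock XML parser
rejecting a raw control byte in character data; the rest puts the change
description in an XML part and the arbitrary bytes in a separate MIME part.

```python
import email
import xml.etree.ElementTree as ET
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

payload = bytes([0x00, 0x01, 0xFF, 0xFE])  # arbitrary binary difference data

# Raw control bytes are not legal XML character data, so the parser refuses.
try:
    ET.fromstring(b"<diff>\x01</diff>")
    raw_bytes_accepted = True
except ET.ParseError:
    raw_bytes_accepted = False

# Two-part alternative: XML describes the change, a MIME part carries the bytes.
descriptor = '<diff base="v1"><insert at="42" part="cid:chunk1"/></diff>'
message = MIMEMultipart("related")
message.attach(MIMEText(descriptor, "xml"))
chunk = MIMEApplication(payload)  # transfer-encoded by the library
chunk.add_header("Content-ID", "<chunk1>")
message.attach(chunk)

# Round trip: serialize, parse, and recover both parts.
parsed = email.message_from_bytes(message.as_bytes())
xml_part, binary_part = parsed.get_payload()
root = ET.fromstring(xml_part.get_payload(decode=True))
recovered = binary_part.get_payload(decode=True)
```

The XML part stays plain, parseable text, and the binary chunk never has to
satisfy XML's character or escaping rules.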

>> > 7. Not providing a computable evolution of addresses

>I completely disagree with Fabio and David here.  The purpose of versioning
>is to provide a generic mechanism to store and retrieve revisions and variants
>(versions) of arbitrary data.

All I can say is that that's _your_ purpose. From an _authoring_ point of
view the purpose of the versioning system is to enable use of the textual
history of the document. The intellectual reason to do such stuff on the
web is to enable links to retain their relevance even as resources change.
And to enable authors to collaborate with others and manage their own work.

> The structure of the data, the contents of the
>data, and all other aspects of the data should not matter.  Whatever is
>versioned is opaque to the versioning mechanism.  Thus, the versioning
>should be content-independent.  Thus, indeed as Jim says, the statement
>Fabio and David make is very very text-centric, and should in my opinion
>not be adopted for WebDAV.

WebDAV must _work_ for text. _I_ am certainly text-centric, but I don't
mind WebDAV supporting operations that may be of little use in the
authoring process. On the other hand, this point is the whole reason that I
_care about_ versioning on the WWW, so I am loath to strike it.

I recommend Nelson's Literary Machines as a worthwhile text on why this
matters. It's not the easiest book to read and understand, and Ted is far
from the least impassioned person on _any_ topic. But he presents the case
quite thoroughly. You could also read his 1965 paper, "A File Structure for
the Complex, the Changing, and the Indeterminate." I think it's on the web
nowadays... I'll look for the URL.


> I want the V in WebDAV to provide me with a
>mechanism to store and version my byte-streams and what I do with that is
>my business, not the versioning system.  I think point 7 makes Fabio and
>David fall into their own pitfall numero 3, they let the versioning system
>know about the structure.  Thus, I think point 7 should be struck from
>the pitfalls altogether.


Byte-stream management need not be incompatible with point 7. If you choose
not to take advantage of the ability to find corresponding bytes in
different versions, then you are not _harmed_ by that feature being
available to those of us who need it.
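
As an illustration of "finding corresponding bytes in different versions",
here is a short Python sketch. The function name `map_offset` is
hypothetical, and the correspondence comes from the standard library's
`difflib.SequenceMatcher`, which is exactly the kind of after-the-fact
heuristic guess that an editor-produced change record would improve on.

```python
from difflib import SequenceMatcher
from typing import Optional


def map_offset(old: bytes, new: bytes, offset: int) -> Optional[int]:
    """Return where the byte at `offset` in `old` ended up in `new`.

    Returns None if that byte has no counterpart (it was deleted or changed).
    The matching is a heuristic reconstruction, not a recorded edit history.
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for block in matcher.get_matching_blocks():
        if block.a <= offset < block.a + block.size:
            return block.b + (offset - block.a)
    return None


old = b"Hello world"
new = b"Hello, brave world"

w_in_new = map_offset(old, new, 6)                  # the "w" of "world"
deleted = map_offset(b"abcXdef", b"abcdef", 3)      # the "X" was removed
```

A link anchored at the "w" in the old version can be carried forward to the
new one, and a client that only wants opaque byte streams can simply never
call such a function.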


>The way I view VTML is as a particular policy, it is more flexible than
>some other policies out there, but also does not address some other things
>other policies do.  Thus, VTML is just another system with a policy that
>should be able to use WebDAV and should be able to be built on top of
>WebDAV.

I'd be kinda happy with this. My VTML version and implementation does
support freely combinable changes and arbitrary byte sequences, but does
not have any special operations for non-sequential data types.  It also is
_not_ XML based, though I don't think it would be hard to make that happen,
using the techniques I've suggested.

>In any case, as I said, I am a versioning guy, not a Web guy, but I hope
>this helps to put some perspective on the issues that are there.....

Sure. I've been working on versioning and hypertext for far too long. In
general I've found that the Software Engineering work is rather simplistic
with respect to individual changes and critical operations such as move and
copy (that software can live without, but authors cannot). It also has
tremendous facilities and complexity in the area of total configuration
consistency and policy enforcement that are inappropriate for typical
authorial processes (with their typically small, technically
unsophisticated, informal working groups).

>PS: Of course, I appreciate the work by Fabio and David, don't get me
>    wrong as this e-mail has a negative tone to it.  But, coming from
>    a versioning and CM perspective, I obviously have different opinions
>    and expectations from WebDAV and I wanted to highlight those here.

No offense taken: Civilly presented arguments pro and con are what this
process should consist of. A vigorous discussion with lots of disagreement
need not be hostile just on that account.


  -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________

Received on Monday, 8 June 1998 16:02:01 UTC