RE: The 7 Deadly Sins of Versioning from Jim Whitehead on 1998-06-07 (w3c-dist-auth@w3.org from April to June 1998)

From: Jim Whitehead <ejw@ics.uci.edu>
Date: Sun, 7 Jun 1998 16:17:26 -0700
To: Fabio Vitali <fabio@cs.unibo.it>, WEBDAV WG <w3c-dist-auth@w3.org>
Cc: David Durand <dgd@cs.bu.edu>
Message-ID: <000901bd926a$6f2f4740$d115c380@galileo.ics.uci.edu>

David, Fabio,

Thank you for taking the time to draft your position paper, "The 7 Deadly
Sins of Versioning."  While I agree with many of your points, there are some
on which I disagree.

> 1. Enforced linear versioning

I agree with you -- retrofitting non-linear versioning onto a linear
versioning system may not be possible, and will likely be inelegant if
possible, forcing unnecessary tradeoffs.

> 2. Serial editing

I agree that parallel editing is useful in many contexts, especially
software development.  However, parallel development support requires merge
support, and many media types are inadequately served by merge tools.  For
example, how many tools exist that can merge two people's changes to the
same image?  So, while parallel editing support should be provided, it must
also be possible to restrict editing to serial access.

> 3. Integrating versioning and structure information

I'm not sure I followed this point.  The requirement in WebDAV is to design
a versioning protocol which works for all media types, not just
XML/HTML/SGML based media types.  By meeting this requirement, I believe we
also avoid integrating versioning and structure information, since WebDAV
versioning has to work for completely unstructured information too -- it
must handle an arbitrary instance of text/plain or
application/octet-stream.

> 4. Using binary formats

This is a point on which I waver.  While I definitely favor use of a
text-based format (the initial WebDAV Distributed Authoring protocol
specification is testament to this), for uses like DRP, sending differences
over the wire in compressed XML will be less efficient than sending them in
a format tailored for differences.  For applications where efficiency is
the main driver, I wonder whether compressed XML will be sufficient.

I do agree with you that developing the diff representation in XML provides
a good opportunity for separation of concerns.  However, it raises other
concerns, such as how best to encode an arbitrary chunk of binary
difference data in XML, or what happens if the difference is in a different
charset from the original (e.g. applying a UTF-16 diff to a UTF-8 original).
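
To make the binary-in-XML question concrete, here is a minimal sketch of one
possible answer: base64-encode the chunk and carry it as element text.  The
element name "insert" and the attributes "offset" and "encoding" are made up
for illustration; they come from neither VTML nor any WebDAV draft.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Tuple


def wrap_binary_chunk(chunk: bytes, offset: int) -> str:
    """Wrap raw octets in a hypothetical <insert> element as base64 text."""
    elem = ET.Element("insert", {"offset": str(offset), "encoding": "base64"})
    elem.text = base64.b64encode(chunk).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_binary_chunk(xml_text: str) -> Tuple[int, bytes]:
    """Recover the offset and the raw octets from the element."""
    elem = ET.fromstring(xml_text)
    return int(elem.get("offset")), base64.b64decode(elem.text or "")


# 768 octets covering every byte value, including ones illegal in XML text.
chunk = bytes(range(256)) * 3
xml_text = wrap_binary_chunk(chunk, offset=1024)
offset, recovered = unwrap_binary_chunk(xml_text)
assert (offset, recovered) == (1024, chunk)

# base64 emits 4 characters for every 3 octets of input.
print(len(chunk), len(base64.b64encode(chunk)))
```

The round trip is lossless, but the payload grows by one third before any
compression is applied, which is the efficiency cost I am worried about.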

> 5. Hard-wiring policy

Avoiding hard-wired policy is a bit of a motherhood statement.  One of the
difficult issues for WebDAV is the fact that many repositories support some
existing policy, and it would be easier to integrate them with WebDAV if
some hook (like a CHECKOUT/CHECKIN method pair) were provided.  I think a
flexible approach is to provide these hooks, but also to provide operators
which allow direct manipulation of the version history graph.  This way
WebDAV provides a built-in policy, but has the flexibility to be
interoperable with systems which don't use the built-in policy.
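
As a rough sketch of what I mean by hooks layered over graph operators, the
Python below keeps a version history graph in memory.  Every name in it
(VersionGraph, add_version, checkout, checkin) is hypothetical, and it
models no actual protocol method.  The one primitive operator adds a node
with arbitrary predecessors; the checkout/checkin pair is a serial-editing
policy built on top of it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VersionGraph:
    # version id -> ids of its predecessor versions
    predecessors: Dict[str, List[str]] = field(default_factory=dict)
    # version id -> user currently holding a checkout on it
    checked_out: Dict[str, str] = field(default_factory=dict)
    counter: int = 0

    def add_version(self, preds: List[str]) -> str:
        """Direct manipulation: add a node with any known predecessors."""
        for p in preds:
            if p not in self.predecessors:
                raise KeyError(f"unknown version {p}")
        self.counter += 1
        vid = f"v{self.counter}"
        self.predecessors[vid] = list(preds)
        return vid

    def checkout(self, vid: str, user: str) -> None:
        """Built-in policy: at most one checkout per version."""
        if vid not in self.predecessors:
            raise KeyError(f"unknown version {vid}")
        if vid in self.checked_out:
            raise RuntimeError(f"{vid} is already checked out")
        self.checked_out[vid] = user

    def checkin(self, vid: str, user: str) -> str:
        """Built-in policy: only the holder may check in a successor."""
        if self.checked_out.get(vid) != user:
            raise RuntimeError(f"{vid} is not checked out by {user}")
        del self.checked_out[vid]
        return self.add_version([vid])


g = VersionGraph()
v1 = g.add_version([])          # initial version
g.checkout(v1, "jim")
v2 = g.checkin(v1, "jim")       # serial edit through the policy hooks
v3 = g.add_version([v1])        # parallel branch, bypassing the policy
v4 = g.add_version([v2, v3])    # merge node with two predecessors
assert g.predecessors[v4] == [v2, v3]
```

A repository with its own policy would call only add_version, while a
client wanting the built-in policy would use the checkout/checkin pair.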

> 6. Confusing bytes and characters

While I agree with this point (characters and octets are different, since in
multi-octet charsets like UTF-16 and UCS-4, a single character occupies
multiple octets), I disagree with your mechanism for how to handle arbitrary
binary diff information.  A combination of XML and MIME multipart requires
the use of two parsers to understand a difference stream.  This will always
be inferior to a difference which requires only one parser (e.g., a non-XML
binary diff format), and developers will strongly lobby for the one-parser
approach.  Plus, the two-parser approach will be slower, and harder to
optimize.

It seems to me you've provided a pretty compelling argument against the use
of XML as a difference format.
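
For what it's worth, the character/octet distinction in point 6 is easy to
demonstrate.  The sketch below counts characters and octets for one short
string in three charsets, and shows where the same character position lands
as an octet offset in each.  The sample string and the helper names are my
own, and Python's "utf-32-le" codec stands in for UCS-4.

```python
# The -le codec variants are used so that no byte-order mark is counted.
CODECS = ("utf-8", "utf-16-le", "utf-32-le")

# 8 characters: n, a, i-diaeresis, v, e, space, euro sign, 5
text = "na\u00efve \u20ac5"


def octet_length(s: str, codec: str) -> int:
    """Number of octets the whole string occupies in the given charset."""
    return len(s.encode(codec))


def octet_offset(s: str, char_index: int, codec: str) -> int:
    """Octet offset at which the character at char_index begins."""
    return len(s[:char_index].encode(codec))


for codec in CODECS:
    # Character index 6 is the euro sign in every charset; its octet
    # offset is not.
    print(codec, len(text), octet_length(text, codec),
          octet_offset(text, 6, codec))
```

A diff that records octet offsets computed against one charset points at
the wrong place when applied to the same text stored in another.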

> 7. Not providing a computable evolution of addresses

I think the first paragraph shows how text-centric this observation is:

> An important application of versioning systems is to be able to determine
> the position, content and identity of all subparts of a resource in all
> its versions in which they exist in some form. Ideally it should be
> possible to retrieve and identify any single character of a document in
> all the versions in which it existed.

How would you track changes to an image, or to a digital movie?  One problem
is evolving addresses in a media-type independent way.  Another problem is
that on the Web, the smallest addressable part is a resource.  There are no
existing standards for sub-resource scope addresses which apply across all
media types.

> Conclusions: Ignoring VTML

> We are not brainwashed advocates for VTML (in any of its versions).

I may make you both T-shirts with this quote... :-)

I continue to assert that VTML is a technology which holds promise as a
difference format for potential use in WebDAV.  Before specifying its use,
I'd like to see how it can accommodate a difference between two images (say,
two JPEG files), or in the general case, how it can accommodate binary
differences.

- Jim


> -----Original Message-----
> From: w3c-dist-auth-request@w3.org
> [mailto:w3c-dist-auth-request@w3.org]On Behalf Of Fabio Vitali
> Sent: Thursday, May 28, 1998 10:16 AM
> To: WEBDAV WG
> Cc: David Durand
> Subject: The 7 Deadly Sins of Versioning
>
>
> Dear all,
>
> As the WebDAV group is now preparing to consider versioning in detail, and
> we will not be attending the meeting in June at Microsoft, David Durand and
> I felt that our best contribution would be to record a list of possible
> decisions that should be avoided for versioning on the WWW. We have thus
> written a document, "The 7 Deadly Sins of Versioning (plus a venial one)",
> that we would like to submit to the attention of the WEBDAV group.
>
> The document is both attached here and available at:
> http://www.cs.unibo.it/~fabio/webdav/7sins.html
>
*snip*
Received on Sunday, 7 June 1998 19:23:55 UTC