RE: The 7 Deadly Sins of Versioning from David G. Durand on 1998-06-08 (w3c-dist-auth@w3.org from April to June 1998)

From: David G. Durand <dgd@cs.bu.edu>
Date: Mon, 8 Jun 1998 16:46:24 -0400
To: WEBDAV WG <w3c-dist-auth@w3.org>
Message-Id: <v03007802b1a0f022eaed@[205.181.197.106]>

I started this before Andre's note, but then answered Andre first. I've
tried to eliminate redundancy from this direct reply.

At 7:17 PM -0400 6/7/98, Jim Whitehead wrote:
>> 2. Serial editing
>I agree that parallel editing is useful in many contexts, especially
>software development.  However, parallel development support requires merge
>support, and many media types are inadequately served by merge tools.  For
>example, how many image merge tools are in existence which help merge two
>people's changes to the same image?  So, while parallel editing support
>should be provided, it must also be possible to limit editing to serial.

This is a great way of rephrasing what we said: Parallel editing must be
possible. Your emphasis is different, but the key idea is not to _prevent_
parallel editing. It's very easy to do that if you don't plan ahead.

Parallel editing does _not_ logically imply merging, if one is willing to
keep divergent versions.
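
To make the point concrete, here is a minimal Python sketch that assumes a
hypothetical in-memory `VersionGraph` class (not part of any WebDAV draft).
Two authors edit the same base version in parallel, and the result is simply
two divergent leaves. Nothing forces a merge. A later version with two
parents could record one, but only if someone chooses to create it.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    vid: str
    content: bytes
    parents: list[str] = field(default_factory=list)


class VersionGraph:
    """Hypothetical version history: a DAG of versions, no merge required."""

    def __init__(self):
        self.versions: dict[str, Version] = {}

    def add(self, vid, content, parents=()):
        # Every parent must already exist; branching is just two children.
        for p in parents:
            if p not in self.versions:
                raise KeyError(f"unknown parent {p}")
        self.versions[vid] = Version(vid, content, list(parents))

    def children(self, vid):
        return [v.vid for v in self.versions.values() if vid in v.parents]

    def leaves(self):
        # Versions that nothing descends from: the current "tips".
        used_as_parent = {p for v in self.versions.values() for p in v.parents}
        return sorted(v for v in self.versions if v not in used_as_parent)


g = VersionGraph()
g.add("v1", b"base text")
g.add("v2", b"Alice's edit", ["v1"])  # parallel edit 1
g.add("v3", b"Bob's edit", ["v1"])    # parallel edit 2, no merge
print(g.leaves())
```

With this history, `leaves()` reports both `v2` and `v3` as current tips,
which is the "keep divergent versions" option described above.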

The best manual merge tool I know of for images is Photoshop: you can paint
directly from one image into another... That's a manual merge, of course.

I think there are two things to bear in mind:

   1. text is a critical data type.

   2. if something is needed for one data type, say text, and not needed
for another, say JPEG encoded images, then it's still needed by DAV; it
just won't be applied to JPEG encoded images!

>> 3. Integrating versioning and structure information
>
>I'm not sure I followed this point.  The requirement in WebDAV is to design
>a versioning protocol which works for all media types, not just
>XML/HTML/SGML based media types.

Right. The deadly sin is to try to combine data format and versioning in
some way: This has been proposed multiple times for HTML, and so it is
worth pointing out the pitfall. As Fabio said, a lot of this isn't rocket
science, but we felt someone ought to say it right up front. I suspect that
we may end up needing more than one diff format, or perhaps an extensible
set of them.
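
One way to keep versioning independent of any data format while still
allowing format-specific diffs is a registry keyed by media type, with a
byte-level fallback. The sketch below is hypothetical: the registry and
function names are illustrative, it uses Python's standard `difflib`, and
the `text/plain` handler assumes UTF-8 for simplicity (a real one would have
to honor the declared charset).

```python
import difflib
from typing import Callable

# Hypothetical registry mapping a media type to a diff function.
DiffFn = Callable[[bytes, bytes], list]
_registry: dict[str, DiffFn] = {}


def register(media_type: str):
    def deco(fn: DiffFn) -> DiffFn:
        _registry[media_type] = fn
        return fn
    return deco


@register("application/octet-stream")
def byte_diff(old: bytes, new: bytes) -> list:
    # Media-type independent: compare raw byte sequences.
    sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [op for op in sm.get_opcodes() if op[0] != "equal"]


@register("text/plain")
def line_diff(old: bytes, new: bytes) -> list:
    # Assumes UTF-8; a real handler would use the declared charset.
    a = old.decode("utf-8").splitlines(keepends=True)
    b = new.decode("utf-8").splitlines(keepends=True)
    return list(difflib.unified_diff(a, b, lineterm=""))


def diff(media_type: str, old: bytes, new: bytes) -> list:
    # Unregistered types fall back to byte-level differencing.
    fn = _registry.get(media_type, _registry["application/octet-stream"])
    return fn(old, new)
```

Adding a new format means registering one more function; the versioning
machinery that stores and applies diffs does not change.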

>> 5. Hard-wiring policy

>Avoiding hard-wired policy is a bit of a motherhood statement.  One of the
>difficult issues for WebDAV is the fact that many repositories support some
>existing policy, and it would be easier to integrate them with WebDAV if
>some hook (like a CHECKOUT/CHECKIN method pair) were provided.  I think a
>flexible approach is to provide these hooks, but also to provide operators which
>allow direct manipulation of the version history graph.  This way WebDAV
>provides a built-in policy, but has the flexibility to be interoperable with
>systems which don't use the built-in policy.

This could be OK, depending on whether the needed operations are provided.
I don't think that we can afford to ignore the special requirements of
authors in a standard intended for use with the Web (based on text formats
at a fundamental level) and intended to support authorship. If those
facilities are not useful to people using HTTP as an SCM transfer protocol,
then they need not be used.
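
The layering Jim describes can be sketched as policy-free primitives on the
version graph, with a CHECKOUT/CHECKIN policy built on top as one optional
client of those primitives. Everything here is hypothetical and
illustrative: the class and method names are not taken from any WebDAV
draft, and the lock-based policy is just one example (it also shows how
editing can be limited to serial for media types without merge tools).

```python
class History:
    """Hypothetical version store exposing raw graph operations."""

    def __init__(self):
        self.content = {}  # version id -> bytes
        self.parents = {}  # version id -> list of parent ids

    # Policy-free primitives: direct manipulation of the history graph.
    def create_version(self, vid, content, parents):
        self.content[vid] = content
        self.parents[vid] = list(parents)

    def add_edge(self, child, parent):
        self.parents[child].append(parent)


class CheckoutPolicy:
    """One optional policy on top of the primitives: serial editing."""

    def __init__(self, history):
        self.h = history
        self.locks = {}  # version id -> user holding the checkout

    def checkout(self, vid, user):
        if vid in self.locks:
            raise PermissionError(f"{vid} already checked out by {self.locks[vid]}")
        self.locks[vid] = user

    def checkin(self, vid, user, new_vid, content):
        if self.locks.get(vid) != user:
            raise PermissionError(f"{user} does not hold a checkout on {vid}")
        self.h.create_version(new_vid, content, [vid])
        del self.locks[vid]
```

A repository with its own policy can ignore `CheckoutPolicy` and call
`create_version` and `add_edge` directly, which is the interoperability
point being made above.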

>> 6. Confusing bytes and characters

>It seems to me you've provided a pretty compelling argument against the use
>of XML as a difference format.

This may be, but in that case, I'd recommend using an ASCII-based format.
We could even still use XML, with a fixed encoding declaration, and with
binary data stored within the document. I can work out the details of this
if people are interested.
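
Since the details were left open, the following is only one possible shape,
sketched under explicit assumptions: the element names `diff` and `insert`
and the attribute `at` are hypothetical, and base64 is the editor's choice
for carrying binary data inside the document. The envelope always declares
US-ASCII, so the encoding of the versioned resource never affects how the
diff itself is parsed.

```python
import base64
import xml.etree.ElementTree as ET


def encode_insert(offset: int, data: bytes) -> bytes:
    """Wrap a byte-level insertion in XML (hypothetical element names)."""
    root = ET.Element("diff")
    ins = ET.SubElement(root, "insert", at=str(offset))
    # Base64 keeps arbitrary bytes inside the ASCII repertoire.
    ins.text = base64.b64encode(data).decode("ascii")
    # Fixed encoding declaration: the envelope is always US-ASCII.
    return ET.tostring(root, encoding="us-ascii", xml_declaration=True)


def decode_insert(doc: bytes) -> tuple[int, bytes]:
    ins = ET.fromstring(doc).find("insert")
    # An empty payload serializes as an empty element whose text is None.
    return int(ins.get("at")), base64.b64decode(ins.text or "")


payload = "café".encode("utf-8")  # 4 characters, but 5 bytes
doc = encode_insert(3, payload)
print(doc.decode("ascii"))
```

Offsets such as `at` count bytes, not characters: `"café"` is four
characters but five bytes in UTF-8, which is exactly the byte/character
confusion this point warns against.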

>> 7. Not providing a computable evolution of addresses
>
>I think the first paragraph shows how text-centric this observation is:

Sure. I think that text is an important format. I see nothing in this
requirement that _can't_ be supported for arbitrary binary byte streams,
nor am I aware of any digital data that is _not_ representable by such a
byte stream.

>
>How would you track changes to an image, or to a digital movie?  One problem
>is evolving addresses in a media-type independent way.  Another problem is
>that on the Web, the smallest addressable part is a resource.  There are no
>existing standards for sub-resource scope addresses which apply across all
>media types.

I can certainly imagine data formats for which such operations would be
eminently sensible. The real problem is not that such operations don't make
sense, but that typical formats use aggressive compression techniques that
may hide logical internal boundaries like those between pixels and frames.

I don't see how allowing for this kind of operation prevents an application
from _not_ using it when it doesn't make sense.
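
A "computable evolution of addresses" need not depend on the media type if
addresses are byte offsets. The minimal sketch below assumes the two
versions are available as byte strings and uses Python's `difflib` to
derive the edit script (a stand-in for whatever diff format is eventually
chosen). It maps an offset in the old version to its offset in the new
one, or to `None` when the addressed byte was replaced or deleted.

```python
import difflib


def address_map(old: bytes, new: bytes):
    """Return a function mapping byte offsets in `old` to offsets in `new`."""
    ops = difflib.SequenceMatcher(None, old, new, autojunk=False).get_opcodes()

    def map_offset(pos: int):
        for tag, i1, i2, j1, j2 in ops:
            if i1 <= pos < i2:
                # Unchanged bytes shift by the net length change before them;
                # replaced or deleted bytes have no successor address.
                return j1 + (pos - i1) if tag == "equal" else None
        return None  # offset lies outside the old version

    return map_offset


old = b"Hello, world"
new = b"Hello, big world"
m = address_map(old, new)
print(m(7))  # offset of "w" in the new version
```

An application for which such addresses are meaningless (a compressed JPEG,
say) can simply never ask for them; the mechanism does not get in its way.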

>> Conclusions: Ignoring VTML
>
>> We are not brainwashed advocates for VTML (in any of its versions).
>
>I may make you both T-shirts with this quote... :-)
>
>I continue to assert that VTML is a technology which holds promise as a
>difference format for potential use in WebDAV.  Before specifying its use,
>I'd like to see how it can accommodate a difference between two images (say,
>two JPEG files), or in the general case, how it can accommodate binary
>differences.

My variation works with binary data (the implementation is a bit
over-engineered, because it's a prototype, but it's certainly not hard to
deal with).

I'm not convinced that byte-stream differencing is sensible for JPEG, but
I'm not sure what other approach is possible for a media-type independent
format.
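
For reference, a media-type independent byte-stream delta can be as simple
as a list of COPY and INSERT instructions. This is a generic sketch using
Python's `difflib`, not the VTML variant mentioned above, and the
instruction tuples are a hypothetical encoding chosen for illustration.

```python
import difflib


def make_delta(old: bytes, new: bytes) -> list:
    """Encode `new` as COPY/INSERT instructions against `old`."""
    delta = []
    sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2 - i1))   # reuse bytes old[i1:i2]
        elif j2 > j1:                             # replace or insert
            delta.append(("insert", new[j1:j2]))  # literal new bytes
        # A pure delete emits nothing: those old bytes are simply not copied.
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        else:
            out += op[1]
    return bytes(out)
```

The format works for any byte stream, but for heavily compressed formats a
small visual change typically re-encodes much of the file, so the delta
tends to degenerate into mostly INSERT instructions. That is why byte-stream
differencing may not be sensible for JPEG even though it is always possible.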

  -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Monday, 8 June 1998 16:47:23 UTC